Fig. 2 | Journal of Cheminformatics

Fig. 2

From: Comprehensive machine learning boosts structure-based virtual screening for PARP1 inhibitors

The methodological workflow of this study. Experimental data on PARP1 were retrieved from ChEMBL and split into a training set and a test set. The training set contains 1565 molecules with biochemical (but not cellular) IC50s, while the test set comprises 93 molecules with cellular IC50s in addition to their in vitro biochemical potency. The threshold separating true actives from true inactives in both data sets is 1 µM. A total of 3350 decoys, property-matched to the 67 test actives, were generated with DeepCoy and added to the test set. All ligands were docked into the receptor (PDB ID 7KK4) with Smina, after which either PLEC or GRID features were extracted from each docked complex. Ligand-only Morgan fingerprints (512 bits, radius 2) were also computed. These features served as input for training and testing ML models with five supervised learning algorithms: RF, SVM, XGB, ANN and DNN (hyperparameters were tuned by Bayesian optimization). The VS performance of all algorithms was evaluated in terms of EF1% and NEF1% and visualized as precision-recall curves. Three off-the-shelf generic SFs (Smina, CNN-Score, SCORCH) were evaluated on the same test set as the ML SFs for comparison. A dissimilar test set was also created by keeping only those test molecules whose Tanimoto similarity to every training molecule (Morgan fingerprints, 2048 bits, radius 2) was lower than 0.70; all SFs were evaluated on this set as well.
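
The caption names EF1% and NEF1% without spelling them out, so the following is a minimal sketch of how the two metrics are commonly computed. It is not taken from the study's code. It assumes the usual definitions: EF1% is the hit rate among the top-ranked 1% of molecules divided by the hit rate of the whole set, and NEF1% is EF1% divided by the largest EF1% attainable on that set. The function name `enrichment_factors`, the rounding of the top-1% cutoff, and the convention that a higher score means "more likely active" are all assumptions made for illustration.

```python
def enrichment_factors(scores, labels, fraction=0.01):
    """Return (EF, NEF) for the top `fraction` of a ranked screening list.

    scores: one number per molecule; HIGHER is assumed to mean "more likely
            active". Docking energies such as Smina's, where lower is better,
            would need to be negated first.
    labels: truthy for true actives, falsy for inactives and decoys.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_total = len(scores)
    n_actives = sum(1 for y in labels if y)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one molecule and one active")

    # Size of the top-ranked subset (assumption: round to nearest, minimum 1).
    n_top = max(1, round(fraction * n_total))

    # Rank molecules from best to worst score and count actives in the top slice.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(1 for _, y in ranked[:n_top] if y)

    base_rate = n_actives / n_total
    ef = (hits / n_top) / base_rate
    # Best case: the top slice is filled with actives as far as they last.
    ef_max = (min(n_top, n_actives) / n_top) / base_rate
    return ef, ef / ef_max


# Toy usage with the set sizes given in the caption:
# 67 actives among 93 test molecules plus 3350 decoys (3443 in total).
labels = [1] * 67 + [0] * (93 - 67 + 3350)
perfect_scores = [float(len(labels) - i) for i in range(len(labels))]
ef, nef = enrichment_factors(perfect_scores, labels)
print(f"EF1% = {ef:.2f}, NEF1% = {nef:.2f}")
```

Because NEF1% is bounded between 0 and 1 regardless of the active-to-decoy ratio, it is easier to compare across test sets of different composition than the raw EF1%, which is presumably why both are reported.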