Table 1 Training and validation set sizes for the different benchmarks

From: Randomized SMILES strings improve the quality of molecular generative models

| Model | Training set size | Validation set size |
| --- | --- | --- |
| GDB-13 1M | 1,000,000 | 10,000 |
| GDB-13 10K | 10,000 | 1000 |
| GDB-13 1K | 1000 | 1000 |
| ChEMBL | 1,483,943 | 78,102 |

  1. Note that different ratios of training to validation set size were used, depending on the expected size of the target chemical space and the total number of molecules.
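
The ratios mentioned in the note can be read directly off the sizes in Table 1. The following is a minimal sketch, not taken from the article, that computes them. The names `SPLITS`, `train_to_validation_ratio`, and `validation_fraction` are hypothetical and introduced here only for illustration.

```python
# Sizes copied from Table 1: (training set size, validation set size).
# The dictionary and function names are illustrative, not from the article.
SPLITS = {
    "GDB-13 1M": (1_000_000, 10_000),
    "GDB-13 10K": (10_000, 1_000),
    "GDB-13 1K": (1_000, 1_000),
    "ChEMBL": (1_483_943, 78_102),
}


def train_to_validation_ratio(train_size, validation_size):
    """Return how many training molecules there are per validation molecule."""
    return train_size / validation_size


def validation_fraction(train_size, validation_size):
    """Return the validation set's share of all molecules in the split."""
    return validation_size / (train_size + validation_size)


if __name__ == "__main__":
    for name, (train, val) in SPLITS.items():
        ratio = train_to_validation_ratio(train, val)
        share = validation_fraction(train, val)
        print(f"{name}: ratio {ratio:.1f}:1, validation share {share:.1%}")
```

With the sizes listed in the table, the training-to-validation ratios come out at about 100:1, 10:1, 1:1, and 19:1 respectively, which makes the differences referred to in the note explicit.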