ZINC22:Directory structure
ZINC is organized by heavy atom count (HAC), lipophilicity (calculated LogP), charge (from -4 to +4), and format (Mol2, SDF, PDBQT, and DB2).
Each file contains up to 5000 molecules and is stored as a compressed tarball (.tgz).
The data is split into sub-directories to keep the files manageable.
Files follow the path pattern /zinc-22<layer>/H??/H??[PM]???/[a-z]/H??[PM]???-<charge>.*.<format>.tgz where:
- layer is a single letter, a-z, used only to create the database in parallel.
- H?? is the heavy atom count, which currently ranges from H04 to H29.
- H??[PM]??? is the HAC/LogP bin, where P means plus (positive logP) and M means minus (negative logP). For instance, H23P130 contains molecules with 23 heavy atoms and an RDKit-calculated logP between 1.300 and 1.399, and H25M000 contains molecules with 25 heavy atoms and a logP between 0.00 and -0.999.
- Within each tranche, the [a-z] directory is a hash used to limit the number of files per directory.
- charge is the net molecular charge, encoded as a letter following the InChIKey convention: N=0, M=-1, O=+1, P=+2 and so on, between -4 and +4.
- format is one of mol2, sdf, pdbqt, db2.
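The naming scheme above can be decoded mechanically. The following is a minimal Python sketch, not an official ZINC22 tool. It assumes the charge letters run alphabetically around N=0 (J for -4 through R for +4), and that the `.*.` part of the file name is an arbitrary middle segment. The function name `parse_tranche_file`, the `Tranche` tuple, and the example file names are hypothetical and exist only for illustration.

```python
import posixpath
import re
from typing import NamedTuple

# Assumption: charge letters run alphabetically with N = 0,
# so J = -4, M = -1, N = 0, R = +4.
CHARGE_LETTERS = "JKLMNOPQR"

# Mirrors H??[PM]???-<charge>.*.<format>.tgz from the text above.
FILENAME = re.compile(
    r"^H(?P<hac>\d{2})(?P<sign>[PM])(?P<digits>\d{3})"
    r"-(?P<charge>[J-R])\..*\.(?P<fmt>mol2|sdf|pdbqt|db2)\.tgz$"
)


class Tranche(NamedTuple):
    hac: int          # heavy atom count
    logp_edge: float  # bin edge nearest zero, e.g. 1.3 for P130
    charge: int       # net molecular charge, -4 to +4
    fmt: str          # mol2, sdf, pdbqt, or db2


def parse_tranche_file(path: str) -> Tranche:
    """Decode the tranche fields from a file name or full path."""
    name = posixpath.basename(path)
    match = FILENAME.match(name)
    if match is None:
        raise ValueError(f"not a recognized tranche file name: {name!r}")
    sign = 1 if match["sign"] == "P" else -1
    return Tranche(
        hac=int(match["hac"]),
        logp_edge=sign * int(match["digits"]) / 100,
        charge=CHARGE_LETTERS.index(match["charge"]) - 4,
        fmt=match["fmt"],
    )


if __name__ == "__main__":
    # Hypothetical path; the "example" segment stands in for the ".*." part.
    print(parse_tranche_file(
        "/zinc-22a/H23/H23P130/b/H23P130-N.example.mol2.tgz"
    ))
```

The sketch reports only the bin edge nearest zero and does not try to infer the bin width, since the examples above show different widths for the P and M bins.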