ZINC22:Directory structure

From DISI

ZINC is organized by heavy atom count (HAC), lipophilicity (calculated LogP), charge (from -4 to +4), and format (Mol2, SDF, PDBQT, and DB2).

Each file is a compressed tarball (.tgz) containing up to 5000 molecules.
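A tarball of this kind can be inspected without unpacking it to disk. The following is a minimal sketch using Python's standard tarfile module; the function name list_members is made up for illustration, and since this page does not describe how molecules are laid out inside each tarball, the sketch only lists the stored file names.

```python
import tarfile


def list_members(tgz_path: str) -> list[str]:
    """Return the names of the regular files stored in a .tgz archive."""
    # "r:gz" opens a gzip-compressed tar archive for reading.
    with tarfile.open(tgz_path, mode="r:gz") as archive:
        return [m.name for m in archive.getmembers() if m.isfile()]
```

Calling list_members on a downloaded tranche file returns the member names in archive order.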

The data is organized into sub-directories to make the files more manageable. Each file path has the form

  /zinc-22<layer>/H??/H??[PM]???/[a-z]/H??[PM]???-<charge>.*.<format>.tgz

where:

  • layer is a single letter, a-z, used only to create the database in parallel.
  • H?? is the heavy atom count, which currently ranges from H04 to H29.
  • H??[PM]??? is the HAC/LogP bin, where P = plus and M = minus logP. For instance, H23P130 contains molecules with 23 heavy atoms and RDKit-calculated logP between 1.300 and 1.399.
  • Likewise, H25M000 contains molecules with 25 heavy atoms and logP between 0.00 and -0.999.
  • Within the tranche, a-z is a hash used to limit the number of files per directory.
  • charge is the net molecular charge following the InChIKey convention, thus N=0, M=-1, O=+1 and so on, between -4 and +4.
  • format is one of mol2, sdf, pdbqt, db2.
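
The naming scheme above can be decoded mechanically. The sketch below is illustrative only, not an official ZINC22 tool: the names parse_zinc22_path and TrancheFile are made up, and the "xaa" chunk name in the usage example is a hypothetical stand-in for the wildcard part of the file name. It assumes the charge letters run alphabetically outward from N = 0, as in the InChIKey protonation flag. It reports only the signed logP value encoded by the three digits (the bin edge nearest zero) and makes no assumption about the bin width.

```python
import re
from typing import NamedTuple


class TrancheFile(NamedTuple):
    layer: str        # single-letter build layer
    hac: int          # heavy atom count
    logp_edge: float  # signed logP bin edge nearest zero (digits / 100)
    hash_dir: str     # a-z hash sub-directory
    charge: int       # net molecular charge, -4..+4
    fmt: str          # mol2, sdf, pdbqt or db2


_PATH_RE = re.compile(
    r"zinc-22(?P<layer>[a-z])/"
    r"H(?P<hac>\d{2})/"
    r"(?P<tranche>H(?P=hac)(?P<sign>[PM])(?P<logp>\d{3}))/"
    r"(?P<hash>[a-z])/"
    r"(?P=tranche)-(?P<charge>[A-Z])\..+\."
    r"(?P<fmt>mol2|sdf|pdbqt|db2)\.tgz$"
)


def parse_zinc22_path(path: str) -> TrancheFile:
    """Split a ZINC22-style file path into its documented components."""
    m = _PATH_RE.search(path)
    if m is None:
        raise ValueError(f"not a recognised ZINC22 path: {path!r}")
    # Assumption: charge letters are consecutive around N (M = -1, N = 0, ...).
    charge = ord(m["charge"]) - ord("N")
    if not -4 <= charge <= 4:
        raise ValueError(f"charge letter out of range: {m['charge']!r}")
    sign = 1 if m["sign"] == "P" else -1
    return TrancheFile(
        layer=m["layer"],
        hac=int(m["hac"]),
        logp_edge=sign * int(m["logp"]) / 100,
        hash_dir=m["hash"],
        charge=charge,
        fmt=m["fmt"],
    )


# Usage with a hypothetical file name ("xaa" is made up):
info = parse_zinc22_path("/zinc-22a/H23/H23P130/b/H23P130-N.xaa.mol2.tgz")
print(info)
```

The pattern uses backreferences so that the heavy atom count and tranche name must agree between the directory levels and the file name; a path that mixes tranches is rejected rather than silently parsed.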