ZINC22:Checkout
This page describes how to check out ZINC22 SMILES tranches for 3D building. It also describes how to assemble a library subset in which some molecules may not be built yet.
check_out_zinc22.bash [hlogp_range] [database generation]
This returns a file named like this:
[hlogp_start]_[hlogp_end]_[generation].smi
If you try to check out molecules that have already been checked out, the script should throw a warning or error instead. This could be implemented in some sort of tranche-browser-like web API. There would also be a log for each "transaction" that records the number of molecules checked out, who checked them out, and when.
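The checkout-with-log idea above could be sketched roughly as follows. This is only an illustration, not the actual check_out_zinc22.bash: the function names, the JSON-lines log format, and the log location are all assumptions, and the duplicate check only catches an exact repeat of the same range and generation, not overlapping ranges.

```python
import datetime
import json
from pathlib import Path


class AlreadyCheckedOutError(Exception):
    """Raised when the requested range/generation already has a log entry."""


def output_name(hlogp_start, hlogp_end, generation):
    # Mirrors the [hlogp_start]_[hlogp_end]_[generation].smi pattern.
    return f"{hlogp_start}_{hlogp_end}_{generation}.smi"


def read_log(log_path):
    # Hypothetical log format: one JSON record per line.
    path = Path(log_path)
    if not path.exists():
        return []
    lines = path.read_text().splitlines()
    return [json.loads(line) for line in lines if line.strip()]


def check_out(log_path, hlogp_start, hlogp_end, generation, user, n_mols):
    """Record one checkout "transaction", refusing exact duplicates."""
    name = output_name(hlogp_start, hlogp_end, generation)
    for record in read_log(log_path):
        if record["file"] == name:
            raise AlreadyCheckedOutError(
                f"{name} was already checked out by {record['user']}"
            )
    record = {
        "file": name,
        "user": user,
        "n_mols": n_mols,
        "when": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    # Append so earlier transactions are never rewritten.
    with Path(log_path).open("a") as handle:
        handle.write(json.dumps(record) + "\n")
    return record
```

A caller would catch AlreadyCheckedOutError and decide whether to warn or abort. A web API version would need locking around the read-then-append step, which this sketch does not attempt.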
For Ying's case, where we're going to be checking out a specific subset of molecules, the process is a little different. After we've fetched db2s from /nfs/exd and trimmed down our list of molecules to be built, we want to check the remaining molecules out of /nfs/exb. We want all molecules from the sample to be built, so if molecules from a particular tranche have been checked out but not built yet (for whatever reason), we will still want to build them, not toss them out. We can mark these molecules separately.
So at the end of Ying's query, there will be 4 outputs, all of which will be organized into subdirectories of hlogp and generation:
found_db2s/ : existing db2s found
not_found_smi/ : smiles that were not checked out
found_not_built_smi/ : smiles that were already checked out
not_in_library_smi/ : smiles that weren't found in our library
The results of building not_found_smi can be safely added to the 3D database; the rest should be treated as separate from our database. Also, her molecules should be built in a separate query directory, to indicate that we're only building a subset of molecules from each tranche.
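As a rough sketch of how a query could be sorted into the four outputs described above, assuming the relevant molecule IDs are already available as in-memory sets. The function name and its arguments are hypothetical, and the lookups against /nfs/exd and /nfs/exb, as well as the hlogp/generation subdirectory layout, are left out.

```python
def classify_query(requested, built, library, checked_out):
    """Sort requested molecule IDs into the four output categories.

    requested   -- iterable of molecule IDs in the query
    built       -- set of IDs that already have db2 files
    library     -- set of IDs present in the SMILES library
    checked_out -- set of IDs already checked out by someone
    (all four inputs are assumptions made for this sketch)
    """
    outputs = {
        "found_db2s": [],
        "not_found_smi": [],
        "found_not_built_smi": [],
        "not_in_library_smi": [],
    }
    for mol_id in requested:
        if mol_id in built:
            # A db2 already exists, so nothing needs building.
            outputs["found_db2s"].append(mol_id)
        elif mol_id not in library:
            outputs["not_in_library_smi"].append(mol_id)
        elif mol_id in checked_out:
            # Checked out earlier but never built: still build it,
            # but keep it separate from the main database.
            outputs["found_not_built_smi"].append(mol_id)
        else:
            # Free to check out; results can go into the 3D database.
            outputs["not_found_smi"].append(mol_id)
    return outputs
```

Checking for existing db2s first follows the order in the text, where db2s are fetched before the remaining molecules are checked out.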
Checkout sheet, for anyone who wants to participate in building ZINC-22:
https://docs.google.com/spreadsheets/d/1152XEKP0AJN4ty03vE-Td6DcrENyGnFcZyfKGGiccAc/edit?usp=sharing