Search zinc22.py: Difference between revisions
Revision as of 22:57, 10 June 2022
Description
usage: search_zinc22.py [-h] [--get-vendors] [--configuration-server-url CONFIGURATION_SERVER_URL] zinc_id_in results_out

search for smiles by zinc22 id

positional arguments:
  zinc_id_in            file containing list of zinc ids to look up
  results_out           destination file for output

optional arguments:
  -h, --help            show this help message and exit
  --get-vendors         get vendor supplier codes associated with zinc id
  --configuration-server-url CONFIGURATION_SERVER_URL
                        database containing configuration for zinc22 system

search_zinc22.py is a script for looking up ZINC IDs on the ZINC22 system. Operation is simple: provide a file containing a list of ZINC IDs and a destination file to write to. The script shows a progress bar as it searches the system. If a database is down, the script will let you know and continue gathering the results it can.
The output format is as follows:
SMILES ZINC_ID TRANCHE_NAME
With --get-vendors the output format looks like this:
SMILES ZINC_ID VENDOR_ID TRANCHE_NAME CATALOG
In this mode the script finds all vendor codes and SMILES associated with the provided ZINC IDs.
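If you want to post-process the results in Python, the two layouts above can be read into named records. This is only a sketch: it assumes the columns are separated by whitespace and that no field contains a space, neither of which is stated on this page. The names Hit, VendorHit and parse_line are illustrative and are not part of search_zinc22.py, and the ZINC ID in the usage note is a made-up placeholder.

```python
from collections import namedtuple

# Column order taken from the output formats described above.
Hit = namedtuple("Hit", ["smiles", "zinc_id", "tranche"])
VendorHit = namedtuple(
    "VendorHit", ["smiles", "zinc_id", "vendor_id", "tranche", "catalog"]
)


def parse_line(line, with_vendors=False):
    """Parse one result line into a Hit (or a VendorHit for --get-vendors output).

    Assumption: fields are whitespace-separated and contain no spaces.
    """
    record_type = VendorHit if with_vendors else Hit
    fields = line.split()
    if len(fields) != len(record_type._fields):
        raise ValueError(
            "expected %d fields, got %d: %r"
            % (len(record_type._fields), len(fields), line)
        )
    return record_type(*fields)


def parse_file(path, with_vendors=False):
    """Yield one record per non-blank line of a results file."""
    with open(path) as handle:
        for line in handle:
            if line.strip():
                yield parse_line(line, with_vendors=with_vendors)
```

For example, parse_line("CCO ZINCaa00AAAAAAAA H04P010").zinc_id gives back the ID column of that (hypothetical) line.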
Location
You can find the script and environment at /nfs/soft/zinc22/search_zinc on the BKS cluster.
If you get an error along the lines of "ImportError: /lib64/libc.so.6: version `GLIBC_2.14' not found" after sourcing the environment and attempting to run the script, you can try the alternative environment at ~xyz/btingle/bin/2dload.testing/py36_psycopg2
This happens because the required software libraries are not installed on the machine you are logged in to.
Alternatively, you can log into one of our development nodes (ssh user@n-1-17, ssh user@n-1-16, etc.), which are guaranteed to have the correct libraries, and run it there.
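The error message names GLIBC_2.14, which suggests the C library on the current machine is older than version 2.14. A quick way to check before running the script is sketched below. It assumes a Linux host using glibc, where the standard-library call os.confstr("CS_GNU_LIBC_VERSION") reports the running version. The helper names are illustrative, and the 2.14 threshold is taken from the error text rather than from any documented requirement.

```python
import os

# Version named in the ImportError message quoted above (an assumption,
# not a documented requirement of search_zinc22.py).
REQUIRED_GLIBC = (2, 14)


def parse_glibc_version(text):
    """Turn a string such as 'glibc 2.17' or '2.17' into the tuple (2, 17)."""
    version = text.split()[-1]
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def glibc_is_new_enough(text, required=REQUIRED_GLIBC):
    """Return True if the version in `text` is at least `required`."""
    return parse_glibc_version(text) >= required


def current_glibc():
    """Return the running glibc version string, or None if it cannot be read."""
    try:
        return os.confstr("CS_GNU_LIBC_VERSION")
    except (AttributeError, ValueError, OSError):
        return None


if __name__ == "__main__":
    found = current_glibc()
    if found is None:
        print("Could not determine the glibc version on this machine.")
    elif glibc_is_new_enough(found):
        print("%s looks new enough." % found)
    else:
        print("%s is older than 2.14; try a development node." % found)
```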
Usage w/ Bash on BKS cluster
source /nfs/soft/zinc22/search_zinc/env/bin/activate
python /nfs/soft/zinc22/search_zinc/search_zinc22.py input_zinc_ids.txt output_zinc_ids.txt
python /nfs/soft/zinc22/search_zinc/search_zinc22.py --get-vendors input_zinc_ids.txt output_vendor_ids.txt
Usage w/ Csh on BKS cluster
source /nfs/soft/zinc22/search_zinc/env/bin/activate.csh
python /nfs/soft/zinc22/search_zinc/search_zinc22.py input_zinc_ids.txt output_zinc_ids.txt
python /nfs/soft/zinc22/search_zinc/search_zinc22.py --get-vendors input_zinc_ids.txt output_vendor_ids.txt
Dealing with NULL
Sometimes a ZINC ID will fail to look up. This could be because a server is down (the script will notify you if this is the case), or because the ID is missing from the system for some reason. In this case, it may be helpful to separate the molecules that didn't look up from the molecules that did. You may want to save them for later when the servers come back online, or to run a deeper search with comb_legacy_files.py (more on this below).
How to:
[env]$ cat legitimate_ids.txt > input.txt
[env]$ echo ZINCzz00ZZZZZZZZ >> input.txt
[env]$ echo ZINCyy00AAAAAAAA >> input.txt
[env]$ echo ZINCxx00BBBBBBBB >> input.txt
[env]$ python search_zinc22.py input.txt output.txt
Searching Zinc22: |XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX| 100.0% 0.00s 23/23 complete!
[env]$ grep "_null_" output.txt
_null_ ZINCzz00ZZZZZZZZ H33P270
_null_ ZINCyy00AAAAAAAA H34P280
_null_ ZINCxx00BBBBBBBB H35P290

search_zinc22.py does not omit IDs that fail to look up. Instead, it returns the ZINC ID with "_null_" in place of the fields it could not retrieve. We can therefore use grep to filter the results.
[env]$ grep "_null_" output.txt > missing.txt
[env]$ grep -v "_null_" output.txt > found.txt
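The grep commands keep whole output lines, so missing.txt still contains the "_null_" placeholders. If you need a file of bare ZINC IDs instead, for example to resubmit later or to pass to comb_legacy_files.py, the same split can be sketched in Python. This assumes whitespace-separated columns with the ZINC ID in the second column, as in both output formats above. It also tests for "_null_" as a whole field, which is slightly stricter than grep's substring match. The function names are illustrative only.

```python
NULL_MARKER = "_null_"


def split_results(lines):
    """Split result lines into (found_lines, missing_ids).

    Assumption: columns are whitespace-separated and the ZINC ID is the
    second column, as in both documented output formats.
    """
    found, missing_ids = [], []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        if NULL_MARKER in fields:
            missing_ids.append(fields[1])
        else:
            found.append(line.rstrip("\n"))
    return found, missing_ids


def write_split(results_path, found_path, missing_ids_path):
    """Read a results file; write the found lines and the bare missing IDs."""
    with open(results_path) as handle:
        found, missing_ids = split_results(handle)
    with open(found_path, "w") as out:
        out.writelines(line + "\n" for line in found)
    with open(missing_ids_path, "w") as out:
        out.writelines(zinc_id + "\n" for zinc_id in missing_ids)
```

Calling write_split("output.txt", "found.txt", "missing_ids.txt") would then produce found.txt as before plus a one-ID-per-line missing_ids.txt; the file names are placeholders.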
ZINC IDs should only rarely fail to look up, but when it happens you can use the following script:
comb_legacy_files.py
python3 /mnt/nfs/home/xyz/btingle/bin/2dload.testing/utils-2d/tin/misc/comb_legacy_files.py [INPUT_ZINC_IDS_FILE]
You don't need to source any particular python 3 environment for this script, but the environment used for search_zinc22.py will work just fine here.
This script combs through our deprecated files and attempts to locate your ZINC IDs there. It creates a file called "result" in your current directory containing all the SMILES found.
If you are still unable to find your ZINC IDs after this, you can send them to our development team and we will find them for you.
Email ben@tingle.org, CCing khtang015@gmail.com and josecastanon4@gmail.com. Include your file of missing IDs as an attachment.