Search zinc22.py

From DISI
Revision as of 09:12, 9 June 2022

usage: search_zinc22.py [-h] [--get-vendors]
                        [--configuration-server-url CONFIGURATION_SERVER_URL]
                        zinc_id_in results_out

search for smiles by zinc22 id

positional arguments:
  zinc_id_in            file containing list of zinc ids to look up
  results_out           destination file for output

optional arguments:
  -h, --help            show this help message and exit
  --get-vendors         get vendor supplier codes associated with zinc id
  --configuration-server-url CONFIGURATION_SERVER_URL
                        database containing configuration for zinc22 system

search_zinc22.py is a script for looking up ZINC IDs on the ZINC22 system. Its operation is simple: provide a file containing a list of ZINC IDs and a destination file to write the results to. The script shows a progress bar as it searches the system. If a database is down, the script reports it and continues gathering the results it can.
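The input is just a list of IDs, so it can help to tidy a raw list before searching. Below is a minimal sketch. It assumes that search_zinc.py expects one ZINC ID per line, as in the How to example further down. The helper name make_zinc_input is hypothetical and is not part of the script. It strips Windows carriage returns, keeps only the first whitespace-separated token on each line, drops blank lines, and removes repeated IDs while preserving their original order.

```shell
# Hypothetical helper (not part of search_zinc.py): clean a raw ID list.
# Assumes one ZINC ID per line is what search_zinc.py expects.
make_zinc_input() {
    # $1 = raw ID list, $2 = cleaned file to pass to search_zinc.py
    # tr removes carriage returns left by files saved on Windows;
    # awk skips blank lines (NF == 0), keeps the first token of each line,
    # and prints an ID only the first time it is seen.
    tr -d '\r' < "$1" | awk 'NF && !seen[$1]++ { print $1 }' > "$2"
}
```

For example, run make_zinc_input raw_ids.txt input_zinc_ids.txt, then pass input_zinc_ids.txt to search_zinc.py as in the usage sections.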

The output format is as follows:

SMILES ZINC_ID TRANCHE_NAME
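Each output line is whitespace-separated, so standard text tools can pick out individual columns. The sketch below assumes that SMILES, ZINC_ID and TRANCHE_NAME are the first, second and third whitespace-separated fields. Under that assumption, the hypothetical helper zinc22_to_smi writes a two-column "SMILES ZINC_ID" list, which is a common layout for .smi files. It skips IDs that did not look up, meaning lines whose SMILES field is "_null_" (see Dealing with NULL).

```shell
# Hypothetical helper: turn "SMILES ZINC_ID TRANCHE_NAME" lines into
# "SMILES ZINC_ID" lines, dropping IDs that did not look up.
# Reads the files given as arguments, or stdin if none are given.
zinc22_to_smi() {
    awk '$1 != "_null_" { print $1, $2 }' "$@"
}
```

For example: zinc22_to_smi output_zinc_ids.txt > found.smi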

With --get-vendors the output format looks like this:

SMILES ZINC_ID VENDOR_ID TRANCHE_NAME CATALOG

In other words, with --get-vendors the script finds all vendor codes and SMILES associated with the provided ZINC IDs.

Usage with Bash on the BKS cluster

source /nfs/soft/zinc22/search_zinc/env/bin/activate
python /nfs/soft/zinc22/search_zinc/search_zinc.py input_zinc_ids.txt output_zinc_ids.txt
python /nfs/soft/zinc22/search_zinc/search_zinc.py --get-vendors input_zinc_ids.txt output_vendor_ids.txt

Usage with Csh on the BKS cluster

source /nfs/soft/zinc22/search_zinc/env/bin/activate.csh
python /nfs/soft/zinc22/search_zinc/search_zinc.py input_zinc_ids.txt output_zinc_ids.txt
python /nfs/soft/zinc22/search_zinc/search_zinc.py --get-vendors input_zinc_ids.txt output_vendor_ids.txt

Dealing with NULL

Sometimes a ZINC ID will fail to look up. This could be because a server is down (the script will notify you if this is the case), or because the ID is missing from the system for some reason. In this case, it may be helpful to separate the molecules that didn't look up from the molecules that did. You may want to save them for later when the servers come back online, or to email to the development team so we can find them for you.

How to:

[env]$ cat legitimate_ids.txt > input.txt
[env]$ echo ZINCzz00ZZZZZZZZ >> input.txt
[env]$ echo ZINCyy00AAAAAAAA >> input.txt
[env]$ echo ZINCxx00BBBBBBBB >> input.txt
[env]$ python search_zinc.py input.txt output.txt
provided a zinc id(s) that could not possibly exist! skipping!
Searching Zinc22:  |XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX| 100.0%  0.00s 23/23 complete!
[env]$ grep "_null_" output.txt
_null_ ZINCzz00ZZZZZZZZ H33P270
_null_ ZINCyy00AAAAAAAA H34P280
_null_ ZINCxx00BBBBBBBB H35P290

search_zinc.py does not omit IDs that fail to look up from the output. Instead, it writes the ZINC ID with "_null_" in the fields it could not retrieve (in the example above, the SMILES field). Therefore we can use grep to separate the two groups:

[env]$ grep "_null_" output.txt > missing.txt
[env]$ grep -v "_null_" output.txt > found.txt
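The run above also printed "provided a zinc id(s) that could not possibly exist! skipping!". To confirm that every input ID ended up in either found.txt or missing.txt, a sketch like the following lists the input IDs that appear nowhere in the output. It assumes one ID per line in the input file and that ZINC_ID is the second column of every output line. The helper name zinc22_unaccounted is hypothetical.

```shell
# Hypothetical helper: print input IDs that never appear in the output,
# neither as a hit nor as a "_null_" line.
# Assumes ZINC_ID is the second whitespace-separated column of the output.
zinc22_unaccounted() {
    # $1 = input file given to search_zinc.py, $2 = output file it produced
    # FILENAME == ARGV[1] (instead of NR == FNR) still works if $2 is empty.
    awk '
        FILENAME == ARGV[1] { seen[$2] = 1; next }   # output file: record every ZINC_ID
        NF && !($1 in seen) { print $1 }             # input file: report IDs never seen
    ' "$2" "$1"
}
```

For example, run zinc22_unaccounted input.txt output.txt. If it prints nothing, every input ID produced a line in output.txt.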

ZINC IDs should only rarely fail to look up. If this happens, you can send the missing IDs to our development team: email ben@tingle.org, CCing khtang015@gmail.com and josecastanon4@gmail.com, and attach your file of missing IDs (missing.txt in the example above).
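You may prefer to retry the missing IDs yourself once the servers are back online. In that case, the ZINC IDs can be pulled out of missing.txt into a fresh input file. Below is a minimal sketch. It assumes ZINC_ID is the second column of the "_null_" lines, as in the example above. The helper name zinc22_retry_input is hypothetical. It also drops repeated IDs in case an ID appears on more than one "_null_" line.

```shell
# Hypothetical helper: rebuild an input file of bare ZINC IDs from the
# "_null_" lines saved in missing.txt, so the lookup can be repeated later.
zinc22_retry_input() {
    # $1 = missing.txt, $2 = new input file for search_zinc.py
    awk 'NF && !seen[$2]++ { print $2 }' "$1" > "$2"
}
```

For example, run zinc22_retry_input missing.txt retry_input.txt, then python search_zinc.py retry_input.txt retry_output.txt.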