How to Use SWAG

From DISI
Revision as of 04:44, 22 January 2026 by Zdingman (Created page. Documentation based on that in README.txt)

SWAG is a Python interface for the SmallWorld C++ API, enabling users to submit a list of SMILES-formatted molecules for exhaustive analog searching.

SWAG produces two output files: output_ascores.csv and output_responses.csv, where "output" is the prefix given with -o. The first file lists each query molecule from the input along with the number of analogs found for it in the database. The second file lists every analog found for every query molecule, with its SMILES, name, and edit distance.

Usage Instructions

SWAG can be run in a working directory with the following commands:

 source /nfs/home/zdingman/environments/SWAG/bin/activate
 cp /nfs/home/zdingman/scripts/SWAG_v1-3-1/* .
 python SWAG.py [arguments]

SWAG requires the following arguments at run time:

  • The input SMILES file, given with -f [YOUR_INPUT.smi]
  • The prefix for output files, given with -o [YOUR_OUTPUT]
  • The database for querying, given with -d [DATABASE_NAME]
  • The maximum distance, given with -p [NUMBER]

To find all compounds in ZINC22 within 4 edits of the compounds listed in ligands.smi, writing output files with the prefix "ligands", run:

 python SWAG.py -f ligands.smi -o ligands -d ZINC22 -p [4]
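The command above expects ligands.smi to already exist in the working directory. As a minimal sketch for a first test run, assuming the common .smi layout of one SMILES string per line followed by whitespace and a molecule name (this page does not spell out the exact format SWAG accepts), a two-molecule input can be written as follows. Note that it overwrites any existing ligands.smi.

```shell
# Write a two-molecule test input: a SMILES string, whitespace, then a name.
# (Assumed .smi layout; aspirin and caffeine are chosen purely for illustration.)
cat > ligands.smi <<'EOF'
CC(=O)Oc1ccccc1C(=O)O aspirin
Cn1cnc2c1c(=O)n(C)c(=O)n2C caffeine
EOF

# Show what was written before passing it to SWAG with -f ligands.smi.
cat ligands.smi
```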

Database Selection

The databases searched by SmallWorld are, in most cases, pre-computed by the Irwin lab and hard-coded into the Python environment activated above. The following databases are available to SWAG:

  • 480K: 480 thousand building blocks from ChemSpace
  • Accessible-BB: 94 million building blocks from ChemSpace + Enamine FastMADE
  • REAL: 70 billion compounds from 2024 Enamine REAL Space
  • ZINC22: 96 billion commercial compounds listed in ZINC22
  • local: Checks the current directory for a local mapfile called 'local.anon.map'

Unlike the first four options, which refer to pre-built databases, specifying 'local' searches the current working directory for a map file named local.anon.map. This is useful when you want to search within a specific set of molecules. For example, you can search your top million docking vActives to find close analogs of your selected hits, or search a list of compounds from Infinisee/xREAL to find analogs by edit distance.

[An easy way to create map files will be covered in a not-yet-written wiki page.]
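Because 'local' only looks in the current working directory, a run started from the wrong directory will not see your map file. A small pre-flight check can catch this before a search starts. The helper name check_local_map below is hypothetical and is not part of SWAG; it only tests for the filename listed above.

```shell
# Sketch (hypothetical helper, not part of SWAG): confirm that the map file
# '-d local' looks for is present in the current working directory.
check_local_map() {
    local map="local.anon.map"
    if [[ -f "$map" ]]; then
        echo "found $map in $PWD"
        return 0
    fi
    echo "no $map in $PWD; '-d local' needs this file here" >&2
    return 1
}
```

Chained as check_local_map && python SWAG.py -f ligands.smi -o ligands -d local -p "[4]", the search only starts once the map file is in place.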

Advanced Usage

By default, SWAG takes a single integer value for the distance parameter. Alternatively, you can pass a list of seven integers, [d,d,d,d,d,d,d], giving the maximum allowed number of edits for each of the following, in order:

  • Maximum distance
  • Ring count additions
  • Ring count removals
  • Linker length additions
  • Linker length removals
  • Terminal group additions
  • Terminal group removals
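Keeping seven positional values in the right order is error-prone, so a small helper can check the count and assemble the bracketed list. This is a sketch: make_dist_list is a hypothetical name, and it assumes SWAG accepts the list exactly as written above, with no spaces. It is also worth quoting the value on the command line, because bash treats an unquoted [4,1,1,2,2,2,2] as a filename pattern and would replace it with a matching single-character filename (for example, a file named 1) if one exists in the working directory.

```shell
# Sketch (hypothetical helper): build the seven-value distance list in the order
# listed above: max distance, ring +/-, linker +/-, terminal +/-.
make_dist_list() {
    if [[ $# -ne 7 ]]; then
        echo "make_dist_list: expected 7 integers, got $#" >&2
        return 1
    fi
    local v
    for v in "$@"; do
        if ! [[ "$v" =~ ^[0-9]+$ ]]; then
            echo "make_dist_list: '$v' is not a non-negative integer" >&2
            return 1
        fi
    done
    # Join the arguments with commas (first character of IFS) inside brackets.
    local IFS=,
    echo "[$*]"
}
```

For example, -p "$(make_dist_list 4 1 1 2 2 2 2)" passes [4,1,1,2,2,2,2]: at most 4 edits overall, at most one ring added and one removed, and at most two linker-length and two terminal-group changes in each direction.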

SmallWorld returns similar compounds starting with those that need the fewest ring, linker, and terminal modifications, up to the limits specified both for each type of change and for the overall distance.

Additional runtime flags are as follows:

  • --skipInputStandardization
  • --includeStereo
  • --excludeStereo
  • --storeQueryTime

By default, input SMILES are converted to the uncharged parent version of their largest species; --skipInputStandardization turns this off. Keeping input standardization on is recommended, because otherwise similar molecules may not be found due to differing counterions or protonation states.

By default, SWAG counts differing stereoisomers as different compounds. To treat all stereoisomers of a molecule as a single compound and remove the redundant entries from the output, use --excludeStereo.

By default, SWAG does not record the per-query search time. To write max_dist and query_time to the ascores.csv file, use --storeQueryTime.
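The optional flags are simply appended to the basic command. As a sketch, collecting the arguments in a bash array keeps the bracketed distance quoted and makes individual flags easy to switch on and off. The file names and distance are the example values from Usage Instructions, and the leading echo makes this a dry run that only prints the command.

```shell
# Dry-run sketch: build the SWAG argument list in a bash array.
# Quoting "[4]" stops bash from treating the brackets as a filename pattern.
swag_args=(-f ligands.smi -o ligands -d ZINC22 -p "[4]")

# Optional flags documented above; comment out any you do not want.
swag_args+=(--excludeStereo)   # treat stereoisomers of a molecule as one compound
swag_args+=(--storeQueryTime)  # write max_dist and query_time to ascores.csv

# Print the full command; remove the leading 'echo' to actually run SWAG.
echo python SWAG.py "${swag_args[@]}"
```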

Final Notes

SWAG is built on the Cython wrapper for the SmallWorld C++ API, which depends on the C++ runtime library (libstdc++) that ships with the GNU Compiler Collection (GCC). If running SWAG.py fails with "ImportError: CXXABI_1.3.9 not found", use a machine with a newer GCC version. On the Shoichet cluster, these machines are epyc and epyc2.
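To check a machine before starting a long run, you can test whether its libstdc++ exports the CXXABI_1.3.9 version tag named in the error. This is a rough sketch with hypothetical helper names: it assumes the library actually loaded is the first libstdc++.so.6 listed in the ldconfig cache. A custom LD_LIBRARY_PATH, or a Python built against a different toolchain, can change which copy is loaded, so treat the result as a hint rather than a guarantee.

```shell
# Sketch (hypothetical helpers): does the system libstdc++ provide CXXABI_1.3.9?

# Print the path of the first libstdc++.so.6 in the dynamic loader cache.
# ldconfig often lives in /sbin, which may not be on a regular user's PATH.
find_libstdcxx() {
    { ldconfig -p || /sbin/ldconfig -p; } 2>/dev/null |
        awk '$1 == "libstdc++.so.6" { print $NF; exit }'
}

# Return 0 if the library contains the CXXABI_1.3.9 version string,
# 1 if it does not, and 2 if no libstdc++.so.6 could be located.
has_cxxabi() {
    local lib
    lib=$(find_libstdcxx)
    if [[ -z "$lib" ]]; then
        echo "libstdc++.so.6 not found in the ldconfig cache" >&2
        return 2
    fi
    # -a: search the binary as text; -F: literal string; -q: exit status only.
    grep -qaF 'CXXABI_1.3.9' "$lib"
}
```

Running has_cxxabi && echo ok on a node is a quick first check; if it returns 1, move to epyc or epyc2.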

For further details on SmallWorld, please see the official documentation: https://www.nextmovesoftware.com/downloads/smallworld/documentation/