How To Use Rsync

From DISI

Latest revision as of 20:37, 4 October 2024

New Notes

Functional Public Domain Names

  • rsync://files.docking.org
  • rsync://files2.docking.org

List Available Directories for Download

rsync --list-only rsync://files.docking.org
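
The same option can also list the contents of a single directory, or query the second server. The following is a minimal sketch: the function name list_remote and the RSYNC override variable are hypothetical helpers, not part of rsync, and the trailing slash on the directory is there so that the directory's contents are listed rather than the directory entry itself.

```shell
# Sketch only: list_remote and RSYNC are hypothetical names for illustration.
# RSYNC lets you point the helper at a different rsync binary if needed.
RSYNC="${RSYNC:-rsync}"

list_remote() {
    # $1: optional path under the server root, e.g. ZINC15-2D/
    # $2: optional host name, defaults to files.docking.org
    "$RSYNC" --list-only "rsync://${2:-files.docking.org}/${1:-}"
}

# Usage:
#   list_remote                          # top-level directories
#   list_remote ZINC15-2D/               # contents of one directory
#   list_remote "" files2.docking.org    # top-level directories on the second server
```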

How to Download

For example, to download only the SMILES files (*.smi) from ZINC15-2D into a destination folder called zinc_test:

rsync -L -a -m --progress --include="*.smi" --exclude="*.*" rsync://files.docking.org/ZINC15-2D zinc_test

Breaking down the options used:

  • -L: transforms symbolic links into the file or directory they refer to
  • -a: archive mode, which copies recursively and preserves time stamps, permissions, owner and group information, and symbolic links (with -L, symbolic links are followed instead of being copied as links)
  • -m: skips empty directories
  • --progress: shows progress during the file transfer
  • --include="PATTERN": includes files whose names match PATTERN, here "*.smi"
  • --exclude="PATTERN": excludes files whose names match PATTERN, here "*.*" (any name containing a dot). Because rsync applies the first matching rule, the earlier --include keeps the "*.smi" files and everything else with a dot in its name is skipped.

To see which files are available, run the command without the destination folder and without the "include" and "exclude" options. With no destination, rsync lists the files instead of copying them.

You can also use "--dry-run" to simulate what you're about to download.
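
As a rough sketch of that workflow, the download command can be wrapped so that it is easy to preview first and transfer afterwards. The function name sync_smi and the RSYNC override variable are hypothetical; the flags, URL and destination are the ones from the example above.

```shell
# Sketch only: sync_smi and RSYNC are hypothetical names for illustration.
RSYNC="${RSYNC:-rsync}"

sync_smi() {
    # Passing "preview" as the first argument adds --dry-run,
    # so rsync reports what it would transfer without writing anything.
    extra=""
    if [ "${1:-}" = "preview" ]; then
        extra="--dry-run"
    fi
    # $extra is left unquoted on purpose: an empty value adds no argument.
    "$RSYNC" -L -a -m --progress $extra \
        --include="*.smi" --exclude="*.*" \
        rsync://files.docking.org/ZINC15-2D zinc_test
}

# Usage:
#   sync_smi preview   # simulate the download
#   sync_smi           # perform the download
```

Running the preview first is a cheap way to check that the include and exclude patterns select the files you expect.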


Old Notes

How To Download ZINC-22 Using Rsync

You can try the following command:

rsync -Larv --include='*/'  --include='[a-z]/H[01]?*-*db2.tgz' --exclude='sets' --exclude='*' --verbose rsync://files.docking.org/ZINC22-3D/zinc-22<?> .

(all on one line)

where <?> is one of the layer letters d g h i k l m n o p q r s t u v x:
  • n is 50% of the database
  • x is 25% of the database
  • g is the "informer set"

This will get you all molecules in the "?" layer of ZINC-22 in the db2 format. If you want sdf, mol2 or pdbqt, replace db2 in the pattern with the relevant format.
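
A minimal sketch of how the layer letter and the format could be passed as parameters instead of editing the command by hand. The function name zinc22_fetch and the RSYNC override variable are hypothetical; the flags and URL are copied from the command above, and the filename pattern for formats other than db2 is assumed to follow the substitution described here.

```shell
# Sketch only: zinc22_fetch and RSYNC are hypothetical names for illustration.
RSYNC="${RSYNC:-rsync}"

zinc22_fetch() {
    layer="$1"            # one layer letter, e.g. g
    format="${2:-db2}"    # db2 (default), sdf, mol2 or pdbqt

    # Reject anything that is not one of the documented layer letters.
    case "$layer" in
        d|g|h|i|k|l|m|n|o|p|q|r|s|t|u|v|x) ;;
        *) echo "unknown layer: $layer" >&2; return 1 ;;
    esac

    # Same flags as the command above, with layer and format filled in.
    "$RSYNC" -Larv --include='*/' \
        --include="[a-z]/H[01]?*-*${format}.tgz" \
        --exclude='sets' --exclude='*' --verbose \
        "rsync://files.docking.org/ZINC22-3D/zinc-22${layer}" .
}

# Usage:
#   zinc22_fetch g        # "informer set" layer, db2 format
#   zinc22_fetch d sdf    # layer d, sdf format
```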

We recommend starting below H20 (hence the H[01]? in the pattern above). Once you have everything up to H19, add H20, H21 and so on progressively. Each is typically 50% bigger than the previous one. H25 and H26 together are more than 60% of the database. You can do a lot of productive docking with H13-H16 (fragment-like) and H17-H19 (small lead-like).
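
One way to add the larger sizes progressively is to rerun the same command with a different H pattern, sketched below. The function name zinc22_add_sizes and the RSYNC override variable are hypothetical, and the pattern '2[01]' for H20 and H21 is an assumption that simply extends the H[01]? pattern from the command above. Files that already exist unchanged in the destination are not transferred again by rsync, so the earlier download can stay in place.

```shell
# Sketch only: zinc22_add_sizes and RSYNC are hypothetical names for illustration.
RSYNC="${RSYNC:-rsync}"

zinc22_add_sizes() {
    layer="$1"    # one layer letter, e.g. g
    sizes="$2"    # glob for the digits after H, e.g. '2[01]' for H20 and H21

    # Same flags as the command above; only the H part of the pattern changes.
    "$RSYNC" -Larv --include='*/' \
        --include="[a-z]/H${sizes}*-*db2.tgz" \
        --exclude='sets' --exclude='*' --verbose \
        "rsync://files.docking.org/ZINC22-3D/zinc-22${layer}" .
}

# Usage:
#   zinc22_add_sizes g '[01]?'   # the recommended starting point, up to H19
#   zinc22_add_sizes g '2[01]'   # later, add H20 and H21
```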

 

The layers have no meaning other than allowing us to prepare the database independently in steps. If you want only a subset, you could try using the 3D tranche browser at cartblanche22.docking.org to make a precise selection.

I hope this helps.