Upload Wizard

From DISI

Latest revision as of 18:01, 28 October 2019

Background

Upload Wizard is a file-transfer website written by Chinzo and maintained by Khanh of the Irwin Lab at UCSF. Its aim is to give ZINC vendors an easier way to send catalog updates to ZINC, as opposed to emailing them to us as in the past.

NOTE (09-6-2019): This is a beta release. Any suggestions and comments are greatly appreciated.

Version Releases

09-10-2019 : Beta v.1.0

Getting Started

Vendors must sign up to use Upload Wizard.

Sign up

  1. Submit a form request to us by clicking on this link and choosing "Sign Up" at the bottom. Please also tell us briefly about yourself and your company in 140 words or fewer.
  2. After you submit your information, you will receive an email asking you to confirm your email address.
  3. Our admin will review your request and get back to you as soon as possible. For security reasons this step is important and might take a bit of time; thank you for your patience.

Company Form

You are required to fill in the Company Profile before uploading a catalog.

Instructions for filling in the "Catalog Info" section:

  • SDF format file

Sdf catalog info.png

  • CSV/TSV format file

Fields are the column names of the catalog:

 Mandatory: Smiles, ID Field
 Optional: Price, Compound Name, CAS
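
Before uploading, it can help to confirm that the header row of a CSV/TSV catalog contains the mandatory columns. The following is a minimal sketch, not part of Upload Wizard itself. The column names `smiles` and `id` and the function name `missing_mandatory_columns` are assumptions made for illustration; the real names are whatever you declare in the "Catalog Info" section.

```python
import csv
import io

# Assumed header names for the mandatory fields (hypothetical; match them to
# the column names you enter in "Catalog Info").
MANDATORY = ("smiles", "id")


def missing_mandatory_columns(text, delimiter=","):
    """Return the mandatory column names absent from the header row.

    `text` is the catalog content as a string; use delimiter="\t" for TSV.
    The comparison ignores case and surrounding whitespace.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, [])  # empty input yields an empty header
    present = {name.strip().lower() for name in header}
    return [name for name in MANDATORY if name not in present]
```

An empty result means both mandatory columns were found; optional columns such as Price, Compound Name, and CAS are not checked.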

Upload instructions

Catalog File Splitting Instruction

alt upload page
  • Catalog Type:

We categorize compounds into 5 main groups:

- Screening Compounds

- Building Blocks

- Natural Products

- Bioactives

- Mixed

  • Availability Type:

To automate the catalog update pipeline, each catalog must be marked at upload as either in-stock or on-demand. This information is important when we organize molecules on ZINC into tranches for large-scale docking.

  • Upload Type:

Please specify whether the catalog is a full update or an incremental update, because the upload procedures for the two differ.

Supported File Formats

Upload Wizard currently supports the following catalog formats:

  • SDF - Structure-Data File
  • CSV/TSV - Delimiter-Separated File
  • SMI/TXT - Text file format (with only SMILES and product code; well suited to large catalog updates)

READ ME: Special instructions for SMI/TXT catalogs
Files in SMI and TXT format must have one compound per line, in this format:
<smiles_code> <product_ID>

Example
CCN1CCC(n2cc(CNc3cc(Cl)c4ncc(C#N)c(Nc5ccc(F)c(Cl)c5)c4c3)nn2)CC1 ZINC000042921365
N#Cc1cnc2c(Cl)cc(NCc3nnn[nH]3)cc2c1Nc1ccc(F)c(Cl)c1 ZINC000014977426
N#Cc1cnc2cnc(NCCN3CCOCC3)cc2c1Nc1ccc(Cc2ccccc2)cc1 ZINC000028529865
N#Cc1cnc2c(Cl)cc(NCc3cn(Cc4ccccn4)nn3)cc2c1Nc1ccc(F)c(Cl)c1 ZINC000049881923
CN1CCC(n2cc(CNc3cc(Cl)c4ncc(C#N)c(Nc5ccc(F)c(Cl)c5)c4c3)nn2)CC1 ZINC000049881757
Cc1[nH]cnc1CNc1ccc2ncc(C#N)c(Nc3ccc(F)c(Cl)c3)c2c1 ZINC000028604186
Cc1c(CNc2ccc3ncc(C#N)c(Nc4ccc(F)c(Cl)c4)c3c2)ncn1C ZINC000028604188
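
A quick pre-upload check of the line layout can catch formatting mistakes early. The sketch below assumes the two fields are separated by whitespace and that each non-blank line holds exactly two fields, as in the example above. The function names are hypothetical, and the check does not verify that the SMILES string is chemically valid.

```python
def parse_smi_line(line):
    """Split one catalog line into (smiles, product_id).

    Raises ValueError when the line does not have exactly two
    whitespace-separated fields.
    """
    parts = line.split()
    if len(parts) != 2:
        raise ValueError(
            f"expected '<smiles_code> <product_ID>', got {len(parts)} field(s)"
        )
    return parts[0], parts[1]


def find_bad_lines(lines):
    """Return the 1-based numbers of non-blank lines that are malformed."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are skipped rather than reported
        try:
            parse_smi_line(line)
        except ValueError:
            bad.append(number)
    return bad
```

For a file on disk, `find_bad_lines(open(path, encoding="utf-8"))` would report the offending line numbers without loading the whole catalog into memory.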

Excel file formats are not supported yet. As a workaround, please save the file as a Windows-formatted TSV or CSV file.

File Size Limit

Each upload can be up to 1 GB. That is approximately:

  • ~18M SMILES in a TXT/SMI file
  • ~500k molecules in an SDF file
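
The figures above imply roughly 55 bytes per line for a TXT/SMI file (1 GB divided by about 18M lines). For catalogs over the limit, one possible way to split a line-based file is to group lines into chunks that each stay under a byte budget. This is only an illustrative sketch with a hypothetical function name; it is not the procedure described in the Catalog File Splitting Instruction, and whether 1 GB means 10^9 or 2^30 bytes is an assumption left to the caller.

```python
def split_by_size(lines, max_bytes):
    """Yield lists of lines whose combined UTF-8 size is at most max_bytes.

    Lines are never cut in half. A single line larger than max_bytes is
    emitted as a chunk of its own.
    """
    chunk, size = [], 0
    for line in lines:
        line_bytes = len(line.encode("utf-8"))
        # Close the current chunk if adding this line would exceed the budget.
        if chunk and size + line_bytes > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(line)
        size += line_bytes
    if chunk:
        yield chunk
```

This applies to SMI/TXT files, where every line is a complete record. An SDF file would need to be split at record boundaries (each record ends with a `$$$$` line) rather than at arbitrary lines.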

Pipeline

Semi-automatic scheme of Upload Wizard

Upload wizard pipeline.png

Future Improvements

  • Automate the filtering procedure.

Admin Instructions

  • Something here