AWS:Submit docking job

Revision as of 03:24, 22 June 2022


Prerequisites

  • Following on from the previous tutorial, you have the aws-setup docker image installed on your machine (we will continue using it in this tutorial)
  • Likewise, you will need to know your AWS access key, secret key, and related credentials.

Copying data to Amazon S3

In the previous tutorial, you created an S3 bucket to set up your environment. S3 buckets act like virtual hard drives, with familiar operations like cp, ls, rm, and mv for storing & manipulating data.

In an AWS docking environment, S3 buckets are responsible for storing dockfiles, input, and output for docking runs. Typically, docking input is sourced from our lab's zinc3d bucket, which houses an enormous amount of db2 data for large docking screens, while dockfiles, output, and other configuration are saved to an environment-specific bucket. For custom screens, like DUDE-Z, input can be uploaded to & sourced from the environment-specific bucket instead.
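For instance, you can browse the zinc3d bucket from the AWS CLI to see what db2 data is available. This is a sketch; the exact prefix layout inside zinc3d shown here is an assumption, and read access to the bucket is assumed:

```shell
# List top-level prefixes in the lab's zinc3d bucket (read access assumed).
aws s3 ls s3://zinc3d/

# Drill down into a specific prefix; the path layout shown is illustrative.
aws s3 ls s3://zinc3d/zinc-22x/
```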

[Image: Aws docking env.png — diagram of the AWS docking environment]

You can take one of two approaches to get your data into an S3 bucket; the first is less complicated and more intuitive, while the second is more complicated but opens more avenues for automation.

Approach 1: Use the browser console

AWS allows uploading files to S3 directly from the browser. Navigate to the S3 console (https://s3.console.aws.amazon.com/s3), where there should be a list of all buckets that your account owns.

Click on the bucket you'd like to upload to, and navigate to the folder within that bucket you want to upload to[1]. You can upload folders and files as you wish through the interface.

You can click "Copy S3 URI" in the top right of the interface to copy the full path of the directory or file you are viewing.

Approach 2: AWS CLI

The AWS CLI provides an interface to perform any conceivable operation on AWS resources, including S3 buckets. For example, the CLI command to copy one file from your local drive into an S3 bucket looks like this:

aws s3 cp myfile.txt s3://mybucket/mydir/myfile.txt

Note that myfile.txt was saved under mydir/myfile.txt on the S3 bucket. Directories are created implicitly in S3, meaning it is not necessary to "mkdir" to create a directory; simply including the directory in the file's path is enough.
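To illustrate the implicit-directory behavior (the bucket and paths here are hypothetical):

```shell
# Copy to a path whose "directories" don't exist yet -- no mkdir required.
aws s3 cp results.txt s3://mybucket/brand/new/prefix/results.txt

# The new prefix now shows up when listing that part of the bucket.
aws s3 ls s3://mybucket/brand/new/prefix/
```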

For a practical example, say you want to copy your dockfiles to S3. Here's what that looks like:

aws s3 cp --recursive docking_params/5HT2A/dockfiles s3://mybucket/docking_runs/5HT2A/dockfiles

The --recursive argument behaves similarly to "cp -r", allowing you to upload the directory's contents with one command.
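If you expect to re-upload the same directory repeatedly, "aws s3 sync" is also worth knowing about: it compares the source and destination and only transfers files that are new or have changed. The paths below reuse the hypothetical layout from the example above:

```shell
# Sync behaves like a smarter recursive cp: unchanged files are skipped.
aws s3 sync docking_params/5HT2A/dockfiles s3://mybucket/docking_runs/5HT2A/dockfiles
```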

If you've been using the aws-setup container to access the AWS CLI, you may be wondering how to find your dockfiles, as your system's usual files are not visible from within the container.

You can link files on your system to the container using docker's "-v" argument. For example, say all your various docking parameters are located under /home/myuser/dockingstuff on your system, and you'd like them to be visible within the aws-setup container somewhere.

docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock -v /home/myuser/dockingstuff:/tmp/dockingstuff btingle/aws-setup:latest

That extra "-v" argument tells docker to make all files under /home/myuser/dockingstuff on your host system visible in the container under the /tmp/dockingstuff directory. Once we've entered the container, we can verify this is true by ls-ing the /tmp/dockingstuff directory:

root@65aa6738db54:/home/awsuser# ls /tmp/dockingstuff
5HT2A    something_else    README.txt    docking_is_cool.smi

Note: if you're a Mac user, you may run into permission issues during this step. macOS requires you to grant docker explicit permission to access the linked directory; here's the docker tutorial on how to fix this: https://docs.docker.com/desktop/mac/#file-sharing.

Running supersub.bash

This step takes place in the aws-setup container. Run aws configure on startup as per usual. If you'd like to avoid running configure on start-up every time, see the subsection below.
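One possible way to skip re-entering credentials is to mount your host's ~/.aws directory into the container when you start it. This is only a sketch: it assumes you have run aws configure on the host at least once, and that the CLI inside the container runs as root (so it reads credentials from /root/.aws):

```shell
# Persist AWS credentials across container restarts by mounting ~/.aws
# from the host into the container (assumes CLI runs as root inside).
docker run --rm -it \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v ~/.aws:/root/.aws \
    btingle/aws-setup:latest
```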

Requirements

  • Have dockfiles uploaded to S3, as well as the S3 object URL for the folder.
  • Have an input list uploaded to S3, and have the S3 object URL for that list.

The input list may need some explanation: it is expected to be a text file containing a list of S3 paths to db2.tgz files accessible by the environment you are submitting to. For example:

s3://zinc3d/zinc-22x/H17/H17P200/a/H17P200-N-xaa.db2.tgz
s3://zinc3d/zinc-22x/H17/H17P200/a/H17P200-Q-xaa.db2.tgz
s3://zinc3d/zinc-22x/H17/H17P200/a/H17P200-N-xab.db2.tgz

If you followed the quick start guide from the previous tutorial, objects in the zinc3d bucket will be accessible to your environment(s) by default.
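If you need to build such a list yourself, one approach is to generate it from a recursive listing. This is a sketch: the prefix is illustrative, and it assumes the standard four-column output of "aws s3 ls" (date, time, size, key):

```shell
# Build an input list of db2.tgz paths from an S3 listing (illustrative prefix).
# 'aws s3 ls --recursive' prints one "date time size key" line per object;
# awk keeps only db2.tgz keys and prepends the bucket name.
aws s3 ls --recursive s3://zinc3d/zinc-22x/H17/H17P200/ \
    | awk '$4 ~ /\.db2\.tgz$/ {print "s3://zinc3d/" $4}' \
    > input_list.txt
```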

Quick Start

Run supersub.bash without any arguments

cd /home/awsuser/awsdock/submit
bash supersub.bash

You'll be greeted by a prompt to enter the full name (or identifier) of your desired environment. If you just ran through the quickstart guide, and named your environment "dockaws" in the "us-west-1" region, your environment's identifier will be dockaws-us-west-1.

[ What is the full name ($name-$region) of the environment to submit to? ]:

Next, it will ask you to provide an S3 location to send output to. This should be an S3 URL to a folder in your environment-specific bucket; don't worry about creating the folder if it doesn't exist, it will be created automatically.

[ Which s3 location should output be sent to? ]: 

Enter a name for your job. It can be whatever you want; just make sure it doesn't collide with the name of any other job in your S3 output folder.

[ What is the name for this batch job? ]: 

Now provide the dockfiles URL and input list URL you prepared beforehand:

[ Provide a location in s3 for the dockfiles being used for this run ]: 
[ Provide an s3 file location for the list of files to be evaluated by this run ]:

Think over your life decisions real quick and enter y to submit the job!

created 1 jobs for this batch, submit? [y/N]: 
  1. Interestingly, this interface allows one to explicitly create a folder, while the CLI does not.