AWS:Set up account: Difference between revisions

From DISI
No edit summary
 
(38 intermediate revisions by 2 users not shown)
Line 1: Line 1:
[[Category:AWS DOCK]]
* Tutorial 1  AWS:Set up account THIS TUTORIAL
* Tutorial 2: [[AWS:Upload files for docking]]
* Tutorial 3: [[AWS:Submit docking job]]
* Tutorial 4: [[AWS:Merge and download results]]
* Tutorial 5: [[AWS:Cleanup]]
 


= Installation =
= Installation =


Docker is required to run the aws-setup scripts. https://www.docker.com/get-started/
Docker is required to run the aws-setup scripts. https://www.docker.com/get-started/. You can install docker desktop to your personal machine, or log on to a machine where docker is already installed.


An Amazon AWS account is also required, with payment attached. https://aws.amazon.com/premiumsupport/knowledge-center/create-and-activate-aws-account/
An Amazon AWS account is also required, with payment attached. https://aws.amazon.com/premiumsupport/knowledge-center/create-and-activate-aws-account/
Line 10: Line 15:


  <nowiki>
  <nowiki>
docker pull btingle/aws-setup
docker pull dockingorg/aws-setup
docker run -v /var/run/docker.sock:/var/run/docker.sock -it btingle/aws-setup</nowiki>
docker run -v /var/run/docker.sock:/var/run/docker.sock --rm -it dockingorg/aws-setup</nowiki>
 
Explanation of arguments:
* <code>-v /var/run/docker.sock:/var/run/docker.sock</code> Allows the container to use your system's Docker
* <code>--rm</code> Cleans up the container once you've exited
* <code>-it</code> Runs the container interactively


It may be necessary to give the container additional privileges. When you enter the image, test this with the following command:
It may be necessary to give the container additional privileges. When you enter the image, test this with the following command:
Line 21: Line 31:


  <nowiki>
  <nowiki>
docker run --privileged -v /var/run/docker.sock:/var/run/docker.sock -it btingle/aws-setup</nowiki>
docker run --privileged --rm -v /var/run/docker.sock:/var/run/docker.sock -it dockingorg/aws-setup</nowiki>
 
If you're using a remote docker instance through the DOCKER_HOST environment variable, for example on windows WSL, you can use the following script in place of 'docker run':
 
<nowiki>
host=$(basename $DOCKER_HOST | cut -d':' -f1)
port=$(basename $DOCKER_HOST | cut -d':' -f2)
prot=$(dirname $DOCKER_HOST)
 
if [ "$host" = "localhost" ] || [ "$host" == "127.0.0.1" ]; then
host=host.docker.internal
fi


# essentially we are just forwarding the DOCKER_HOST information to the container (making sure to use host.docker.internal if DOCKER_HOST is localhost)
[[File:Step1-docktut-again.png|x300px|Example session depicting pulling the aws-setup image, running it, authenticating with AWS, and initializing the account.]]
docker run --env DOCKER_HOST=$prot//$host:$port -it btingle/awsdock-setup</nowiki>


= Container Environment =
= Container Environment =


The container has a barebones software installation, with some additional utilities to help out. curl and vi are installed so you can download files and edit them, for example if you have a custom config file you want to download from pastebin. You can also install whatever software you like using "apt install", e.g "apt install git".
The container uses the ubuntu distribution. Some utilities such as curl and vi are installed so you can download files and edit them. You can also install whatever software you like using "apt install", e.g "apt install git".


If you have files you'd like to access from the container, you can link them in using the docker "-v" option. By default we link the docker socket using this option ("-v /var/run/docker.sock:/var/run/docker.sock"), but you can link any number of directories or files in this manner. For example, if you would like the contents of the "/tmp" directory on your local machine to be available under "/temp" in the docker image, you would add the following option to your "docker run" command: "-v /tmp:/temp", for a final command of:
If you have files you'd like to access from the container, you can link them in using the docker "-v" option. By default we link the docker socket using this option ("-v /var/run/docker.sock:/var/run/docker.sock"), but you can link any number of directories or files in this manner. For example, if you would like the contents of the "/tmp" directory on your local machine to be available under "/temp" in the docker image, you would add the following option to your "docker run" command: "-v /tmp:/temp", for a final command of:


  <nowiki>docker run -v /tmp:/temp -v /var/run/docker.sock:/var/run/docker.sock -it btingle/aws-setup:latest</nowiki>
  <nowiki>docker run -v /tmp:/temp -v /var/run/docker.sock:/var/run/docker.sock --rm -it dockingorg/aws-setup:latest</nowiki>


If you're an advanced user and you'd like to create your own version of the aws-setup image with certain software preinstalled, you can request us for access to the aws-setup repository, which contains the scripts and Dockerfile we use to set up the docker image. You can also build your own image using our aws-setup image as a base.
If you're an advanced user and you'd like to create your own version of the aws-setup image with certain software preinstalled, you can request us for access to the aws-setup repository, which contains the scripts and Dockerfile we use to set up the docker image. You can also build your own image using our aws-setup image as a base.


= Creating your First AWS Environment =
= Quickstart - Creating your first AWS docking environment =


When you enter the docker image, you will be in /home/awsuser. There should be two directories in front of you, aws-setup and awsdock. We start off by going into the aws-setup directory and configuring our AWS credentials.
== Setup ==
 
=== Credentials & Region ===
 
When you enter the docker image, you will be in /home/awsuser. There should be two directories in front of you, aws-setup and awsdock. We start off by going into the aws-setup directory and configuring our AWS credentials. (This needs to be done every time you log in to the container)


  <nowiki>
  <nowiki>
Line 55: Line 57:
root@f54f423d64b1:/home/awsuser# aws configure</nowiki>
root@f54f423d64b1:/home/awsuser# aws configure</nowiki>


You'll now be prompted to enter your AWS access key ID & AWS secret access key. If you already know what these are you can enter them and move on. Don't bother setting the output format, but do set your desired region code, ideally one close to your actual geographic location. More info on regions & region codes here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html
You'll now be prompted to enter your AWS access key ID & AWS secret access key. If you already know what these are you can enter them and move on. If you don't know what your AWS secret key and access key are, follow this tutorial: https://aws.amazon.com/blogs/security/wheres-my-secret-access-key/.
Make sure to save your keys somewhere safe that you will remember!!


If you don't know what your AWS secret key and access key are, follow this tutorial: https://aws.amazon.com/blogs/security/wheres-my-secret-access-key/.
Next, you'll be prompted on which AWS region you would like to use. If this is your first environment, set the region to us-east-1. Our lab's molecule data S3 bucket (zinc3d) is also located in this region, so this is the most economical region to run docking jobs in, due to the cost of moving data between AWS regions. (see diagram)


Make sure to save your keys somewhere safe that you will remember!!
[[File:S3pricing.png|thumb|Diagram showing the cost of transferring S3 data between regions and across to the internet]]


If it is your first time setting up an environment on your AWS account, you will need to run initialize-aws-batch.bash. This script only needs to be run once per account.
More info on regions & region codes here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html


<nowiki>
The last prompt sets the preferred output format- feel free to leave this blank, or set it to "json".
root@f54f423d64b1:/home/awsuser/aws-setup# bash initialize-aws-batch.bash</nowiki>


You should see this script spit out a bunch of JSON text. If you accidentally run this script when it has already been run before, you will see a bunch of errors along the lines of: "Service role name <blank> has been taken in this account". Don't worry about these, they don't mean anything. Any other errors you should report to me @ ben@tingle.org, I'll find you a solution.
=== S3 Bucket ===


Before setting up your environment, you will want to create an S3 bucket to store your data. You can accomplish this using the command-line cli (stands for "common language interface") like so:
An S3 bucket is a virtual hard drive that your AWS resources can access from anywhere. You will need to create one on your account prior to creating your AWS environment. Follow the amazon tutorial on how to do this: https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html


<nowiki>
<b>The quickstart guide will show you how to create an AWS environment in us-east-1, so it is best to create your S3 bucket in this region.</b>  
root@f54f423d64b1:/home/awsuser/aws-setup# aws s3api create-bucket --bucket <<name>></nowiki>


Alternatively you can follow the amazon guide to creating an s3 container for more transparency on how s3 buckets work: https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html
It is best to have a dedicated S3 bucket for each region you create an environment for, due to the cost of inter-region data transfer.


If the CLI gives some weird error when you try to create a bucket, try using the s3 console instead: https://s3.console.aws.amazon.com/s3
[[File:Awsdock-Step3-flowchart.png|thumb|Diagram explaining how to create an S3 bucket. Note the region- N.Virginia aka us-east-1. This is the optimal region for running docking.]]


It is recommended, but not required, to create a separate bucket for each region-specific environment you create. The reasoning behind this is that S3 data transfer is free if you're transferring data to the same region the bucket is located. See https://aws.amazon.com/s3/pricing/ for more details.
=== First time setup ===
[[File:S3pricing.png|thumb|Diagram showing the cost of S3 data transfer between the AWS network, intra AWS regions, and the wider internet.]]


Now we move on to creating the environment. If it is your first time creating an environment, you can use the included awsdock.config for an interactive environment creation experience.
If it is your first time setting up an environment on your AWS account, you will need to run initialize-aws-batch.bash. This script only needs to be run once per account.


  <nowiki>
  <nowiki>
root@f54f423d64b1:/home/awsuser/aws-setup# bash create-aws-batch-env.bash /home/awsuser/awsdock/aws-setup-configs/awsdock.config</nowiki>
root@f54f423d64b1:/home/awsuser/aws-setup# bash initialize-aws-batch.bash</nowiki>


Enter the desired name for your environment.  
You should see this script spit out a bunch of JSON text. If you accidentally run this script when it has already been run before, you may see a bunch of errors along the lines of: "Service role name <blank> has been taken in this account". Don't worry about these, they don't mean anything.


<nowiki>What would you like this environment to be called? [default: "dockenv"]: <your env name here></nowiki>
== Environment Creation ==


Enter the aws region for your environment
  <nowiki>
<nowiki>Which region is this environment based in? [default: us-west-1]: <your region here></nowiki>
root@f54f423d64b1:/home/awsuser/aws-setup# bash create-aws-batch-env.bash /home/awsuser/awsdock/aws-setup-configs/awsdock_quickstart.config</nowiki>
 
The name + region will serve as the unique identifier for this environment, e.g "mydockenv-us-west-1". You will refer to this identifier when submitting jobs through a script.
 
Attach the bucket you created to the environment
  <nowiki>What bucket would you like to attach to this environment? <your bucket></nowiki>
 
Set the I/O policy for your bucket to input,output.
<nowiki>Which action(s) would you like to perform on this bucket? Choose from input/output, if using multiple separate by commas. input,output</nowiki>
 
Confirm that you will use the btingle/dockaws image for this environment.
<nowiki>What docker image should this environment use? [default: btingle/dockaws:latest]: btingle/dockaws:latest</nowiki>
 
Set MAX_CPUS for your environment to desired value. This parameter refers to the maximum number of jobs that can be run in parallel. If you want to run jobs at as large a scale as possible, give this a high value, e.g 2000.
<nowiki>How many CPUS would you like to allocate to this environment at maximum? [default: None]:</nowiki>
 
Set BID_PERCENTAGE for your environment to desired value. See "Advanced Usage" section for more explanation of this parameter. If you're not sure, keep the default.
<nowiki>What is your bid percentage threshold for spot instances? See the docs for more info on this parameter. [default: 100]: <your value></nowiki>
 
All done!
 
== Dealing With Errors ==
 
If you got an error while setting up your environment with create-aws-batch-env.bash, it may help to run the script again, making sure to provide the exact same parameters.
 
= Advanced Usage/Configuration Usage =
 
If you followed the tutorial above, you may have noticed that the create-aws-batch-env.bash script was provided a file, awsdock.config, as an argument. This .config file contains most of the parameters needed to set up a docking environment, with certain parameters left blank to be set interactively by the user. An example of a complete .config file is included next to awsdock.config, called awsdock1.config.
 
The benefit of using configuration files instead of the interactive prompt is mostly organizational. Configuration files allow you to keep track of which environments you've created and what parameters were provided to them. Additionally, if your configuration is complete, i.e nothing needs to be set interactively, then the create-aws-batch-env.bash script will run non-interactively, allowing for automatic configuration and deployment of environments.
 
In the next section, I will be describing what the various .config file parameters mean. The numbers in this section refer to the steps in the procedure followed by create-aws-batch-env.bash.
 
'''0.''' is for setting up the environment's NAME and REGION. These two properties in combination serve as the unique identifier for your batch environment.
 
* Your environment's NAME can be set by the ENV_NAME configuration variable, otherwise set interactively.
 
* Likewise, the REGION can be set by the ENV_AWS_REGION configuration variable, otherwise set interactively.
 
'''1.''' sets up the profile/role that will be used in this environment. This part will always be non-interactive, so this step is mostly here as a debug tool/progress marker.
 
'''2.''' sets up bucket policies for the environment. For example, if you would like to pull DB2 data from bkslab's zinc3d bucket, you would set that up here.
 
* You can set bucket policies using the ENV_BUCKET_CONFIGS configuration variable. Multiple buckets+policies can be included in this variable, for example:
 
** ENV_BUCKET_POLICIES="zinc3d:input mybucket:input,output" will set up bkslab's zinc3d bucket as the input source for the environment and mybucket as the output destination. Note that when using the awsdock image, output buckets need to be configured for input as well, since progress marker information is stored in the output destination, so it may need to be read as input later.
 
** ENV_BUCKET_POLICIES="zinc3d:input prompt" will set up zinc3d as the input source, and one other bucket policy through interactive prompt. This is the default setting of awsdock.config.
 
** ENV_BUCKET_POLICIES="zinc3d:input mybucket1:input,output mybucket2:output" sets up 3 policies- indicating that zinc3d can be used for input, mybucket1 can be used for input/output, and mybucket2 can be used for just output.
 
** ENV_BUCKET_POLICIES="prompt+" Will set up as many bucket policies as desired through interactive prompt.


'''3.''' sets up the docker image to be used in this environment. The name of the image is set by the JOB_IMAGE config variable.
The quickstart configuration will name your environment "dockenv-us-east-1". This name serves as the unique identifier for this environment, you'll refer to it later when submitting jobs. If you try to create an environment that already exists with the same name, the script will update the existing environment instead of creating a new one.
* In awsdock.config, this is bkslab's btingle/awsdock image. This image is compatible with our script for submitting DOCK jobs to AWS.


* JOB_IMAGE can be set to any image you so desire, but keep in mind that our DOCK job submission/collection scripts are specialized to the input/output format our awsdock image uses.
<b>If you would like to set up an environment with a different name or based in a region other than us-east-1, you can use aws-setup-configs/awsdock.config instead.</b>


'''4.''' creates all the AWS components needed to run jobs with AWS batch, including Job Queue, Compute Environment, and Job definition. The MAX_CPUS and BID_PERCENTAGE parameters are used/set during this step. The RETRY_STRATEGY and JOB_JSON_CONFIG parameters will also be used.
Attach the bucket you created to the environment. Don't qualify this with the s3:// path, just the plain name.


* MAX_CPUS is the maximum number of virtual cpus (vCPUs) allocated at any one time by AWS batch for this environment. The default JOB_JSON_CONFIG assigns one vcpu to one job, so a MAX_CPUS value of 100 means that a maximum of 100 jobs can run at any one given time.
[[File:Step6.png|none|x150px]]


* BID_PERCENTAGE is a value between (0, 100) and tells AWS what % of the on-demand price you are willing to pay for compute resources. A lower BID_PERCENTAGE value will result in better economy for your jobs, but also lower potential allocation.
Set MAX_CPUS for your environment to desired value. This parameter refers to the maximum number of jobs that can be run in parallel. You should set this at or below the suggested value- this value is derived from the AWS imposed resource limit. You can learn more about resource limits and how to increase them at this page: [[Docking_Submission_On_AWS#Resource_Limits]]


** A BID_PERCENTAGE value of 100 will still save money over the on-demand price in many cases, but if demand is high it will pay full price.
Set BID_PERCENTAGE for your environment to desired value. See section below for more explanation of this parameter, it can potentially save you money. If you're not sure, keep the default.


** A BID_PERCENTAGE value of 50 will always save money, however during periods of high demand you may struggle to allocate machines for your jobs.
[[File:Stepwhatever2.png|x119px|Prompts where you will set MAX_CPUS and BID_PERCENTAGE are highlighted]]


* RETRY_STRATEGY tells aws batch how many times to retry a failed job before giving up. By default awsdock.config tells the job to retry 5 times before giving up. Because we use the economical "spot fleet" strategy, jobs may fail because the machine they are on was allocated to a higher paying customer.
=== Bid Percentage ===


** Our awsdock image will save progress when spot fleet re-allocates the machine it is running on. This way no compute time is wasted.
In order to save money, our AWS batch environment uses the "spot" allocation strategy, which allows us to bid on compute resources at a discount.


* JOB_JSON_CONFIG is a json structure specifying the resource parameters jobs will use. These values can be changed, for example if you would like jobs to have more memory you can change the "memory" attribute to a higher value.
The BID_PERCENTAGE parameter indicates what % of the on-demand price our environment is willing to pay for compute resources. At 50%, the environment will wait for at least a 50% discount of the on-demand price to be available before purchasing resources. At 100%, the environment will pay lower prices when they're available, but failing that will pay the full on-demand price. This is the best option for those that want to save money but also don't want to waste time.


** If you are using our awsdock image, you should not need to change any of these parameters.
== Advanced Usage ==


= Bid Percentage =
For advanced usage of the aws-setup tool, see here: [[AWS DOCK Environment Setup Advanced Usage]]


In order to use resources efficiently, our AWS environment uses AWS spot instances to buy compute resources. AWS spot instances basically allow us to purchase compute resources for a fraction of the price, with the caveat that service may be interrupted at any time. Our AWS docking image allows us to take advantage of this service by saving progress whenever the instance is about to be interrupted. The bid percentage parameter indicates what % of the on-demand price we are willing to pay for compute resources. If left at 100, the scheduler will pay the on-demand price for compute resources if no spot instances are available.
[[Category:AWS]]
[[Category:DOCK 3.8]]
[[Category:Tutorial]]

Latest revision as of 20:38, 19 October 2022


Installation

Docker is required to run the aws-setup scripts (see https://www.docker.com/get-started/). You can install Docker Desktop on your personal machine, or log on to a machine where Docker is already installed.

An AWS account with a payment method attached is also required: https://aws.amazon.com/premiumsupport/knowledge-center/create-and-activate-aws-account/

On a Linux, macOS, or Windows computer with Docker or Docker Desktop installed, run the following commands in a terminal:

docker pull dockingorg/aws-setup
docker run -v /var/run/docker.sock:/var/run/docker.sock --rm -it dockingorg/aws-setup

Explanation of arguments:

  • -v /var/run/docker.sock:/var/run/docker.sock Allows the container to use your system's Docker
  • --rm Cleans up the container once you've exited
  • -it Runs the container interactively

It may be necessary to give the container additional privileges. Once you are inside the container, test this with the following command:

root@f54f423d64b1:/home/awsuser# docker ps

If you get a permission denied error, exit the container and run again with the --privileged option enabled:

docker run --privileged --rm -v /var/run/docker.sock:/var/run/docker.sock -it dockingorg/aws-setup

Example session depicting pulling the aws-setup image, running it, authenticating with AWS, and initializing the account.

Container Environment

The container is based on the Ubuntu distribution. Some utilities such as curl and vi are installed so you can download files and edit them. You can also install whatever software you like using "apt install", e.g. "apt install git".

If you have files you'd like to access from the container, you can link (mount) them using the docker "-v" option. By default we link the docker socket this way ("-v /var/run/docker.sock:/var/run/docker.sock"), but you can link any number of directories or files in the same manner. For example, to make the contents of the "/tmp" directory on your local machine available under "/temp" in the container, add "-v /tmp:/temp" to your "docker run" command, for a final command of:

docker run -v /tmp:/temp -v /var/run/docker.sock:/var/run/docker.sock --rm -it dockingorg/aws-setup:latest
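The flags described so far can be combined in a small wrapper. The following is a minimal sketch and is not part of the aws-setup tooling: the function name aws_setup_cmd is hypothetical, and it only prints the command so you can inspect it before running it.

```shell
# Hypothetical helper: print (do not run) the docker command for aws-setup.
# Usage: aws_setup_cmd [--privileged] [host_path:container_path ...]
aws_setup_cmd() {
    local cmd="docker run"
    # Optional first argument enables privileged mode.
    if [ "$1" = "--privileged" ]; then
        cmd="$cmd --privileged"
        shift
    fi
    # Always clean up on exit and share the host's Docker socket.
    cmd="$cmd --rm -v /var/run/docker.sock:/var/run/docker.sock"
    # Every remaining argument is treated as an extra -v mount.
    local mount
    for mount in "$@"; do
        cmd="$cmd -v $mount"
    done
    echo "$cmd -it dockingorg/aws-setup"
}

aws_setup_cmd --privileged /tmp:/temp
```

Docker accepts these options in any order before the image name, so the printed command is equivalent to the ones shown in this section.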

If you're an advanced user and you'd like to create your own version of the aws-setup image with certain software preinstalled, you can ask us for access to the aws-setup repository, which contains the scripts and Dockerfile we use to build the docker image. You can also build your own image using our aws-setup image as a base.

Quickstart - Creating your first AWS docking environment

Setup

Credentials & Region

When you enter the container, you will be in /home/awsuser. You should see two directories there, aws-setup and awsdock. We start off by going into the aws-setup directory and configuring our AWS credentials. (This needs to be done every time you log in to the container.)

root@f54f423d64b1:/home/awsuser# cd aws-setup
root@f54f423d64b1:/home/awsuser/aws-setup# aws configure

You'll now be prompted to enter your AWS access key ID and AWS secret access key. If you already know what these are, you can enter them and move on. If you don't know what your AWS secret key and access key are, follow this tutorial: https://aws.amazon.com/blogs/security/wheres-my-secret-access-key/. Make sure to save your keys somewhere safe where you will be able to find them again!

Next, you'll be prompted for the AWS region you would like to use. If this is your first environment, set the region to us-east-1. Our lab's molecule data S3 bucket (zinc3d) is also located in this region, so this is the most economical region to run docking jobs in, due to the cost of moving data between AWS regions. (see diagram)

Diagram showing the cost of transferring S3 data between regions and across to the internet

More info on regions & region codes here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html

The last prompt sets the preferred output format. Feel free to leave this blank, or set it to "json".
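Because the credentials have to be re-entered each time the container starts, it may be convenient to script this step. The sketch below uses "aws configure set", which writes the same settings as the interactive "aws configure" prompts. The function name configure_aws and its argument order are assumptions made for illustration; they are not part of the aws-setup scripts.

```shell
# Hypothetical helper: set AWS CLI credentials and region without prompts.
# Usage: configure_aws ACCESS_KEY_ID SECRET_ACCESS_KEY [REGION]
configure_aws() {
    aws configure set aws_access_key_id "$1"
    aws configure set aws_secret_access_key "$2"
    # Default to us-east-1, the region recommended for a first environment.
    aws configure set region "${3:-us-east-1}"
    aws configure set output json
}
```

Call it with your own values, e.g. configure_aws <your access key ID> <your secret access key>, and keep real keys out of any script you share.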

S3 Bucket

An S3 bucket is a virtual hard drive that your AWS resources can access from anywhere. You will need to create one on your account before creating your AWS environment. Follow the Amazon tutorial on how to do this: https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html

The quickstart guide will show you how to create an AWS environment in us-east-1, so it is best to create your S3 bucket in this region.

It is best to have a dedicated S3 bucket for each region you create an environment for, due to the cost of inter-region data transfer.

Diagram explaining how to create an S3 bucket. Note the region: N. Virginia, aka us-east-1. This is the optimal region for running docking.
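If you prefer the command line to the console, a bucket can also be created with "aws s3api create-bucket". The sketch below only prints the command rather than running it, and the function name create_bucket_cmd is hypothetical. It reflects one detail of the S3 API: us-east-1 is the default location, while any other region has to be passed as a LocationConstraint.

```shell
# Hypothetical helper: print the AWS CLI command that creates a bucket.
# Usage: create_bucket_cmd BUCKET_NAME [REGION]
create_bucket_cmd() {
    local bucket="$1"
    local region="${2:-us-east-1}"
    if [ "$region" = "us-east-1" ]; then
        # us-east-1 is the default location and takes no LocationConstraint.
        echo "aws s3api create-bucket --bucket $bucket --region $region"
    else
        echo "aws s3api create-bucket --bucket $bucket --region $region --create-bucket-configuration LocationConstraint=$region"
    fi
}

create_bucket_cmd mybucket
```

Here "mybucket" is a placeholder; bucket names must be globally unique, so substitute your own.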

First time setup

If it is your first time setting up an environment on your AWS account, you will need to run initialize-aws-batch.bash. This script only needs to be run once per account.

root@f54f423d64b1:/home/awsuser/aws-setup# bash initialize-aws-batch.bash

You should see this script print a large amount of JSON text. If you accidentally run this script after it has already been run, you may see errors along the lines of: "Service role name <blank> has been taken in this account". These can be safely ignored; they only indicate that the roles were already created.

Environment Creation

root@f54f423d64b1:/home/awsuser/aws-setup# bash create-aws-batch-env.bash /home/awsuser/awsdock/aws-setup-configs/awsdock_quickstart.config

The quickstart configuration will name your environment "dockenv-us-east-1". This name serves as the unique identifier for this environment; you'll refer to it later when submitting jobs. If you try to create an environment with the same name as one that already exists, the script will update the existing environment instead of creating a new one.

If you would like to set up an environment with a different name or based in a region other than us-east-1, you can use aws-setup-configs/awsdock.config instead.
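As a small illustration of the naming scheme, assuming the identifier is simply the environment name and the region joined by a hyphen (which matches "dockenv-us-east-1"), the identifier of a custom environment could be predicted as follows. The function name env_identifier is hypothetical and is not part of the aws-setup scripts.

```shell
# Hypothetical helper: compose an environment identifier from name and region.
# Assumes the "<name>-<region>" pattern seen in "dockenv-us-east-1".
env_identifier() {
    local name="${1:-dockenv}"
    local region="${2:-us-east-1}"
    echo "$name-$region"
}

env_identifier mydockenv us-west-1
```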

Attach the bucket you created to the environment. Don't qualify this with the s3:// path, just the plain name.
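If you have the bucket written down as an S3 URI, the plain name can be recovered with shell parameter expansion. This is a minimal sketch; the function name plain_bucket_name is hypothetical.

```shell
# Hypothetical helper: reduce an S3 URI or path to the bare bucket name.
# Usage: plain_bucket_name s3://mybucket/some/prefix
plain_bucket_name() {
    local name="${1#s3://}"   # drop a leading "s3://" if present
    name="${name%%/*}"        # drop everything from the first "/" onward
    echo "$name"
}

plain_bucket_name s3://mybucket/some/prefix
```

A name that is already plain passes through unchanged.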


Set MAX_CPUS for your environment to the desired value. This parameter is the maximum number of jobs that can run in parallel. Set it at or below the suggested value, which is derived from the resource limit AWS imposes on your account. You can learn more about resource limits and how to increase them on this page: Docking_Submission_On_AWS#Resource_Limits

Set BID_PERCENTAGE for your environment to the desired value. See the section below for more explanation of this parameter; it can potentially save you money. If you're not sure, keep the default.

Prompts where you will set MAX_CPUS and BID_PERCENTAGE are highlighted

Bid Percentage

In order to save money, our AWS batch environment uses the "spot" allocation strategy, which allows us to bid on compute resources at a discount.

The BID_PERCENTAGE parameter indicates what percentage of the on-demand price our environment is willing to pay for compute resources. At 50%, the environment will wait until resources are available at a discount of at least 50% off the on-demand price before purchasing them. At 100%, the environment will pay lower prices when they're available, but failing that will pay the full on-demand price. A value of 100% is the best option for those who want to save money but also don't want to waste time.
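To make the arithmetic concrete, the highest hourly price the environment will accept is the on-demand price multiplied by BID_PERCENTAGE / 100. The sketch below computes this with awk; the function name max_spot_price is hypothetical, and the on-demand price of 0.10 per hour is an invented figure for illustration, not a real AWS price.

```shell
# Hypothetical helper: highest hourly price accepted for a given bid percentage.
# Usage: max_spot_price ON_DEMAND_PRICE BID_PERCENTAGE
max_spot_price() {
    awk -v price="$1" -v bid="$2" 'BEGIN { printf "%.4f\n", price * bid / 100 }'
}

max_spot_price 0.10 50    # prints 0.0500
max_spot_price 0.10 100   # prints 0.1000
```

With the invented price above, a bid percentage of 50 means instances are only purchased at 0.05 per hour or less, while 100 allows anything up to the full on-demand price.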

Advanced Usage

For advanced usage of the aws-setup tool, see here: AWS DOCK Environment Setup Advanced Usage