AWS:Set up account

This page pertains to the scripts created by Benjamin Tingle to set up an environment for running DOCK workloads in AWS.

WIP: Not complete, just saving progress

Getting Started

Requirements:

All scripts for setting up docking with AWS are run in a docker container. (If you need to run bare-metal, contact me at ben@tingle.org.)

Running with Docker Desktop for Windows

In cmd/PowerShell:

1. docker pull btingle/awsdock-setup

2. docker run -v /var/run/docker.sock:/var/run/docker.sock -it btingle/awsdock-setup

Running with Docker Desktop + WSL2

On the docker desktop window, go to Settings->General and enable "Expose daemon on tcp://localhost:2375 without TLS"

Now in WSL2:

1. Install docker client (if not already installed)

2. export DOCKER_HOST=tcp://localhost:2375

3. docker pull btingle/awsdock-setup

4. bash run_docker.bash

Running with Linux/Mac

1. docker pull btingle/awsdock-setup

2. bash run_docker.bash

If you've already set DOCKER_HOST (i.e. you are using a remote docker instance), the run_docker.bash script will still work.
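
The run_docker.bash script itself is not shown on this page; as a minimal sketch, it presumably wraps the same docker run command used in the Windows instructions above (a hypothetical reconstruction, not the actual script):

  #!/bin/bash
  # Hypothetical sketch of run_docker.bash, inferred from the commands above.
  # The docker CLI honors DOCKER_HOST automatically when it is set, so this
  # works against either the local daemon or a remote docker instance.
  docker run -v /var/run/docker.sock:/var/run/docker.sock -it btingle/awsdock-setup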

Running the Scripts

When you enter the docker image, you will be in /tmp. cd to the /tmp/aws-setup directory to view all the scripts at your disposal.

aws configure

Each time you enter the awsdock-setup docker image, run "aws configure". Enter your AWS access key and secret key, which can be retrieved from your AWS account (instructions). Region and output format can be set if desired, but will be overwritten by our scripts.
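
A typical session looks like the following. The key values shown are placeholders, not real credentials, and the last two prompts can be left blank since our scripts overwrite them:

  aws configure
  AWS Access Key ID [None]: AKIAXXXXXXXXXXXXXXXX
  AWS Secret Access Key [None]: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  Default region name [None]:
  Default output format [None]: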

initialize-aws-batch.bash

Usage:
  bash initialize-aws-batch-env.bash

A one-off script that is run before setting up any environments. If it is your first time creating an awsdock environment on your AWS account, run this script once. It will set up the policies and roles on your AWS account that are needed across all batch environments.

create-aws-batch-env.bash

Usage:
  bash create-aws-batch-env.bash <config>

Main script for setting up awsdock environments. It should be run once for each region you wish to run docking jobs in; you may also want to create separate environments for different versions of DOCK or alternate datasets. Depending on the configuration, this script can be interactive or fully automated by configuration variables. For more information on the configuration variables for this script, see the "configuration" section of this page. Running this script with the included "awsdock.config" configuration file provides an interactive experience for setting up your first awsdock environment.
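
For example, to set up your first environment interactively with the included configuration file:

  bash create-aws-batch-env.bash awsdock.config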

Environment Creation Steps

Step {0} sets up the environment's NAME and REGION. These two properties in combination serve as the unique identifier for your batch environment (see the example after the list below).

  • Your environment's NAME can be set by the ENV_NAME configuration variable, otherwise set interactively.
  • Likewise, the REGION can be set by the ENV_AWS_REGION configuration variable, otherwise set interactively.
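
For a non-interactive setup, both can be pinned in the config file. A minimal sketch, assuming the config uses shell-style assignments (the variable names are from this section; the values are illustrative):

  ENV_NAME=mydockenv        # unique name for this batch environment
  ENV_AWS_REGION=us-east-1  # AWS region the environment lives in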

Step {1} sets up the profile/role that will be used in this environment. This step is always non-interactive, so it serves mostly as a debug tool/progress marker.

Step {2} sets up bucket policies for the environment. For example, if you would like to pull DB2 data from bkslab's zinc3d bucket, you would set that up here.

  • You can set bucket policies using the ENV_BUCKET_POLICIES configuration variable. Multiple bucket+policy pairs can be included in this variable, for example:
    • ENV_BUCKET_POLICIES="zinc3d:input mybucket:output" will set up bkslab's zinc3d bucket as the input source for the environment and mybucket as the output destination.
    • ENV_BUCKET_POLICIES="zinc3d:input prompt" will set up zinc3d as the input source, and one other bucket policy through interactive prompt. This is the default setting of awsdock.config.
    • ENV_BUCKET_POLICIES="zinc3d:input mybucket1:input,output mybucket2:output" sets up 3 policies- indicating that zinc3d can be used for input, mybucket1 can be used for input/output, and mybucket2 can be used for just output.
    • ENV_BUCKET_POLICIES="prompt+" Will set up as many bucket policies as desired through interactive prompt.

Step {3} sets up the docker image to be used in this environment. The name of the image is set by the JOB_IMAGE config variable (see the example after the list below).

  • In awsdock.config, this is bkslab's btingle/awsdock image. This image is compatible with our script for submitting DOCK jobs to AWS.
  • JOB_IMAGE can be set to any image you so desire, but keep in mind that our DOCK job submission/collection scripts are specialized to the input/output format our awsdock image uses.
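
For example, awsdock.config points this at bkslab's image, which a custom config could replicate with (shell-style assignment assumed):

  JOB_IMAGE=btingle/awsdock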

Step {4} creates all the AWS components needed to run jobs with AWS Batch, including the Job Queue, Compute Environment, and Job Definition. The MAX_CPUS, BID_PERCENTAGE, RETRY_STRATEGY, and JOB_JSON_CONFIG parameters are all used during this step.

  • MAX_CPUS is the maximum number of virtual CPUs (vCPUs) allocated at any one time by AWS Batch for this environment. The default JOB_JSON_CONFIG assigns one vCPU to one job, so a MAX_CPUS value of 100 means that at most 100 jobs can run at any given time.
  • BID_PERCENTAGE is a value between 0 and 100 that tells AWS what percentage of the on-demand price you are willing to pay for compute resources. A lower BID_PERCENTAGE value gives better economy for your jobs, but also lowers your potential allocation.
    • A BID_PERCENTAGE value of 100 will still save money over the on-demand price in many cases, but if demand is high it will pay full price.
    • A BID_PERCENTAGE value of 50 will always save money; however, during periods of high demand you may struggle to allocate machines for your jobs.
  • RETRY_STRATEGY tells AWS Batch how many times to retry a failed job before giving up. By default, awsdock.config tells the job to retry 5 times. Because we use the economical "spot fleet" strategy, jobs may fail because the machine they are running on was reallocated to a higher-paying customer.
    • Our awsdock image will save progress when spot fleet re-allocates the machine it is running on. This way no compute time is wasted.
  • JOB_JSON_CONFIG is a JSON structure specifying the resource parameters jobs will use (see the sketch after this list). These values can be changed; for example, if you would like jobs to have more memory, you can raise the "memory" attribute.
    • If you are using our awsdock image, you should not need to change any of these parameters.
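
Putting Step {4} together, a config might set these parameters as follows. This is a minimal sketch: the shell-style assignments and the exact JOB_JSON_CONFIG schema are assumptions, and the values are illustrative (memory in MiB, per AWS Batch convention):

  MAX_CPUS=100        # at most 100 vCPUs allocated at any one time
  BID_PERCENTAGE=100  # willing to pay up to 100% of the on-demand price
  RETRY_STRATEGY=5    # retry failed jobs up to 5 times (awsdock.config default)
  # One vCPU and 2048 MiB of memory per job; the schema here is an assumption.
  JOB_JSON_CONFIG='{"vcpus": 1, "memory": 2048}'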