AWS DOCK Environment Setup Advanced Usage

From DISI
Revision as of 06:06, 27 July 2022 by Btingle (talk | contribs) (Created page with "= Advanced Usage/Configuration Usage = If you followed the tutorial above, you may have noticed that the create-aws-batch-env.bash script was provided a file, awsdock.config,...")
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

Advanced Usage/Configuration Usage

If you followed the tutorial above, you may have noticed that the create-aws-batch-env.bash script was provided a file, awsdock.config, as an argument. This .config file contains most of the parameters needed to set up a docking environment, with certain parameters left blank to be set interactively by the user. An example of a complete .config file is included next to awsdock.config, called awsdock1.config.

The benefit of using configuration files instead of the interactive prompt is mostly organizational. Configuration files allow you to keep track of which environments you've created and what parameters were provided to them. Additionally, if your configuration is complete, i.e nothing needs to be set interactively, then the create-aws-batch-env.bash script will run non-interactively, allowing for automatic configuration and deployment of environments.

In the next section, I will be describing what the various .config file parameters mean. The numbers in this section refer to the steps in the procedure followed by create-aws-batch-env.bash.

0. is for setting up the environment's NAME and REGION. These two properties in combination serve as the unique identifier for your batch environment.

  • Your environment's NAME can be set by the ENV_NAME configuration variable, otherwise set interactively.
  • Likewise, the REGION can be set by the ENV_AWS_REGION configuration variable, otherwise set interactively.

1. sets up the profile/role that will be used in this environment. This part will always be non-interactive, so this step is mostly here as a debug tool/progress marker.

2. sets up bucket policies for the environment. For example, if you would like to pull DB2 data from bkslab's zinc3d bucket, you would set that up here.

  • You can set bucket policies using the ENV_BUCKET_CONFIGS configuration variable. Multiple buckets+policies can be included in this variable, for example:
    • ENV_BUCKET_POLICIES="zinc3d:input mybucket:input,output" will set up bkslab's zinc3d bucket as the input source for the environment and mybucket as the output destination. Note that when using the awsdock image, output buckets need to be configured for input as well, since progress marker information is stored in the output destination, so it may need to be read as input later.
    • ENV_BUCKET_POLICIES="zinc3d:input prompt" will set up zinc3d as the input source, and one other bucket policy through interactive prompt. This is the default setting of awsdock.config.
    • ENV_BUCKET_POLICIES="zinc3d:input mybucket1:input,output mybucket2:output" sets up 3 policies- indicating that zinc3d can be used for input, mybucket1 can be used for input/output, and mybucket2 can be used for just output.
    • ENV_BUCKET_POLICIES="prompt+" Will set up as many bucket policies as desired through interactive prompt.

3. sets up the docker image to be used in this environment. The name of the image is set by the JOB_IMAGE config variable.

  • In awsdock.config, this is bkslab's btingle/awsdock image. This image is compatible with our script for submitting DOCK jobs to AWS.
  • JOB_IMAGE can be set to any image you so desire, but keep in mind that our DOCK job submission/collection scripts are specialized to the input/output format our awsdock image uses.

4. creates all the AWS components needed to run jobs with AWS batch, including Job Queue, Compute Environment, and Job definition. The MAX_CPUS and BID_PERCENTAGE parameters are used/set during this step. The RETRY_STRATEGY and JOB_JSON_CONFIG parameters will also be used.

  • MAX_CPUS is the maximum number of virtual cpus (vCPUs) allocated at any one time by AWS batch for this environment. The default JOB_JSON_CONFIG assigns one vcpu to one job, so a MAX_CPUS value of 100 means that a maximum of 100 jobs can run at any one given time.
  • BID_PERCENTAGE is a value between (0, 100) and tells AWS what % of the on-demand price you are willing to pay for compute resources. A lower BID_PERCENTAGE value will result in better economy for your jobs, but also lower potential allocation.
    • A BID_PERCENTAGE value of 100 will still save money over the on-demand price in many cases, but if demand is high it will pay full price.
    • A BID_PERCENTAGE value of 50 will always save money, however during periods of high demand you may struggle to allocate machines for your jobs.
  • RETRY_STRATEGY tells aws batch how many times to retry a failed job before giving up. By default awsdock.config tells the job to retry 5 times before giving up. Because we use the economical "spot fleet" strategy, jobs may fail because the machine they are on was allocated to a higher paying customer.
    • Our awsdock image will save progress when spot fleet re-allocates the machine it is running on. This way no compute time is wasted.
  • JOB_JSON_CONFIG is a json structure specifying the resource parameters jobs will use. These values can be changed, for example if you would like jobs to have more memory you can change the "memory" attribute to a higher value.
    • If you are using our awsdock image, you should not need to change any of these parameters.