Using AWS Setup For Cloud Computation

Introduction to the aws-setup platform

The concept of "the cloud" as a platform for computation remains somewhat mysterious. Scientists talk about running their calculations "in the cloud", but to what end? How does one even begin to use the cloud? We are broadly familiar with the academic cluster environment, but the transition from that environment to cloud computation is rarely discussed in detail.

Enter aws-setup: a collection of scripts for configuring a virtual cluster environment in the AWS cloud. The power of these scripts lies in the deep knowledge they encapsulate of AWS systems.

Other solutions exist for easily creating a cluster environment in the cloud, for example cluster-in-the-cloud, which allows one to create a Slurm-managed cluster on any of the major cloud providers' systems. The power of this approach is in the simplicity of the transition: one can take a script written for use in an academic cluster and port it straight to the cloud! There are pitfalls, however: in attempting to cover up all the complexity of cloud systems, there is a risk of that complexity emerging in undesirable ways.

If attempting to set up your own cluster environment on the cloud is like rowing a paddleboat into a raging storm, then cluster-in-the-cloud is like taking a luxury cruise into a tsunami.

aws-setup, on the other hand, is like a submarine. Sure, there's no hot tub or open bar, but it will get you to where you want to go. What aws-setup promises is tight integration with AWS systems, robustness, efficiency, and relative simplicity of use (compared to setting things up oneself).

Translating Cluster Concepts to AWS Cloud Concepts

Understanding AWS in relation to a traditional cluster

Traditional Cluster Concept | Corresponding AWS Concept  | Shared Properties
----------------------------|----------------------------|-----------------------------------------------------------------
NFS                         | S3 Bucket Storage          | Accessible from anywhere
CPU                         | vCPU                       | One unit of computing power
Node                        | EC2 Instance               | Machine (virtual or otherwise) with CPUs, memory, disk, etc.
Users/Groups                | Roles, Policies, IAM Users | Security structures for limiting access to resources
Slurm/SGE                   | AWS Batch                  | Platform for scheduling compute jobs
Package Managers/Lmod       | Containers                 | Used to create a particular software environment for scripts/pipelines

Novel AWS Concepts

Concept | Description
--------|----------------------------------------------------------------------------------------------------------------
Regions | Geographic divisions of the AWS network. Resources in one region are typically not visible to those in other regions.

Batch Environments in aws-setup

The core concept in aws-setup is that of the "batch environment". A batch environment is the interface used to spin up a cluster and run a particular computation or set of computations. Each environment created with aws-setup has three core properties:

  1. <Region>
  2. <Container Image>
  3. <Attached S3 Buckets & Allowed operations>

In other words:

  1. Which geographical region should the cluster be spun up in?
  2. What software environment should the cluster use?
  3. What data storage is available to the cluster, and how is the cluster allowed to access that storage?

Example: DOCK Batch Environment

For example, the core properties of a batch environment for docking might look like this:

  1. us-east-1
  2. bkslab/awsdock:latest
  3. zinc3d:input, my_stuff:input,output

In other words:

  1. This environment spins up a cluster in the us-east-1 region
  2. We are using the bkslab/awsdock:latest docker image as our software environment
  3. This cluster is allowed to read from the zinc3d data bucket, and is allowed to read and write to the my_stuff data bucket.
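
To make the three properties concrete, here is a minimal Python sketch that models the DOCK example above. It is not part of aws-setup: the `BatchEnvironment` class and the `parse_bucket_spec` helper are hypothetical names invented for illustration, and the sketch assumes that `input` means read access and `output` means write access, as the example describes.

```python
from dataclasses import dataclass, field


@dataclass
class BatchEnvironment:
    """Hypothetical model of the three core properties (not the aws-setup API)."""
    region: str
    container_image: str
    # Maps bucket name -> set of allowed operations ("input" and/or "output").
    buckets: dict = field(default_factory=dict)

    def can_read(self, bucket):
        return "input" in self.buckets.get(bucket, set())

    def can_write(self, bucket):
        return "output" in self.buckets.get(bucket, set())


def parse_bucket_spec(spec):
    """Parse a string such as 'zinc3d:input, my_stuff:input,output'.

    Commas separate both buckets and operations, so a token containing ':'
    starts a new bucket and a bare token adds an operation to the current one.
    """
    buckets = {}
    current = None
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue
        if ":" in token:
            name, token = (part.strip() for part in token.split(":", 1))
            current = name
            buckets.setdefault(current, set())
        if current is None:
            raise ValueError(f"operation {token!r} appears before any bucket name")
        if token not in ("input", "output"):
            raise ValueError(f"unknown operation {token!r} for bucket {current!r}")
        buckets[current].add(token)
    return buckets


# The DOCK example from the text, expressed with the hypothetical model.
dock_env = BatchEnvironment(
    region="us-east-1",
    container_image="bkslab/awsdock:latest",
    buckets=parse_bucket_spec("zinc3d:input, my_stuff:input,output"),
)

for name in sorted(dock_env.buckets):
    print(name, "read:", dock_env.can_read(name), "write:", dock_env.can_write(name))
```

Keeping the permissions as a set per bucket mirrors the text: zinc3d is read-only, while my_stuff can be both read and written.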