Using AWS Setup For Cloud Computation
Introduction to the aws-setup platform
The concept of "the cloud" as a platform for computation can seem mysterious. Scientists talk about running their calculations "in the cloud", but to what end? How does one even begin to use the cloud? Most of us are broadly familiar with the academic cluster environment, but the transition from that environment to cloud computation is rarely discussed in detail.
Enter aws-setup: a collection of scripts for configuring a virtual cluster environment in the AWS cloud. The power of these scripts lies in the deep knowledge of AWS systems that they encapsulate.
Other solutions exist for easily creating a cluster environment in the cloud, for example cluster-in-the-cloud, which allows one to create a Slurm-managed cluster on any of the major cloud providers' systems. Its appeal is the simplicity of the transition: one can take a script written for an academic cluster and port it straight to the cloud! There are pitfalls to this approach, however: by attempting to cover up all the complexity of cloud systems, it risks having that complexity emerge in undesirable ways.
If attempting to set up your own cluster environment on the cloud is like rowing a paddleboat into a raging storm, then cluster-in-the-cloud is like taking a luxury cruise into a tsunami.
aws-setup, on the other hand, is like a submarine. Sure, there's no hot tub or open bar, but it will get you to where you want to go. What aws-setup promises is tight integration with AWS systems, robustness, efficiency, and relative simplicity (compared to setting things up oneself).
Translating Cluster Concepts to AWS Cloud Concepts
Traditional Cluster Concept | Corresponding AWS Concept | Shared Properties |
---|---|---|
NFS | S3 Bucket | Storage accessible from anywhere |
CPU | vCPU | One unit of computing power |
Node | EC2 Instance | Machine (virtual or otherwise) with CPUs, memory, disk, etc. |
Users/Groups | Roles, Policies, IAM Users | Security structures for limiting access to resources |
Slurm/SGE | AWS Batch | Platform for scheduling compute jobs |
Package Managers/Lmod | Containers | Used to create a particular software environment for scripts/pipelines |
Some AWS concepts have no direct counterpart in a traditional cluster:
Concept | Description |
---|---|
Regions | Logical divisions of the AWS network based on geographic location. Resources created in one region are typically not visible from other regions. |
Batch Environments in aws-setup
The core concept in aws-setup is that of the "batch environment". A batch environment is the interface used to spin up a cluster and run a particular computation or set of computations. Each environment created with aws-setup has three core properties:
- <Region>
- <Container Image>
- <Attached S3 Buckets & Allowed operations>
In other words:
- In which geographical location should the cluster be spun up?
- What software environment should the cluster use?
- What data storage is available to the cluster, and how is the cluster allowed to access that storage?
Example: DOCK Batch Environment
For example, the core properties of a batch environment for docking might look like this:
- us-east-1
- bkslab/awsdock:latest
- zinc3d:input, my_stuff:input,output
In other words:
- This environment spins up a cluster in the us-east-1 region.
- The cluster uses the bkslab/awsdock:latest Docker image as its software environment.
- The cluster is allowed to read from the zinc3d data bucket, and is allowed to read from and write to the my_stuff data bucket.
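To make the three properties concrete, here is a minimal Python sketch that models the DOCK example as a plain data structure. It is illustrative only: the `BatchEnvironment` class, its field names, and the `can_read`/`can_write` helpers are hypothetical and are not part of aws-setup. The sketch also assumes that the `input` operation corresponds to read access and `output` to write access, as the description above suggests.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet


# Hypothetical model of a batch environment. This is NOT aws-setup's own
# representation; it only mirrors the three core properties described above.
@dataclass(frozen=True)
class BatchEnvironment:
    region: str
    container_image: str
    # Maps bucket name -> allowed operations.
    # Assumption: "input" means read access, "output" means write access.
    buckets: Dict[str, FrozenSet[str]] = field(default_factory=dict)

    def can_read(self, bucket: str) -> bool:
        """Return True if this environment may read from the bucket."""
        return "input" in self.buckets.get(bucket, frozenset())

    def can_write(self, bucket: str) -> bool:
        """Return True if this environment may write to the bucket."""
        return "output" in self.buckets.get(bucket, frozenset())


# The DOCK example from the text, expressed with the hypothetical class.
dock_env = BatchEnvironment(
    region="us-east-1",
    container_image="bkslab/awsdock:latest",
    buckets={
        "zinc3d": frozenset({"input"}),
        "my_stuff": frozenset({"input", "output"}),
    },
)

if __name__ == "__main__":
    for name in dock_env.buckets:
        print(name, "read:", dock_env.can_read(name),
              "write:", dock_env.can_write(name))
```

Under these assumptions, `zinc3d` is readable but not writable, `my_stuff` is both, and any bucket not listed in the environment is not accessible at all.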