Cluster Narrative: Difference between revisions
Revision as of 16:54, 18 March 2014
Building a robust cluster for computational pharmacology and computer-aided drug discovery is a big undertaking. This page is part of a series of articles called So you want to setup a lab. Here we describe the overall process, the tradeoffs, and the big picture of what you are doing. We hope you find it useful.
Hardware layout and physical choices
You can just install DOCK on a single computer and use it, ignoring a lot of what is written here. There is nothing to stop you. However small you start, though, most labs will want to add machines to the cluster as soon as funds become available. Adding a few nodes can be done manually, but pretty soon it gets out of hand. Each machine is subtly different depending on when it was installed, and the amount of work to maintain the cluster rises nearly linearly with the number of computers. Maintenance can quickly become a problem.
Here we propose a cluster architecture in which maintenance effort grows far more slowly than the number of machines. To do this, we create central services to support the cluster. The result is a high startup cost for commissioning the cluster nucleus, and a far lower marginal cost for each new machine added afterwards.
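The tradeoff can be made concrete with a toy model. The sketch below is only an illustration: it assumes maintenance effort can be expressed in hours per month, and every number in it (hours per node, the fixed cost of the core services, the marginal cost per managed node) is a made-up placeholder, not a measurement from any real cluster.

```python
# Toy maintenance-cost model. All default values are hypothetical
# placeholders chosen for illustration, not measured figures.


def manual_cost(nodes: int, hours_per_node: float = 2.0) -> float:
    """Hand-maintained cluster: effort grows linearly with node count."""
    return hours_per_node * nodes


def managed_cost(nodes: int, core_hours: float = 20.0,
                 marginal_hours: float = 0.2) -> float:
    """Centrally managed cluster: fixed overhead for the core services
    plus a small marginal cost for each node."""
    return core_hours + marginal_hours * nodes


def break_even_nodes(hours_per_node: float = 2.0, core_hours: float = 20.0,
                     marginal_hours: float = 0.2) -> float:
    """Node count above which the managed cluster takes less effort."""
    return core_hours / (hours_per_node - marginal_hours)


if __name__ == "__main__":
    for n in (5, 10, 50, 200):
        print(n, manual_cost(n), managed_cost(n))
    print("break-even node count:", round(break_even_nodes(), 1))
```

With these placeholder numbers the central services only pay off once the cluster grows past roughly a dozen nodes. Your own break-even point depends entirely on the values you plug in.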
The central services are:
- hypervisor - allows you to run many machines as virtual machines on a single piece of hardware.
- foreman - computer provisioning management
- DNS
- 389 / authentication
- sgemaster
- sgehead
- portal / firewall
- NFS server(s)
If you have six computers available at low or no cost, you can skip the hypervisor. Foreman is recommended in all cases. It allows you to format and install a new machine automatically. DNS allows you to run a private network, which we strongly recommend. 389/authentication is our preferred solution for managing passwords centrally. A portal/firewall is optional, but will give you peace of mind and help protect you from attack. NFS servers are a good solution up to at least 1000 cores, which we think covers most of our users.
We recommend you set up the hypervisor first. Then create virtual machines for foreman, DNS, 389/authentication, sgemaster and sgehead under the hypervisor. We recommend you use a separate physical machine for the portal, which gives you some protection from remote access users. We recommend putting all machines that do not need to be on the public internet on the private network only. If you choose not to use a hypervisor (which is fine), we recommend using a separate physical machine for each core service.
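The recommended layout can be written down as a small inventory, which is handy for counting how many physical machines you need before you buy anything. This is a minimal sketch: the `Service` class, its field names, and the short service names are hypothetical conventions for this example, not part of foreman or any other tool. It also assumes the portal is the only machine that must face the public internet, and it leaves out the NFS server(s) because their placement is not specified here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Service:
    name: str
    placement: str         # "vm" (under the hypervisor) or "physical"
    public: bool = False   # True only if it must face the public internet


def recommended_layout(use_hypervisor: bool = True) -> List[Service]:
    """Build the service inventory for either variant of the layout."""
    core = ["foreman", "dns", "389-auth", "sgemaster", "sgehead"]
    # Core services run as VMs when a hypervisor is used; otherwise each
    # one gets its own physical machine.
    placement = "vm" if use_hypervisor else "physical"
    layout = [Service(name, placement) for name in core]
    if use_hypervisor:
        layout.insert(0, Service("hypervisor", "physical"))
    # Assumption: the portal is the only publicly reachable machine.
    layout.append(Service("portal", "physical", public=True))
    return layout


def physical_machine_count(layout: List[Service]) -> int:
    """Number of physical boxes the layout needs (NFS not included)."""
    return sum(1 for s in layout if s.placement == "physical")


if __name__ == "__main__":
    for flag in (True, False):
        layout = recommended_layout(flag)
        public = [s.name for s in layout if s.public]
        print("hypervisor" if flag else "no hypervisor",
              physical_machine_count(layout), "physical; public:", public)
```

Under these assumptions the hypervisor variant needs two physical machines plus storage, while the variant without a hypervisor needs six.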
Software choices
Although we have tried to reduce dependencies on third-party software, some critical dependencies remain and will probably continue to exist for the foreseeable future. Before you start, you should know that you are going to need the following software, each of which comes with its own licensing terms.
- Library preparation: OpenEye OEChem and Omega, AMSOL
- Docking scripts: OEChem
Back to So you want to setup a lab