Cluster Narrative
Building a robust cluster for computational pharmacology and computer-aided drug discovery is a big deal. This page is part of a series of articles called So you want to set up a lab. Here we describe the overall process, the tradeoffs, and the big picture of what you are doing. We hope you find it useful.
Hardware layout and physical choices
You can just install DOCK on a computer and use it, ignoring a lot of what is written here; there is nothing to stop you. However, no matter how small you start, most labs will want to add new machines to the cluster as soon as funds become available. Adding a few nodes to the cluster can be done manually, but things soon get more complicated. Each machine is subtly different owing to when it was installed, and the amount of work to maintain the cluster rises nearly linearly with the number of computers. Maintenance can quickly become a problem (the cluster uniformity problem).
Here we propose a cluster architecture that allows maintenance effort to scale far less than linearly with the number of machines. To do this, we create central services to support the cluster. A somewhat higher startup cost for commissioning the cluster nucleus is traded for far lower marginal costs of adding and maintaining machines, and for higher cluster uniformity. The cognitive overhead of the sysadmin role is also reduced, almost to a manageable level, or so they say.
The central services are:
Class | We use | Explanation
---|---|---
hypervisor | libvirt | Allows you to run many machines as virtual machines on a single piece of hardware. VMware and VirtualBox are also known to us and are very good.
provisioning server | Foreman | Foreman seems to be the industry leader at the moment, but again there are many of these.
DNS | BIND | The Unix default.
authentication server | 389 Directory Server | Authentication gets complex fast; it may make sense to use what your colleagues are using.
queuing system | openSGE | Several versions are available; use the one you know best, if you like it.
portal / firewall | custom | See below.
NFS server(s) | NAS/SAN | See below.
Merits of buying vs repurposing computers
If you have three to six computers already available, you may use them instead of a hypervisor. If you are going to spend new money, we strongly urge you to consider getting a single big machine and running a hypervisor. It uses less space and less energy, generates less heat (and so needs less cooling), and will be easier to maintain.
Description of core services
Foreman allows you to format a new machine and install its operating system automatically. DNS allows you to run a private network, which we strongly recommend, and which is really essential if you use Foreman. 389 Directory Server is our preferred solution for managing passwords centrally. A portal/firewall is optional, and your setup will depend on your institutional environment. Frankly, nothing is perfect. Think of security as layers: more layers can provide more protection, and can contain the damage if you screw up. NFS servers are a good solution up to at least 1000 cores, which we think covers most of our users. Use gigabit Ethernet throughout, with optional trunking for more throughput.
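Once the private network and DNS are up, it is worth checking that every core service resolves from inside the cluster. A minimal sketch, assuming hypothetical hostnames on a private zone we call cluster.internal (your names and zone will differ):

```python
# Minimal health check: confirm each core service resolves on the
# private network. The hostnames and the "cluster.internal" zone
# are hypothetical placeholders; substitute your own.
import socket

SERVICES = [
    "foreman.cluster.internal",
    "ns1.cluster.internal",        # BIND
    "ldap.cluster.internal",       # 389 Directory Server
    "sgemaster.cluster.internal",
    "nfs1.cluster.internal",
]

for host in SERVICES:
    try:
        addr = socket.gethostbyname(host)
        print(f"{host:<30} -> {addr}")
    except socket.gaierror as err:
        print(f"{host:<30} -> FAILED ({err})")
```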
Order of installation
We recommend you set up the hypervisor first (or your three to six repurposed machines). Then, create virtual machines for Foreman, DNS, 389/authentication, sgemaster, and sgehead under the hypervisor. We recommend you use a separate physical machine for the portal. We recommend putting all machines that do not need to be on the public internet on the private network only. If you choose not to use a hypervisor (which is fine), we recommend using a separate physical machine for each core service.
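One way to create those service VMs is to clone each from a common "golden" template, which keeps them uniform from day one. Here is a minimal sketch driving libvirt's virt-clone command-line tool from Python; the template name "template-rocky" and the VM names are our assumptions, not a prescription.

```python
# Minimal sketch: stamp out the core-service VMs from one template.
# Assumes libvirt's virt-clone tool is installed and that a stopped
# template guest named "template-rocky" exists; both are assumptions.
import subprocess

SERVICES = ["foreman", "dns", "ldap", "sgemaster", "sgehead"]

for name in SERVICES:
    subprocess.run(
        [
            "virt-clone",
            "--original", "template-rocky",  # hypothetical template VM
            "--name", name,
            "--auto-clone",  # let virt-clone pick disk paths
        ],
        check=True,
    )
    print(f"cloned {name}")
```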
Software choices
Although we have tried to reduce dependencies on third-party software, some critical dependencies remain and will probably continue to exist for the foreseeable future. Before you start, you should know that you are going to need this software, which comes with its own licensing terms.
Library preparation:
- OpenEye OEChem
- Omega, or a cognate conformer generator
- AMSOL
Docking scripts:
- OEChem
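As a quick way to confirm the OEChem dependency is installed and licensed, here is a minimal sketch that reads a SMILES file and echoes canonical SMILES back; the input filename "ligands.smi" is a placeholder.

```python
# Minimal sketch: verify the OpenEye OEChem toolkit is installed and
# licensed by round-tripping molecules from a SMILES file.
# "ligands.smi" is a hypothetical input file.
from openeye import oechem

ifs = oechem.oemolistream("ligands.smi")
mol = oechem.OEGraphMol()
count = 0
while oechem.OEReadMolecule(ifs, mol):
    print(oechem.OEMolToSmiles(mol))
    count += 1
ifs.close()
print(f"read {count} molecules")
```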
in progress
Back to So you want to set up a lab