So you want to set up a lab: Difference between revisions

From DISI
Jump to navigation Jump to search
No edit summary
No edit summary
 
(34 intermediate revisions by one other user not shown)
Line 4: Line 4:


= Requirements and assumptions =
= Requirements and assumptions =
This page describes setting up a full computational pharmacology research lab. We describe the minimum setup, plus options for expansion. The entry level assumes you can spend: $20K on servers and $2K on a workstation. You'll need an acoustically insulated air conditioned room. It would help to have someone with system administration experience for 100 hours (3 weeks) for the initial set up, and then 1 day/month for maintenance.
This page describes setting up a full computational pharmacology research lab. We describe the minimum setup, plus options for expansion. The entry level assumes you can spend $20K on servers and $2K on a workstation, or you already own equivalent or better hardware. You'll need an acoustically insulated air conditioned room. Expect to spend a month to specify, order and deploy hardware, a week to install operating system level, a week to install middleware, and a week to install our software. Times will be longer if it is your first time.  Once running, expect 1 day/month/cluster plus 1 day/month/rack of sysadmin time to maintain your system.


When you have all the software and hardware, you can get a docking lab up and running at a basic level in less than a day. If you do not have all the software and hardware, it will take longer. If you've never run a molecular docking lab, allow a week to learn the basics.  Once you have a basic installation running, plan for a few days to [[get a queuing system working]] the way you like it.  We may be able to help. Ask us, but please read this wiki first.
= Narrative =
Before starting, let's talk about what you need for a successful computational pharmacology cluster. [[Cluster Narrative]].


= Physical level =
= Physical level =
Here is our recommendation of how to build a computer cluster that will work well for molecular docking and cheminformatics.  This document can be used whether you already own hardware, or whether you are planning to buy new. All recommendations are the best we know as of Feb 2014.  Things do change, but this advice should be ok through 2015.
We discuss options for acquiring various sizes of hardware in [[Cluster Theory]]More thoughts: [[Acquire and deploy hardware]]. Allow a month to obtain and deploy hardware.
 
== About buying CPU and disk ==
We are currently buying CPU from Silicon Mechanics and Dell and disk from Silicon Mechanics and HP.
We recommend buying two different kinds of machines in a modular fashion:  head nodes to which disk enclosures may be attached and cpu nodes which contain large numbers of coresWe are currently buying enclosures holding 12 SAS disks of 4TB each for 48 TB raw for around $8000 or about 6 raw GB per dollar. Formatted RAID6 this works out to 36TB or 4.5 formatted GB per dollar. We like the HP P822 high performance RAID controller.  Compare this to what we were paying just a year ago in spring 2013: 25 TB for $10,000 or 2.5 TB per dollar unformatted. An amazing development in the last 12 months.
 
For CPU, we like the C6145 from Dell.  For around $20,000 you get 2 machines in a 2U form each with 64 cores and 256 GB memory and  a pair of RAID1 formatted disks each. A single 42U rack could hold 2560 cores and still have room for a switch.  Of course, this would cost you $400,000, pull 28 kW and need 5 T of cooling. Amazing density at commodity prices.
 
== About setting up a network ==
$250 for a Managed 24 port GigE switch with VPN support. About $250. Will do for most circumstances. Allows you to run both private and public networks.  


= Operating System level =
= Operating System level =
To begin, you will either need 6 computers to host the central services, or you will need a hypervisor to host 6 VMs, or some mixture of the above. We recommend the hypervisor if you can bear it and the 6 physical computers if you can afford it (space, energy, money).
Here we use Operating system to describe all software components that are not particular to our field. Thus for the purposes of this discussion, python and a queuing system are considered part of the operating system level. [[Install operating system]]. Allow one week for this step.
 
== Hypervisor ==
We use (xxx), but any should do, including virtualbox, vmware, among many others.
 
== Foreman ==
Foreman is the node creation and provisioning server.
We recommend [[Centos]] 6.3.
Here is how to set one up: [[Foreman]]
 
== Authentication server ==
We use 389, but other authentication systems will work fine, including kerberos.
 
* DNS
 
== Fileservers and NFS ==
We use XFS over NFS.  We tend to hang several enclosures off a head node.  We recommend SAS, which has finally come down in price, and RAID6 formatting.  We tend to use enclosures that host 12 disks of 4TB each.
 
== Portal and Security ==
We recommend setting up a portal and blocking all inbound access to all other computers. Use two portals at distinct geographical locations for added robustness.
 
* Perimeter security
 
== Queuing system ==
We recommend free versions of Sun Grid Engine [[SGE]].
 
== Set up a database server ==
 
== Add a new node to the cluster ==
[[How to spin up a new virtual machine]]
 
== Add new disk to the cluster ==
[[Configure new disk]]
 
== Deploy a workstation ==
[[Workstation Install]]


= Middleware level =
= Middleware level =
Set up middleware for a [[Compbio middleware | computational drug discovery lab]].
We use middleware to mean 3rd party software required by our software that is specific to our field and that you would not normally expect to find installed on a new computer.
 
Set up middleware for a [[Compbio middleware | computational drug discovery lab]]. Allow one week for this step.
= Set up psql/rdkit from scratch =


= Our Software =
Each product is licensed and distributed separately.  Allow at least a day for each product, more if the previous steps above are incomplete.


= Server Application Software =
* [[Install DOCK 3.7]]
 
* [[Install SEA]]
[[Install DOCK 3.7]]
* [[Install DOCK 6]]
 
* [[Install ZINC]]
[[Install SEA]]
* [[Install DOCK Blaster]]
 
[[Install DOCK 6]]
 
[[Install ZINC]]
 
[[Install DOCK Blaster]]
 
== [[ZINC]] ==
Here is how to set up ZINC from scratch on a new cluster.
 
* create database
* install software and web interface
* test
 
 
== Using git and github ==
 
= Client Application Software =
 
[[PyMol]]
 
[[Chimera]]
 
[[Knime]]
 
[[Marvin]]
 
[[InstantJChem]]


= Operations =  
= Operations =  
Monthly maintenance tasks.  
Once the system is set up and running, it will need to be maintained. Please see the [[:Category:Sysadmin | system administrator's guide]]  
 
[[Backups]]
 
[[Security Review]]
 
[[Software upgrades]]
 
=== CCP4 ===
 
=== Chimera ===
 
== H. acquire and install docking.org software ==
 
=== dockenv ===
 
== I. configure docking.org software ==


[[Category:Sysadmin]]
[[Category:Sysadmin]]
[[Category:Tutorials]]
[[Category:Tutorials]]
[[Category:Setup]]

Latest revision as of 19:38, 10 January 2019

Here is how I would set up a computational lab, one that could join the Worldwide ZINC network.

Requirements and assumptions

This page describes setting up a full computational pharmacology research lab. We describe the minimum setup, plus options for expansion. The entry level assumes you can spend $20K on servers and $2K on a workstation, or you already own equivalent or better hardware. You'll need an acoustically insulated air conditioned room. Expect to spend a month to specify, order and deploy hardware, a week to install operating system level, a week to install middleware, and a week to install our software. Times will be longer if it is your first time. Once running, expect 1 day/month/cluster plus 1 day/month/rack of sysadmin time to maintain your system.

Narrative

Before starting, let's talk about what you need for a successful computational pharmacology cluster. Cluster Narrative.

Physical level

We discuss options for acquiring various sizes of hardware in Cluster Theory. More thoughts: Acquire and deploy hardware. Allow a month to obtain and deploy hardware.

Operating System level

Here we use Operating system to describe all software components that are not particular to our field. Thus for the purposes of this discussion, python and a queuing system are considered part of the operating system level. Install operating system. Allow one week for this step.

Middleware level

We use middleware to mean 3rd party software required by our software that is specific to our field and that you would not normally expect to find installed on a new computer. Set up middleware for a computational drug discovery lab. Allow one week for this step.

Our Software

Each product is licensed and distributed separately. Allow at least a day for each product, more if the previous steps above are incomplete.

Operations

Once the system is set up and running, it will need to be maintained. Please see the system administrator's guide