Cluster 2

{{TOCright}}

Our new cluster at UCSF is described on this page. The physical equipment in Cluster 0 will be subsumed into this cluster once it replicates all the functions of the original. We expect this to happen later in 2014.

= Priorities and Policies =
* [[Lab Security Policy]]
* [[Disk space policy]]
* [[Cluster 2 account]]

= Hardware =
* aleph: aka hypervisor; runs core services
* happy: internal SATA, 10 x 2 TB raw = 9 TB RAID10; external HP SAS, 12 x 4 TB raw = 36 TB RAID6
* nas1: Silicon Mechanics 24 x 4 TB SAS NAS = 72 TB RAID6
* 4 x 32-core Silicon Mechanics machines
* machines to be added from Cluster 0


= Equipment, names, roles =
* The Hebrew alphabet is used for physical machines, Greek for VMs. Functions (e.g. sgehead) are aliases (CNAMEs).
* Cluster 2 is currently housed entirely in Rack 0, which is in Row 0, Position 4 of BH101 at 1700 4th St (Byers Hall). '''More racks will be added by July.'''
* Core services run on aleph, an HP DL160G5, as VMs under a libvirt hypervisor.
* There are 3 Silicon Mechanics Rackform nServ A4412.v4 servers, each comprising 4 computers of 32 CPU cores, for a total of 384 CPU cores.
* An HP DL165G7 (24-way) is sgehead.
* HP disks (new in 2014): 40 TB SAS RAID6.
* Silicon Mechanics NAS (new in 2014): 76 TB SAS RAID6.
* An HP DL160G5 and an MSA60 with 12 TB SAS (new in 2014).
* A Dell C6145 with 128 cores.
* Current total of 512 cores for queued jobs and 128 cores for infrastructure, databases, management and ad hoc jobs.
= Disk organization =
* shin aka nas1, mounted as /nfs/db/: 72 TB SAS RAID6
* bet aka happy, internal: /nfs/store and psql (temp), 10 TB SATA RAID10
* elated on happy: /nfs/work only, 36 TB SAS RAID6
* het (43), aka the former vmware2 MSA60: exports /nfs/home and /nfs/soft


= Getting started =
Welcome to the lab. Here is what you need to know to get started.
* 1. Your account: get it from your system administrator, Therese Demers (or John Irwin).
* 2. Your home is /nfs/home/<your_id>/. This area is backed up and is for important, persistent files.
* 3. Run docking jobs and other intensive calculations in /nfs/work/<your_id>/.
* 4. Keep static data (e.g. crystallography data, results of published papers) in /nfs/store/<your_id>/.
* 5. Lab guests get 100 GB in each of these areas, and lab members get 500 GB. You may request more; just ask!
* 6. If you go over your limit, you get emails for 2 weeks; then we impose a hard limit if you have not resolved your overage.
* 7. You can choose bash or tcsh as your default shell. We don't care; everything should work equally well with both.
* 8. There is a special kind of static data, databases, for which you may request space. These go in /nfs/db/<db_name>/, e.g. /nfs/db/zinc/, /nfs/db/dude/, /nfs/db/pdb/, and so on.
* 9. Please run large docking jobs on /nfs/work, not on /nfs/store or /nfs/home. When you publish a paper, please delete what you can, compress the rest, and move it to /nfs/store/. Do not leave it on /nfs/work/ if you are no longer using it actively. (A sketch of this cleanup follows this list.)
* 10. Set up your account so that you can log in across the cluster without a password: ssh-keygen; cd ~/.ssh; cp id_rsa.pub authorized_keys; chmod 600 authorized_keys. (See the second sketch after this list.)
* 11. Software lives in /nfs/software/. All our machines are 64-bit CentOS 6.3 unless otherwise indicated.
* 12. Python 2.7 and 3.0 are installed. We currently recommend 2.7 because of library availability, but that may change soon. (Aug 2012)
* 13. If you use tcsh, copy .login and .cshrc from ~jji/; if you use bash, copy .bash_profile from ~jji/.
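A minimal sketch of the cleanup described in step 9, using hypothetical directory and archive names (my_project is a placeholder); adjust the paths to your own project.

<pre>
# Step 9 sketch: after publishing, compress a finished project on /nfs/work
# and move the archive to /nfs/store/ (my_project and <your_id> are placeholders).
cd /nfs/work/<your_id>/
tar czf my_project.tar.gz my_project/      # compress what you want to keep
rm -rf my_project/                         # remove the uncompressed working copy
mv my_project.tar.gz /nfs/store/<your_id>/
</pre>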
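And a sketch of the passwordless-login setup from step 10, assuming you have no existing SSH key or authorized_keys file; if you already have one, append to it instead of overwriting.

<pre>
# Step 10 sketch: passwordless ssh between cluster machines, which share /nfs/home.
ssh-keygen -t rsa                 # accept the defaults; an empty passphrase gives passwordless logins
cd ~/.ssh
cp id_rsa.pub authorized_keys     # or: cat id_rsa.pub >> authorized_keys, if the file already exists
chmod 600 authorized_keys
chmod 700 ~/.ssh                  # sshd ignores keys if permissions are too loose
</pre>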






= Roles =

== General ==
* sgehead - access to the cluster from within the lab
** pgf fortran compiler
** submit jobs to the queue (see the sketch after this list)
* portal - access to the cluster from off campus
* ppilot - our Pipeline Pilot license will be transferred here
* www - static webserver VM
* dock - DOCK licensing VM
* drupal -
* wordpress -
* public - runs the public services ZINC, DOCK Blaster, SEA, DUDE
* happy - postgres production server
* ark - internal psql, like raiders in yyz
* nfs1 - disk server 1
* nfs2 - disk server 2
* nfs3 - disk server 3
* fprint - fingerprinting server
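The name sgehead suggests a Sun Grid Engine head node; the following is a hypothetical sketch of submitting a queued job from it, assuming standard SGE client tools and a placeholder job script run_docking.sh.

<pre>
# Hypothetical job submission from sgehead (assumes standard SGE commands).
cd /nfs/work/<your_id>/my_project
qsub -cwd -S /bin/bash run_docking.sh   # run_docking.sh is a placeholder job script
qstat -u <your_id>                      # check the state of your queued and running jobs
</pre>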

== Services ==
* aleph - VM running core administrative functions
* bet -
* gimel -
* dalet -
* he -
* vav -
* zayin -


== SEA server ==
* fawlty
* mysql server is on msqlserver aka inception
* fingerprint server is on fingerprint aka darkcrystal


= By rack =

== Rack 0 - 10.20.0.* ==
Location: BH101, column 7, row 5
* aleph
* bet
* happy

== Rack 1 - 10.20.10.* ==
Location: BH101, column 1, row 0

== Rack 2 - 10.20.30.* ==
Location: BH


= How to administer DHCP / DNS in BH101 =

https://www.cgl.ucsf.edu/dns_dhcp/

