Cluster 2

Our new cluster at [[UCSF]] is described on this page. The physical equipment in [[Cluster 0]] will be subsumed into this cluster when Cluster 2 replicates all the functions of Cluster 0. We expect this to happen later in 2014.

{{TOCright}}

= Priorities and Policies =

= Equipment, names, roles =

* '''512 cores for queued jobs and 128 cores for infrastructure, databases, management, and ad hoc jobs.'''
* The Hebrew alphabet is used for physical machines, Greek for VMs. Functions (e.g. sgehead) are aliases (CNAMEs); see the lookup sketch after this list.
* Cluster 2 is currently housed entirely in Rack 0, which is in Row 0, Position 4 of BH101 at 1700 4th St (Byers Hall). '''More racks will be added by July.'''
* Core services run on aleph, an HP DL160G5, which hosts them as VMs under a libvirt hypervisor; see the virsh sketch below.
* CPU
** 3 Silicon Mechanics Rackform nServ A4412.v4 servers, each comprising 4 computers of 32 cpu-cores, for a total of 384 cpu-cores.
** 1 Dell C6145 with 128 cores.
** An HP DL165G7 (24-way) is sgehead.
* DISK
** HP disks - new in 2014 - 40 TB RAID6 SAS.
** Silicon Mechanics NAS - new in 2014 - 76 TB RAID6 SAS.
** An HP DL160G5 and an MSA60 with 12 TB SAS - new in 2014.
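
As a hedged illustration of the naming scheme, a function name resolves as a CNAME to a physical host. The lookup commands below are standard, but the specific mapping is an assumption: this page does not say which Hebrew-letter machine sgehead points at, so gimel and the address shown are hypothetical.

 # Hypothetical lookup; "gimel" and the address are illustrative only.
 $ host sgehead.ucsf.bkslab.org
 sgehead.ucsf.bkslab.org is an alias for gimel.ucsf.bkslab.org.
 gimel.ucsf.bkslab.org has address 10.20.30.40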

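The core-service VMs on aleph can be inspected with the stock libvirt tooling. A minimal sketch, assuming standard virsh and a hypothetical Greek-named VM called alpha:

 # Run on aleph; "alpha" is a made-up Greek-letter VM name.
 virsh list --all      # list defined VMs and their states
 virsh start alpha     # boot a VM
 virsh console alpha   # attach to its console
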
= Disk organization =

* shin, aka nas1, mounted as /nfs/db/: 72 TB SAS RAID6; see the mount sketch after this list.
* bet, aka happy, internal: /nfs/store and psql (temp), 10 TB SATA RAID10
* elated, on happy: /nfs/work only, 36 TB SAS RAID6
* het (43), aka the former vmware2, with an MSA60: exports /nfs/home and /nfs/soft
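
The list above implies client-side mounts along the following lines. This is a sketch under assumptions: the page gives only the servers and mount points, so the export paths and options shown here are guesses.

 # Assumed client-side NFS mounts; export paths on the servers are guesses.
 mount -t nfs shin:/export/db    /nfs/db      # 72 TB SAS RAID6
 mount -t nfs bet:/export/store  /nfs/store   # 10 TB SATA RAID10
 mount -t nfs bet:/export/work   /nfs/work    # 36 TB SAS RAID6 (elated)
 mount -t nfs het:/export/home   /nfs/home
 mount -t nfs het:/export/soft   /nfs/soft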

= Special purpose machines (all .ucsf.bkslab.org) =

* sgehead - we recommend you use this, in addition to your desktop, for most purposes, including launching jobs on the cluster; see the qsub sketch after this list.
* pgf - Fortran compiler
* portal - secure access from
* ppilot - Pipeline Pilot
* shin, bet, and dalet are the three NFS servers. You should not need to log in to them.
* mysql1 - general-purpose MySQL server (like the former scratch)
* pg1 - general-purpose PostgreSQL server
* fprint - fingerprinting server
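
Since sgehead is the recommended launch point, here is a minimal, hedged Grid Engine example. Only the host's role comes from this page; the script contents and queue defaults are assumptions.

 # hello.sh - an illustrative SGE job script
 #$ -S /bin/bash
 #$ -cwd               # run in the submission directory
 #$ -j y               # merge stderr into stdout
 echo "Hello from $(hostname)"

 # On sgehead:
 qsub hello.sh         # submit to the default queue
 qstat -u $USER        # check job state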

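For the database servers, the standard clients should work as sketched below. The hostnames come from this page; the user, password prompt, and database name are illustrative assumptions.

 # Hostnames from this page; credentials and database are made up.
 mysql -h mysql1.ucsf.bkslab.org -u youruser -p
 psql -h pg1.ucsf.bkslab.org -U youruser yourdb
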
= About our cluster =