Cluster 2
This is the default lab cluster.
Priorities and Policies
- Lab Security Policy
- Disk space policy
- Backups policy
- Portal system for off-site ssh cluster access
- Get a Cluster 2 account and get started
Special machines
Normally, you will just ssh to sgehead (aka gimel) from portal.ucsf.bkslab.org, where you can do almost anything, including job management; see the login example after the list below. A few things require licensing and must be done on special machines.
- Sigma can definitely go off and stay off. It was planned as a fingerprinting server, but that was never done.
- kappa handles licensing; ask me. ("i have no clue what this licenses." - ben)
- rho contains this wiki and also bkslab.org
- Psi is fortran and stays on
- Tau is the web server and will move to he
- he also hosts:
  - alpha - critical; runs foreman, DNS, and other important services
  - beta - runs LDAP and is important
  - gamma
  - psi - for using the PG fortran compiler
- ppilot is at http://zeta:9944/ - you must be on the Cluster 2 private network to use it
- no other special machines
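For off-site access, the path runs through the portal machine and then to the SGE head node. A minimal sketch, assuming a hypothetical username "alice" (hostnames are as listed above):

  # from off site: log in to the portal first (hypothetical user "alice")
  ssh alice@portal.ucsf.bkslab.org
  # from portal, hop to the SGE head node (sgehead aka gimel)
  ssh gimel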
Notes
- to check out from SVN, use the svn+ssh protocol (sketch below)
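A minimal sketch of an svn+ssh checkout; the repository host and path below are hypothetical, not taken from this page:

  # substitute the real repository host and path
  svn checkout svn+ssh://gimel/svn/some-repo some-repo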
Hardware and physical location
- 1856 cpu-cores for queued jobs
- 128 cpu-cores for infrastructure, databases, management and ad hoc jobs.
- 788 TB of high quality NFS-available disk
- Our policy is to have 4 GB RAM per cpu-core unless otherwise specified.
- Machines older than 3 years may have 2 GB/core, and machines older than 6 years may have 1 GB/core.
- Cluster 2 is currently stored entirely in Rack 0 which is in Row 0, Position 4 of BH101 at 1700 4th St (Byers Hall).
- Central services are on he, aleph2, and bet
- CPU
- 3 Silicon Mechanics Rackform nServ A4412.v4 units, each comprising 4 computers of 32 cpu-cores, for a total of 384 cpu-cores.
- 1 Dell C6145 with 128 cores.
- An HP DL165G7 (24-way) is sgehead
- More computers will come from Cluster 0 when Cluster 2 is fully ready.
- DISK
- HP disks - 40 TB RAID6 SAS (new in 2014)
- Silicon Mechanics NAS - 77 TB RAID6 SAS (new in 2014)
- An HP DL160G5 and an MSA60 with 12 TB SAS (disks new in 2014)
Naming convention
- The Hebrew alphabet is used for physical machines
- Greek letters for VMs.
- Functions (e.g. sgehead) are aliases (CNAMEs); see the lookup example after this list.
- Both the compbio.ucsf.edu and ucsf.bkslab.org domains are supported.
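To see the convention in action, a function name should resolve as a CNAME to the machine behind it. A sketch using sgehead; the exact answer text is illustrative, not captured output:

  # the functional alias should point at the physical/virtual machine
  host sgehead.ucsf.bkslab.org
  # illustrative output: sgehead.ucsf.bkslab.org is an alias for gimel.ucsf.bkslab.org.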
Disk organization
- shin aka nas1 mounted as /nfs/db/ = 72 TB SAS RAID6
- bet aka happy, internal: /nfs/store and psql (temp) as 10 TB SATA RAID10
- elated on happy: /nfs/work only as 36 TB SAS RAID6
- dalet exports /nfs/home & /nfs/home2
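To confirm which server backs a given mount on a cluster node, standard tools are enough; a sketch using paths from the list above (the output shape is an assumption):

  # show the exporting server and free space for the database volume
  df -h /nfs/db
  # list all NFS mounts with their source servers
  mount -t nfs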
Special purpose machines - all .ucsf.bkslab.org
- sgehead aka gimel.cluster - nearly the only machine you'll need.
- psi.cluster - PG fortran compiler (a machine with only a .cluster address has no public address)
- portal aka epsilon - secure access
- zeta.cluster - Pipeline Pilot
- shin, bet, and dalet are the three NFS servers. You should not need to log in to them.
  - on the teague desktop, run "/usr/local/RAID Web Console 2/startupui.sh" and connect to shin on the public network (raid / C2 on shin)
- mysql1.cluster - general purpose mysql server (like the former scratch)
- pg1.cluster - general purpose postgres server (connection sketches for both are below)
- fprint.cluster - fingerprinting server
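For mysql1 and pg1, a connection sketch from a cluster node; the database name "testdb" and user "alice" are hypothetical, as no real names are given on this page:

  # hypothetical user and database; substitute real credentials
  mysql -h mysql1.cluster -u alice -p testdb
  psql -h pg1.cluster -U alice -d testdb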