Cluster 2
Introduction
Cluster 2 is the most modern cluster the Irwin Lab maintains. It is approaching end of life, but the exact timeline is still to be determined.
(Edited May 6, 2024)
Priorities and Policies
- Lab Security Policy
- Disk space policy
- Backups policy.
- Portal system for off-site ssh cluster access.
- Get a Cluster 2 account and get started
Special machines
Normally, you will just ssh to sgehead aka gimel from portal.ucsf.bkslab.org where you can do almost anything, including job management. A few things require licensing and must be done on special machines.
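For example, a typical login path goes through the portal first and then on to sgehead. This is a minimal sketch; the username jdoe is a placeholder and your account setup may differ:
$ ssh jdoe@portal.ucsf.bkslab.org    # public-facing gateway (epsilon)
$ ssh gimel                          # sgehead, on the Cluster 2 private network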
hypervisor 'he' hosts:
- alpha - which is critical and runs foreman, DNS, DHCP, and other important services
- beta - which runs LDAP authentication
- epsilon - portal.ucsf.bkslab.org - cluster gateway from public internet
- gamma - sun grid engine qmaster
- phi - mysqld/excipients
- psi - for using the PG Fortran compiler
- ppilot is at http://zeta:9944/ - you must be on the Cluster 2 private network to use it
- tau - web server for ZINC
- zeta - PSICQUIC / Pipeline Pilot
- sigma - can be turned off and stay off; it was planned as a fingerprinting server but that was never set up
hypervisor 'aleph2' hosts:
- alpha7 - intended as the future architecture VM of the cluster (DNS/DHCP/Puppet/Foreman/Ansible); runs CentOS 7
- kappa - licensing; ask me. ("I have no clue what this licenses. Turned off." - Ben)
- rho - hosts this wiki and also bkslab.org
Notes
- to check code out of SVN, use svn+ssh access (example below)
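For example, a checkout over svn+ssh might look like the following; the repository host and path here are placeholders, not the lab's actual repository location:
$ svn checkout svn+ssh://username@svnhost/path/to/repo working-copy   # authenticates over ssh, then speaks the svn protocol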
Hardware and physical location
- 1856 cpu-cores for queued jobs
- 128 cpu-cores for infrastructure, databases, management and ad hoc jobs.
- 788 TB of high quality NFS-available disk
- Our policy is to have 4 GB RAM per cpu-core unless otherwise specified.
- Machines older than 3 years may have 2 GB/core, and machines older than 6 years may have 1 GB/core.
- Cluster 2 is currently stored entirely in Rack 0 which is in Row 0, Position 4 of BH101 at 1700 4th St (Byers Hall).
- Central services run on he, aleph2, and bet.
- CPU
- 3 Silicon Mechanics Rackform nServ A4412.v4 units, each comprising 4 computers of 32 cpu-cores, for a total of 384 cpu-cores.
- 1 Dell C6145 with 128 cores.
- An HP DL165G7 (24-way) is sgehead
- more computers to come from Cluster 0, when Cluster 2 is fully ready.
- DISK
- HP disks - 40 TB RAID6 SAS (new in 2014)
- Silicon Mechanics NAS - 77 TB RAID6 SAS (new in 2014)
- An HP DL160G5 and an MSA60 with 12 TB SAS (disks new in 2014)
Naming convention
- The Hebrew alphabet is used for physical machines
- Greek letters for VMs.
- Functions (e.g. sgehead) are aliases (CNAMEs); see the lookup example after this list.
- compbio.ucsf.edu and ucsf.bkslab.org domains both supported.
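To check what a functional alias points at, resolve it with DNS. sgehead is used as the example here; the target shown is simply whatever DNS returns:
$ host sgehead.ucsf.bkslab.org      # should report a CNAME to the underlying letter-named machine
$ host sgehead.compbio.ucsf.edu     # same lookup via the other supported domain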
Disk organization
- shin aka nas1, mounted as /nfs/db/ = 72 TB SAS RAID6. NOTE (on band): run $ sudo /usr/local/RAID\ Web\ Console\ 2/startupui.sh to interact with the RAID controller (username: raid, password: c2 pass).
- bet aka happy, internal: /nfs/store and psql (temp) as 10 TB SATA RAID10
- elated on happy: /nfs/work only as 36 TB SAS RAID6
- dalet exports /nfs/home & /nfs/home2 (a quick mount check follows this list)
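To confirm these exports are mounted on the machine you are logged in to, a minimal check using the mount points listed above:
$ df -h /nfs/db /nfs/store /nfs/work /nfs/home /nfs/home2   # shows size, usage, and the NFS server backing each mount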
Special purpose machines - all .ucsf.bkslab.org
- sgehead aka gimel.cluster - nearly the only machine you'll need.
- psi.cluster - PG Fortran compiler (a machine with only a .cluster address has no public address)
- portal aka epsilon - secure access
- zeta.cluster - Pipeline Pilot
- shin, bet, and dalet are the three NFS servers. You should not need to log in to them.
On the teague desktop, run /usr/local/RAID Web Console 2/startupui.sh, connect to shin over the public network, and log in as raid / C2 on shin.
- mysql1.cluster - general-purpose MySQL server (like the former scratch); connection examples follow this list
- pg1.cluster - general-purpose Postgres server
- fprint.cluster - fingerprinting server
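Hedged connection examples for the two general-purpose database servers; the username and database name are placeholders and depend on what has been created for you:
$ mysql -h mysql1.cluster -u youruser -p              # prompts for your MySQL password
$ psql -h pg1.cluster -U youruser yourdatabase        # connects to a named Postgres database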
Table of Server Information
SLURM
Server Name | Operating System | Functions |
---|---|---|
epyc | Rocky 8 | Apache/HTTPD web server + proxy |
epyc2 | Rocky 8 | Hosts vital VMs needed for Cluster 2 to function. |
epyc-A40 | Rocky 8 | |
n-1-101 | Centos 7 | |
n-1-105 | Centos 7 | |
n-1-124 | Centos 7 | |
n-1-126 | Centos 7 | |
n-1-141 | Centos 7 | |
n-1-16 | Centos 7 | |
n-1-17 | Centos 7 | |
n-1-18 | Centos 7 | |
n-1-19 | Centos 7 | |
n-1-20 | Centos 7 | |
n-1-21 | Centos 7 | |
n-1-28 | Centos 7 | |
n-1-38 | Centos 7 | |
n-5-13 | Centos 7 | |
n-5-14 | Centos 7 | |
n-5-15 | Centos 7 | |
n-5-32 | Centos 7 | |
n-5-33 | Centos 7 | |
n-5-34 | Centos 7 | |
n-5-35 | Centos 7 | |
n-9-19 | Centos 7 | |
n-9-20 | Centos 7 | |
n-9-21 | Centos 7 | |
n-9-22 | Centos 7 | |
n-9-34 | Centos 7 | |
n-9-36 | Centos 7 | |
n-9-38 | Centos 7 |
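Jobs for the Rocky 8 / SLURM nodes listed above are submitted with standard SLURM commands. A minimal sketch; the script name is a placeholder, and the default partition is assumed since partition names are site-specific:
$ sbatch myjob.sh        # submit a batch script to the default partition
$ squeue -u $USER        # check the status of your queued and running jobs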
SGE
Server Name | Operating System | Functions |
---|---|---|
gimel | Centos 6 | In-person Login Node |
| Centos 6 | Hosts vital VMs needed for Cluster 2 to function. |
het | Centos 6 | |
n-0-129 | Centos 6 | |
n-0-136 | Centos 6 | |
n-0-139 | Centos 6 | |
n-0-30 | Centos 6 | |
n-0-37 | Centos 6 | |
n-0-39 | Centos 6 | |
n-8-27 | Centos 6 | |
n-9-23 | Centos 6 |
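Jobs for the CentOS 6 / SGE nodes listed above are submitted from gimel (sgehead) with standard Sun Grid Engine commands; the script name is a placeholder:
$ qsub myjob.sh          # submit a batch script to the SGE queue
$ qstat -u $USER         # check the status of your jobs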