Category:Sysadmin
To contact the sysadmins, look here.
These pages are for docking.org sysadmins, and are also useful for anyone who wants to install, run and manage a docking.org site. Sysadmin pages are *only* relevant if you have sudo on a docking.org type cluster. For other roles, see here. Once the docking lab has been set up, it must be maintained. This guide covers everything it takes to be a docking.org sysadmin.
For security reasons, documents pertaining to security are kept in Google Docs. For access, contact the sysadmins. This includes: 1) Software Licenses, 2) Our Computers, 3) Sysadmin security secrets.
Pro-active maintenance
There are two kinds of maintenance: reactive and pro-active. Pro-active maintenance is classified temporally; reactive maintenance is always in the present.
- Periodic system maintenance
- manage public DNS https://www.cgl.ucsf.edu/dns_dhcp/
- edit the host alias file to define private-address machine names. Use alpha:/opt/bks/bin/add-host-alias. This script freezes and unfreezes the dynamic zones using rndc freeze <zone> and rndc thaw <zone>, where <zone> is, e.g., cluster.ucsf.bkslab.org
- NB CNAMEs must be terminated with a dot (.)
- NB bkslab.org is managed by aaa1 but uoft.bkslab.org is delegated to spinaltap and ucsf.bkslab.org to alpha
- use joker.com to manage top level bkslab.org domain
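The freeze, edit, thaw cycle mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the contents of add-host-alias, which are not reproduced on this page. The names frozen_zone, cname_record and the run parameter are hypothetical; only rndc freeze <zone> and rndc thaw <zone> come from the text above.

```python
import subprocess
from contextlib import contextmanager


def _run(cmd):
    """Run a command and raise if it exits non-zero."""
    subprocess.run(cmd, check=True)


@contextmanager
def frozen_zone(zone, run=_run):
    """Freeze a dynamic zone for manual edits and always thaw it afterwards."""
    run(["rndc", "freeze", zone])
    try:
        yield
    finally:
        # Thaw even if the edit failed, so dynamic updates resume.
        run(["rndc", "thaw", zone])


def cname_record(alias, target):
    """Format a CNAME line (hypothetical helper), adding the trailing dot."""
    if not target.endswith("."):
        target += "."
    return f"{alias}\tIN\tCNAME\t{target}"
```

A zone-file edit would then sit inside `with frozen_zone("cluster.ucsf.bkslab.org"):`. Passing a different run function (for example, one that only prints the command) gives a dry run without touching the name server.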
Conventions
- When we create a desktop, we create the user account l_<USER> (l as in lion or local). This allows the user to use the desktop if ldap or network are down.
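As a small illustration of the naming convention, the sketch below builds (but does not run) a useradd command for the local fallback account. The helper names are hypothetical, and the actual account-creation procedure used in the lab may differ.

```python
def local_account_name(user):
    """Return the local fallback account name for a desktop user."""
    return f"l_{user}"


def useradd_command(user):
    """Build (not run) a useradd command for the local fallback account."""
    # -m creates the home directory, which lives on the desktop's own disk.
    return ["useradd", "-m", local_account_name(user)]
```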
Reactive maintenance
- Create a new user
- Retire a user
- RAID disk failure
- disk full
- security breach
Policies
- we have an elaborate scheme for private addresses that is possibly more trouble than it is worth
- if a machine does not have to be on the public network, it should not be on the public network
- use iptables aggressively to suppress nearly all public services outside the lab
- use VMs
- document all machines in Google Docs
- document everything that is not security related on the wiki
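The iptables policy above might look roughly like the following sketch, which only generates the commands so they can be reviewed before being applied. The function name, the subnet and the port list are placeholders chosen for illustration; the lab's real rule set is kept elsewhere and may be organized differently.

```python
def lab_only_rules(lab_cidr, tcp_ports):
    """Return iptables commands that expose the given TCP ports to lab_cidr only."""
    rules = [
        # Keep loopback and replies to outbound connections working.
        ["iptables", "-A", "INPUT", "-i", "lo", "-j", "ACCEPT"],
        ["iptables", "-A", "INPUT", "-m", "conntrack",
         "--ctstate", "ESTABLISHED,RELATED", "-j", "ACCEPT"],
    ]
    for port in tcp_ports:
        rules.append(["iptables", "-A", "INPUT", "-p", "tcp", "-s", lab_cidr,
                      "--dport", str(port), "-j", "ACCEPT"])
    # Set the default policy last, so everything not accepted above is dropped.
    rules.append(["iptables", "-P", "INPUT", "DROP"])
    return rules
```

For example, `lab_only_rules("192.0.2.0/24", [22, 443])` returns five commands; 192.0.2.0/24 is a documentation-only range standing in for the lab network. The DROP policy is appended last so that the accept rules are already in place when it takes effect.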
System down/hung/crashed/offline
This section has two parts. In the first, Diagnosis, we enumerate the possible problems and what the symptoms might look like. In the second part, we rehearse scenarios of how to proceed. There are so many different kinds of failure that it is difficult to anticipate every one. Still, we have tried to write down the most common failure modes and sensible ways to proceed.
Diagnosis
- system up but df hangs -> disk is off, hung, or unmounted. Solution ->
- cannot ping head node.
- no home directory
- web server down or does not respond
- jobs don't start in queuing system
- disk full
- kernel panic
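Two of the symptoms above, a hanging df and a full disk, can be probed with a short script. This is a sketch under stated assumptions: the function names and default timeout are illustrative, and a command that does not finish in time is treated as a sign of a hung or missing mount.

```python
import shutil
import subprocess


def responds_in_time(cmd, timeout_s=10.0):
    """Return True if cmd exits with status 0 within timeout_s seconds."""
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    try:
        return proc.wait(timeout=timeout_s) == 0
    except subprocess.TimeoutExpired:
        # A process stuck on a hung mount may ignore the kill, so do not
        # wait for it here; just report the failure.
        proc.kill()
        return False


def percent_used(path):
    """Return how full the filesystem holding path is, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total
```

A call such as `responds_in_time(["df", "/some/mount"])` (the path is a placeholder) should come before `percent_used` on the same path, because the disk-usage call would itself block on a hung mount.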
Scenarios
- Install new software by request
After power failure
- check that mailman came back up properly
- Cluster 0 - check that XML RPC services came back up properly
- check that the Pipeline Pilot server came back up correctly.
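A quick way to run through such a checklist after a power failure is to test whether each service accepts TCP connections. The sketch below assumes a hand-maintained mapping of service names to host and port; the helper names, hosts and ports are hypothetical, and an open port shows only that something is listening, not that the service is healthy.

```python
import socket


def port_open(host, port, timeout_s=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def down_services(services, check=port_open):
    """Return the names of services whose port does not accept connections."""
    return [name for name, (host, port) in services.items()
            if not check(host, port)]
```

For instance, `down_services({"web": ("localhost", 80)})` returns an empty list when the port answers and `["web"]` otherwise.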
When someone leaves the lab
- back up their data or move to proust as appropriate
- reduce disk footprint as much as possible
- offer them portable USB disks for backups
- Add new hardware to the cluster
Procedures
- How to run backups
- How to restore
- How to set up a new computer
- Monthly tasks
- Security
Updating Software
- Delphi
- AMSOL
- DOCK
- dockenv
- mol2db
- molinspiration
- OpenEye
- Cactvs
- Daylight
- Marvin/JChem
Troubleshooting Services
- MySQL
- Perl
- Apache, mod_perl
- Python
- Mailman
- condor
- sendmail
Pages in category "Sysadmin"
The following 157 pages are in this category, out of 157 total.
C
- Centos
- CentOS 7 Base.repo
- Cert-workaround
- Certificate
- CHARMM
- Cluster 0
- Cluster Narrative
- Cluster Security Monitoring Tools
- Cluster Theory
- Compbio middleware
- Comptuer assignments
- Configure new disk
- Configuring an OpenSSH Server
- Configuring IPMI
- Control of bkslab.org
- Convert CD to an ISO Image
- Create decoy tables
- Create new user
- Cron
H
- How to access X11 Forwarding after becoming root
- How to add new users
- How to be someone
- How to change the hostname of a machine
- How to Change the Password for a User Command Line LDAP:
- How to Check Harddrive information
- How to check RAM details
- How to create a iso image from command line
- How to Create Encrypted Password
- How to Expand the Hard Drive Size of a VM
- How to Fix the VNC Viewer in Foreman
- How to Install a Desktop on Cluster 2
- How to Install an LDAP 389 Master Server
- How to Install Nagios
- How to Make Your Own yum Repo
- How to Replace a Failed Disk
- How to Secure Single Mode Linux
- How to See What Something Resolves to
- How to See Who is Running the Most on a Raid
- How to Set Up Webalizer
- How to Setup / Edit Quotas
- How to spin up a new virtual machine
- How to use tar for archive & compression
- How to use the sed command
- How to write a puppet config
- HP Computer Startup Issues
- HP Pro Network Switches
- Hpacucli
- HTTPD Semaphore/Mutex Lock Problem
- Hypervisor
I
M
P
R
S
- Schrodinger
- Screen
- Set up a database server
- Set up a new Desktop
- Set up a Server
- SGE notes
- Sharing file systems with nfs server and mounting file systems with nfs client
- Singularity
- So you want to set up a lab
- Software upgrades
- SSH broken pipe error
- Sun Grid Engine (SGE)
- Supported platforms for DOCK 3.7
- Switch Configuration
- Switch Setup
- Sysadmin
- Sysadmin guide
- Sysadmin idioms
- Sysadmin-quotas
- System administrator's guide