Sysadmin idioms
Cluster 2
Disk trouble?
echo 1 > /proc/sys/vm/drop_caches
mount -o remount /mnt/nfs/scratch/A
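A quick follow-up check (plain df, same mount point as above) to confirm the remount took:
df -h /mnt/nfs/scratch/A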
See progress of a RAID rebuild, e.g. on aleph.
cat /proc/mdstat
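To keep watching it instead of re-running the command by hand (standard watch utility; the 60-second interval is just a suggestion):
watch -n 60 cat /proc/mdstat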
Add a new CNAME on alpha (DO NOT RUN THIS SCRIPT WITHOUT UNDERSTANDING WHAT IT DOES, IT COULD WIPE OUT ALL THE MACHINES' CANONICAL NAMES)
sudo /opt/bks/bin/add-host-alias nfs-db3 abacus
then service named restart
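To confirm the new alias resolves after the restart (using the example name from above and the lab domain seen elsewhere on this page; host comes with bind-utils):
host nfs-db3.bkslab.org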
Fire up a VM (per Sarah/Matt)
ssh to he as s_xxx
sudo virsh vncdisplay phi    # Shows the VNC port phi is running on (vnc port 0)
sudo virsh edit phi          # Open phi's config
# search for passwd ( /passwd<ENTER> )
# copy down the VNC password
# :q!                        # Exit vim
exit                         # Exit virsh
exit                         # Log out of he
vncviewer he:<VNCPORT>       (e.g. vncviewer he:0)
Enter the password and log in.
restart: sshd, mysql, iptables, network (if it can't ping) -- see the command sketch below
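The final restart step maps to roughly these commands on the VM, assuming standard RHEL-style init scripts (the MySQL service may be named mysqld rather than mysql depending on the install):
service sshd restart
service mysqld restart
service iptables restart
service network restart      # only if the VM can't ping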
Cluster 0
Disk space panic (Cluster 0)
sudo /usr/sbin/repquota /raid1 | sort -nrk3 | head
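If the quota report doesn't point at the culprit, a plain per-directory summary of /raid1 works too (nothing lab-specific, just du and sort):
sudo du -s /raid1/* | sort -n | tail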
Save time as a sysadmin on C0
Use ~teague/Scripts/sshnodes.py to call ~teague/batch/mount-diva2.sh
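The exact arguments are not documented here; the invocation is presumably along the lines below, but check the script's usage before running it:
~teague/Scripts/sshnodes.py ~teague/batch/mount-diva2.sh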
So obvious as to no longer be worth documenting
Clear errors on jobs
qstat -u adler | grep Eqw | cut -f 1 -d ' ' | xargs qmod -cj
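To preview which jobs would be affected before clearing them, run just the first part of the pipeline:
qstat -u adler | grep Eqw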
Start/restart ZINC15
source env.csh
zincserver.restart-backend.sh
After a vmware1 failure
ssh root@vmware1.bkslab.org (based on C0, twice)
vim-cmd vmsvc/getallvms
vim-cmd vmsvc/power.on 1792
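A sanity check before and after powering on (vim-cmd is the stock ESXi CLI; 1792 is the VM ID taken from getallvms above):
vim-cmd vmsvc/power.getstate 1792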
On dock
service httpd start (root on dock)
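To confirm it came up (same host, still as root):
service httpd status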
Queue stuck?
Try qmod -c '*@lamed' (clear error states) and qmod -e '*@lamed' (re-enable the queues)
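If that doesn't unstick it, look at the queue instance states on lamed directly (E = error, d = disabled in the states column):
qstat -f | grep lamed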
Clean away old scratch files on the nodes before your job starts (adler as the example user)
find /scratch/adler -mindepth 1 -mtime +3 -exec rm -rvf {} \;
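A minimal sketch of running the cleanup automatically at the top of a job script, assuming csh jobs and that $USER expands to the submitting user on the node:
#!/bin/csh
# clean stale scratch before the real work starts
find /scratch/$USER -mindepth 1 -mtime +3 -exec rm -rvf {} \;
# ... actual job commands follow ...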
Restart ZINC15
cd /nfs/soft/www/apps/zinc15/zinc15-env/lib/python2.7/site-packages/zinc/data/models
source /nfs/soft/www/apps/zinc15/zinc15-env/env.csh
zincserver.restart-backend.sh
zincserver.start-backend.sh
killall -9 gunicorn
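To confirm the backend actually came back (or that the killall had anything to kill), a plain process check is enough (the [g] keeps grep from matching itself):
ps aux | grep [g]unicorn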