Sysadmin idioms

From DISI

Cluster 2

Disk trouble?

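sync                                   # flush dirty pages; drop_caches only frees clean cache
echo 1 > /proc/sys/vm/drop_caches      # as root: drop the page cache
mount -o remount /mnt/nfs/scratch/A    # as root: remount the NFS scratch share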
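Before remounting, it can help to check whether the share is actually hung rather than just slow. A minimal sketch, assuming GNU coreutils timeout; the function name path_responds and the 10-second default are hypothetical choices, not part of any existing script here:

```shell
# Hypothetical helper: succeed if PATH can be stat'ed within SECONDS
# (default 10). A hung NFS mount usually blocks stat, so the timeout turns
# "hangs forever" into a non-zero exit status; -k 5 sends SIGKILL if the
# stat has not exited 5 seconds after the first signal.
path_responds() {
    timeout -k 5 "${2:-10}" stat -- "$1" >/dev/null 2>&1
}

# Example (not run automatically):
#   path_responds /mnt/nfs/scratch/A || echo "scratch/A looks hung; try the remount"
```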

See the progress of a RAID rebuild, e.g. on aleph:

cat /proc/mdstat
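To pull out just the percentage instead of reading the whole file, a small filter works. A sketch; rebuild_progress is a hypothetical helper, and it assumes the usual mdstat progress line of the form "recovery = 12.6% (...)":

```shell
# Hypothetical helper: print the progress of any RAID rebuild/resync listed
# in an mdstat file (default /proc/mdstat), e.g. "recovery = 12.6%".
# Exits non-zero (grep found nothing) when no array is rebuilding.
rebuild_progress() {
    grep -oE '(recovery|resync|reshape|check) *= *[0-9.]+%' "${1:-/proc/mdstat}"
}

# Example (not run automatically):
#   rebuild_progress || echo "no rebuild in progress"
#   watch -n 10 cat /proc/mdstat     # live view while it runs
```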

Add a new CNAME on alpha (DO NOT RUN THIS SCRIPT WITHOUT UNDERSTANDING WHAT IT DOES; IT COULD WIPE OUT ALL THE MACHINES' CANONICAL NAMES)

sudo /opt/bks/bin/add-host-alias nfs-db3 abacus

Then restart the name server: service named restart
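After restarting named, you can confirm the alias resolves. A minimal sketch using dig from bind-utils; check_alias and short_name are hypothetical helpers, and the alias must be given fully qualified because dig does not apply the resolver search list unless +search is given:

```shell
# Hypothetical helper: reduce a DNS answer like "abacus.example.org." to "abacus".
short_name() {
    name=${1%.}                    # drop the trailing root dot
    printf '%s\n' "${name%%.*}"    # keep only the first label
}

# Hypothetical helper: succeed if ALIAS (fully qualified) is a CNAME for
# TARGET (short host name), asking SERVER (default localhost, i.e. run on alpha).
check_alias() {
    answer=$(dig +short "$1" CNAME @"${3:-localhost}")
    [ -n "$answer" ] && [ "$(short_name "$answer")" = "$2" ]
}

# Example (not run automatically; substitute the real domain):
#   check_alias nfs-db3.<your-domain> abacus && echo "alias OK"
```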

Fire up a VM (per sarah/matt):

ssh s_xxx@he                # log in to he as your s_xxx account
sudo virsh vncdisplay phi   # show the VNC display phi is on, e.g. ":0" (display 0 = TCP port 5900)
sudo virsh edit phi         # open phi's domain XML in the editor (vim)
# search for passwd (/passwd<ENTER>)
# copy down the VNC password
# :q!  exit vim without saving
exit                        # log out of he
vncviewer he:<DISPLAY>      (e.g. vncviewer he:0)
Enter the password.
Log in and restart sshd, mysql, iptables, and network (if it can't ping).
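A read-only alternative to opening the config in virsh edit (where a stray keystroke could change the domain) is to dump the XML; virsh dumpxml omits the VNC password unless --security-info is passed. A minimal sketch; vnc_passwd is a hypothetical helper that assumes the password sits as a passwd attribute on a single-line graphics element, as libvirt writes it:

```shell
# Hypothetical helper: read libvirt domain XML on stdin and print the passwd
# attribute of the <graphics> element (single- or double-quoted).
vnc_passwd() {
    sed -n "s/.*<graphics[^>]*passwd=[\"']\([^\"']*\)[\"'].*/\1/p"
}

# Example on he (not run automatically):
#   sudo virsh dumpxml --security-info phi | vnc_passwd
#   sudo virsh vncdisplay phi        # e.g. ":0", so connect with vncviewer he:0
```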

Cluster 0

Disk space panic (cluster 0)

sudo /usr/sbin/repquota /raid1 | sort -nrk3 | head
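When you want the numbers in a readable unit, the same pipeline can be done in awk (repquota -s also prints human-readable sizes, but those do not sort numerically). A sketch assuming repquota's default layout (user, flags, blocks used, ...) and 1 KiB blocks; top_quota_users is a hypothetical name:

```shell
# Hypothetical helper: read repquota output on stdin and print the N users
# (default 10) with the most blocks used, converted from KiB to GiB.
# Lines whose third field is not a number (headers, separators) are skipped.
top_quota_users() {
    awk '$3 ~ /^[0-9]+$/ { printf "%s %.1f\n", $1, $3 / 1048576 }' |
        sort -k2,2nr | head -n "${1:-10}"
}

# Example (not run automatically):
#   sudo /usr/sbin/repquota /raid1 | top_quota_users 5
```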

Save time as sysadmin C0

Use ~teague/Scripts/sshnodes.py to call ~teague/batch/mount-diva2.sh.
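The contents of sshnodes.py are not documented here. For a one-off, a plain loop over ssh does the same kind of job; this sketch is a stand-in, not a description of what sshnodes.py does, and the node names in the example are made up. Setting SSH=echo gives a dry run:

```shell
# Hypothetical stand-in for sshnodes.py: run COMMAND on each NODE in turn,
# printing a header per node and reporting failures on stderr.
# Set SSH=echo to see what would be run without connecting anywhere.
run_on_nodes() {
    cmd=$1
    shift
    for node in "$@"; do
        printf '== %s ==\n' "$node"
        "${SSH:-ssh}" "$node" "$cmd" || printf 'FAILED on %s\n' "$node" >&2
    done
}

# Example (not run automatically; node names are made up):
#   run_on_nodes ~teague/batch/mount-diva2.sh n-1-17 n-1-18
```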

So obvious as to no longer be worth documenting

Clear errors on jobs

qstat -u adler | grep Eqw | awk '{print $1}' | xargs qmod -cj
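Before clearing, it is often worth seeing why a job went to Eqw, since clearing a job whose cause persists just sends it back to error; qstat -j on the job ID prints an "error reason" line for such jobs. A sketch; eqw_jobs is a hypothetical helper that assumes the default qstat layout, where the state is the fifth column (matching on that column also avoids hitting job names that happen to contain "Eqw"):

```shell
# Hypothetical helper: read qstat output on stdin and print the job IDs
# whose state column (5th field in the default layout) is exactly "Eqw".
eqw_jobs() {
    awk '$5 == "Eqw" { print $1 }'
}

# Example (not run automatically):
#   for j in $(qstat -u adler | eqw_jobs); do
#       qstat -j "$j" | grep 'error reason'
#   done
```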

Start/restart ZINC15

source env.csh
zincserver.restart-backend.sh

After a vmware1 failure

ssh root@vmware1.bkslab.org
(based on C0, twice)

vim-cmd vmsvc/getallvms          # list the VMs with their Vmid numbers

vim-cmd vmsvc/power.on 1792      # power on the VM with Vmid 1792
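The Vmid (1792 above) is the first column of the getallvms listing. To look one up by name and check its state first, a sketch for the ESXi shell; vmid_of is a hypothetical helper that assumes VM names without spaces, and vim-cmd vmsvc/power.getstate reports whether the VM is on:

```shell
# Hypothetical helper: read "vim-cmd vmsvc/getallvms" output on stdin and
# print the Vmid of the VM whose Name column equals NAME.
vmid_of() {
    awk -v name="$1" '$2 == name { print $1 }'
}

# Example on vmware1 (not run automatically; "dock" is an example name):
#   id=$(vim-cmd vmsvc/getallvms | vmid_of dock)
#   vim-cmd vmsvc/power.getstate "$id"
#   vim-cmd vmsvc/power.on "$id"
```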

On dock (as root):

service httpd start

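Queue stuck?

Try qmod -c '*@lamed' and then qmod -e '*@lamed' (-c clears the error state, -e re-enables the queue instances on lamed; quote the pattern so the shell does not try to expand it).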
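To see which queue instances are actually in error (E) or disabled (d) before and after running qmod, a sketch that filters qstat -f; bad_queues is a hypothetical helper, and it assumes the states column is the sixth field, present only when a queue instance has a state flag:

```shell
# Hypothetical helper: read "qstat -f" output on stdin and print each queue
# instance (queue@host) whose states column contains E (error) or d (disabled).
bad_queues() {
    awk '$1 ~ /@/ && NF >= 6 && $6 ~ /[Ed]/ { print $1, $6 }'
}

# Example (not run automatically):
#   qstat -f | bad_queues
```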

Clean away old scratch files on the nodes before your job starts (adler as an example user):

find /scratch/adler -mindepth 1 -mtime +3 -exec rm -rvf {} \;
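The one-liner above removes whole directories by their own mtime, and a directory's mtime changes only when entries are added to or removed from it, so a recently written file inside an old directory can be deleted too. A more conservative sketch deletes only old regular files and leaves directories in place; clean_old_files is a hypothetical name, and -delete is a GNU/BSD find extension rather than POSIX:

```shell
# Hypothetical helper: delete regular files under DIR not modified for more
# than DAYS days (default 3, matching -mtime +3 above), printing each path.
# Directories are left alone; empty ones take negligible space.
clean_old_files() {
    find "$1" -mindepth 1 -type f -mtime +"${2:-3}" -print -delete
}

# Example (not run automatically):
#   clean_old_files /scratch/adler
```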

Restart ZINC15

cd /nfs/soft/www/apps/zinc15/zinc15-env/lib/python2.7/site-packages/zinc/data/models
source /nfs/soft/www/apps/zinc15/zinc15-env/env.csh
zincserver.restart-backend.sh
# if the backend will not come back, kill stray workers and start it fresh:
killall -9 gunicorn
zincserver.start-backend.sh