Sysadmin idioms
Latest revision as of 00:05, 10 September 2019
Cluster 2
disk trouble?
echo 1 > /proc/sys/vm/drop_caches
mount -o remount /mnt/nfs/scratch/A
See progress of RAID rebuild, e.g. on aleph.
cat /proc/mdstat
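On a box with several arrays the full mdstat dump gets noisy. A minimal sketch that keeps only the progress lines, assuming the usual kernel wording (recovery, resync, reshape); the function name raid_progress is made up for illustration:

```shell
# Print only the rebuild progress lines from mdstat-style input.
# Reads the file named by $1, defaulting to /proc/mdstat.
# Assumption: progress lines contain "recovery", "resync" or "reshape".
raid_progress() {
    grep -E 'recovery|resync|reshape' "${1:-/proc/mdstat}"
}
```

Called with no argument it reads /proc/mdstat; it prints nothing when no rebuild is running.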
add a new CNAME on alpha (DO NOT RUN THIS SCRIPT WITHOUT UNDERSTANDING WHAT IT DOES; IT COULD WIPE OUT ALL THE MACHINES' CANONICAL NAMES)
sudo /opt/bks/bin/add-host-alias nfs-db3 abacus
then restart named:
service named restart
fire up a vm per sarah/matt
ssh to he as s_xxx
sudo virsh vncdisplay phi # Shows the VNC port phi is running on (vnc port 0)
sudo virsh edit phi # Open phi's config
# search for passwd ( /passwd<ENTER> )
# copy down VNC password
# :q! # Exit vim
exit # Exit virsh
exit # Log out of he
vncviewer he:<VNCPORT> (vncviewer he:0)
Enter password
log in
restart: sshd, mysql, iptables, network (if it can't ping)
Cluster 0
Disc space panic (cluster 0)
sudo /usr/sbin/repquota /raid1 | sort -nrk3 | head
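The pipeline above ranks by the third column, which in the usual repquota layout is blocks used, but header lines can end up in the mix. A rough sketch that drops non-numeric rows first; top_quota_users is a made-up name and the column layout is an assumption:

```shell
# Read repquota-style output on stdin and print the N heaviest users
# (default 10), ranked by the third column (assumed to be blocks used).
# Rows whose third field is not a plain number (headers, rules) are dropped.
top_quota_users() {
    awk '$3 ~ /^[0-9]+$/' | sort -nrk3 | head -n "${1:-10}"
}
```

Possible use: sudo /usr/sbin/repquota /raid1 | top_quota_users 5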
Save time as sysadmin C0
~teague/Scripts/sshnodes.py to call ~teague/batch/mount-diva2.sh
So obvious as to no longer be worthy of being documented
Clear errors on jobs
qstat -u adler | grep Eqw | cut -f 1 -d ' ' | xargs qmod -cj
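The cut step above can come back empty when qstat left-pads short job IDs with spaces, and grep matches Eqw anywhere on the line. A hedged alternative that picks fields with awk, assuming the default qstat layout (job-ID first, state fifth); eqw_job_ids is a made-up name:

```shell
# Read qstat output on stdin and print the IDs of jobs in state Eqw.
# Assumes the default column layout: job-ID is field 1, state is field 5.
eqw_job_ids() {
    awk '$5 == "Eqw" { print $1 }'
}
```

Possible use: qstat -u adler | eqw_job_ids | xargs qmod -cj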
Start/restart ZINC15
source env.csh
zincserver.restart-backend.sh
after a vmware1 failure
ssh root@vmware1.bkslab.org
(based on C0, twice)
vim-cmd vmsvc/getallvms
vim-cmd vmsvc/power.on 1792
on dock
service httpd start
(root on dock)
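The Vmid passed to power.on above is whatever getallvms reported at the time and may change. A small sketch for looking it up by name instead, assuming the usual getallvms table (header row, then Vmid and Name as the first two columns) and VM names without spaces; vmid_for_name is a made-up helper:

```shell
# Read `vim-cmd vmsvc/getallvms` output on stdin and print the Vmid of the
# VM whose Name column equals $1.
# Assumptions: first line is a header, Vmid is field 1, Name is field 2,
# and VM names contain no whitespace.
vmid_for_name() {
    awk -v name="$1" 'NR > 1 && $2 == name { print $1 }'
}
```

Possible use: vim-cmd vmsvc/getallvms | vmid_for_name <NAME>, then pass the printed number to vim-cmd vmsvc/power.on.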
queue stuck?
Try qmod -c *lamed and qmod -e *@lamed
clean away old scratch files on nodes before your job starts (adler as example user)
find /scratch/adler -mindepth 1 -mtime +3 -exec rm -rvf {} \;
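Before letting the find above delete anything, it can help to see what it would match. A dry-run sketch with the same tests but -print in place of the rm; list_old_scratch is a made-up name:

```shell
# Print what the cleanup would remove, without deleting anything.
# $1 = scratch directory, $2 = minimum age in days (default 3).
list_old_scratch() {
    find "$1" -mindepth 1 -mtime +"${2:-3}" -print
}
```

Possible use: list_old_scratch /scratch/adler 3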
Restart ZINC15
cd /nfs/soft/www/apps/zinc15/zinc15-env/lib/python2.7/site-packages/zinc/data/models
source /nfs/soft/www/apps/zinc15/zinc15-env/env.csh
zincserver.restart-backend.sh
zincserver.start-backend.sh
killall -9 gunicorn