Sysadmin idioms
[[Category:Sysadmin]]
[[Category:Idioms]]
Revision as of 17:33, 17 August 2017
= Cluster 2 =
See the progress of a RAID rebuild, e.g. on aleph:
 cat /proc/mdstat
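During a rebuild, the interesting line in <code>/proc/mdstat</code> is the one showing recovery/resync progress. A minimal sketch of filtering for it, run here against a made-up snapshot (the sample file and its numbers are invented; on a live box you would grep <code>/proc/mdstat</code> directly):

```shell
# Hypothetical snapshot of /proc/mdstat mid-rebuild (sample data, not from aleph)
cat > /tmp/mdstat.sample <<'EOF'
md0 : active raid1 sdb1[1] sda1[0]
      976630336 blocks super 1.2 [2/1] [U_]
      [===>.................]  recovery = 17.3% (169000000/976630336) finish=80.2min speed=167000K/sec
EOF

# Pull out just the progress line; on a real host: grep -E 'recovery|resync' /proc/mdstat
grep -E 'recovery|resync' /tmp/mdstat.sample
```

<code>watch -n 5 cat /proc/mdstat</code> refreshes the full view every five seconds if you want to babysit the rebuild.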
Fire up a VM (per sarah/matt):
 ssh to he as s_xxx
 sudo virsh vncdisplay phi    # shows the VNC port phi is running on (VNC port 0)
 sudo virsh edit phi          # open phi's config
 # search for passwd ( /passwd<ENTER> )
 # copy down the VNC password
 # :q!                        # exit vim
 exit                         # exit virsh
 exit                         # log out of he
 vncviewer he:<VNCPORT>       # e.g. vncviewer he:0
Enter the password, log in, and restart sshd, mysql, iptables, and network (if it can't ping).
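One detail worth remembering when connecting: <code>virsh vncdisplay</code> reports a display number, and by VNC convention display <code>:N</code> listens on TCP port 5900+N, so <code>vncviewer he:0</code> is really talking to port 5900. A tiny sketch of the arithmetic (the display number here is just the example value from above):

```shell
# VNC display :N listens on TCP port 5900+N (standard VNC convention)
display=0                       # e.g. from: sudo virsh vncdisplay phi  -> ":0"
port=$((5900 + display))
echo "$port"                    # the raw TCP port, if you ever need it for a tunnel
```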
= Cluster 0 =
Disc space panic (cluster 0):
 sudo /usr/sbin/repquota /raid1 | sort -nrk3 | head
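The <code>sort -nrk3 | head</code> part orders users by the third column (blocks used) in descending numeric order, so the worst offenders surface first. A self-contained sketch with made-up rows in repquota's user/flags/blocks/soft/hard layout (all names and numbers below are invented):

```shell
# Made-up rows in repquota's layout: user, flags, blocks-used, soft, hard
cat > /tmp/quota.sample <<'EOF'
adler  -- 120000 500000 600000
teague -- 980000 500000 600000
sarah  --  45000 500000 600000
EOF

# Numeric (-n) reverse (-r) sort on field 3 (-k3): largest consumer first
sort -nrk3 /tmp/quota.sample | head -n 1
```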
Save time as sysadmin C0
Use ~teague/Scripts/sshnodes.py to call ~teague/batch/mount-diva2.sh.
= So obvious as to no longer be worthy of being documented =
Clear errors on jobs:
 qstat -u adler | grep Eqw | cut -f 1 -d ' ' | xargs qmod -cj
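The pipeline greps for jobs in the Eqw (error) state, cuts the first space-delimited field (the job ID), and hands the IDs to <code>qmod -cj</code>. A sketch against made-up <code>qstat</code> rows (real qstat output may have leading whitespace, in which case the job ID is not the first <code>cut</code> field and the command needs adjusting):

```shell
# Made-up `qstat -u adler` rows: job-ID, prio, name, user, state, date
cat > /tmp/qstat.sample <<'EOF'
101 0.55 dock_a adler Eqw 08/17/2017
102 0.55 dock_b adler r   08/17/2017
103 0.55 dock_c adler Eqw 08/17/2017
EOF

# Job IDs of errored jobs; on the cluster these feed: | xargs qmod -cj
grep Eqw /tmp/qstat.sample | cut -f 1 -d ' '
```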
Start/restart ZINC15
 source env.csh
 zincserver.restart-backend.sh
Queue stuck? Try:
 qmod -c *lamed
 qmod -e *@lamed
Clean away old scratch files on nodes before your job starts (adler as the example user):
 find /scratch/adler -mindepth 1 -mtime +3 -exec rm -rvf {} \;
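Before running the <code>rm</code> variant it is worth previewing what matches: swap <code>-exec rm -rvf {} \;</code> for <code>-print</code>. A sketch against a throwaway directory (GNU <code>touch -d</code> is assumed for backdating; the paths and filenames are invented for the demo):

```shell
# Build a throwaway scratch dir with one stale and one fresh file
mkdir -p /tmp/scratch-demo/adler
touch -d '5 days ago' /tmp/scratch-demo/adler/stale.dat   # backdate (GNU touch)
touch /tmp/scratch-demo/adler/fresh.dat

# Dry run: same age filter as the idiom above, -print instead of -exec rm
# -mtime +3 matches entries last modified more than 3 whole days ago
find /tmp/scratch-demo/adler -mindepth 1 -mtime +3 -print
```

Only the stale file should be listed; once the output looks right, substitute the <code>-exec rm</code> form.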
Restart ZINC15
 cd /nfs/soft/www/apps/zinc15/zinc15-env/lib/python2.7/site-packages/zinc/data/models
 source /nfs/soft/www/apps/zinc15/zinc15-env/env.csh
 zincserver.restart-backend.sh
 zincserver.start-backend.sh
 killall -9 gunicorn