Sysadmin idioms

= Cluster 2 =

Restart SEA15:
  log in to tau, become www
  source /nfs/soft/www/apps/sea/sea15/env.csh
  run run-sea-server.sh
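
To check that the server came back, a minimal sketch (assumes the launcher script above stays resident under its own name; adjust the pattern if it spawns something else):
  ps aux | grep [r]un-sea-server    # bracketed first letter keeps grep from matching itself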


disk trouble?
  echo 1 > /proc/sys/vm/drop_caches     # drop the page cache
  mount -o remount /mnt/nfs/scratch/A
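
To confirm the remount actually took, a quick sketch (same mount point as above):
  df -h /mnt/nfs/scratch/A                                          # should show the NFS export, not the local fs
  touch /mnt/nfs/scratch/A/.probe && rm /mnt/nfs/scratch/A/.probe   # verify it is writable again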


See progress of a RAID rebuild, e.g. on aleph:
  cat /proc/mdstat
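
For more detail than /proc/mdstat gives, a sketch (assumes the array is /dev/md0; take the real device name from mdstat):
  sudo mdadm --detail /dev/md0 | grep -E 'State|Rebuild'   # array state and rebuild percentage
  watch -n 60 cat /proc/mdstat                             # re-check progress every minute
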
Grant access on bet:
  xfs_quota -xc "limit bsoft=1000g bhard=1500g tbalius" /srv/work
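
To confirm the new limits took, a sketch against the same filesystem:
  sudo xfs_quota -xc "report -h" /srv/work   # per-user usage against soft/hard limits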


Add a new CNAME on alpha (''' DO NOT RUN THIS SCRIPT WITHOUT UNDERSTANDING WHAT IT DOES; IT COULD WIPE OUT ALL THE MACHINES' CANONICAL NAMES '''):
  sudo /opt/bks/bin/add-host-alias nfs-db3 abacus
then:
  service named restart
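
To verify the alias resolves after the restart, a sketch (the bkslab.org zone is an assumption based on other hostnames in this page):
  host nfs-db3.bkslab.org               # should come back as a CNAME for abacus
  dig +short CNAME nfs-db3.bkslab.org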


Fire up a VM per Sarah/Matt:
  ssh to he as s_xxx
  sudo virsh vncdisplay phi    # shows the VNC port phi is running on (e.g. VNC port 0)
  sudo virsh edit phi          # open phi's config; search for passwd (/passwd<ENTER>), copy down the VNC password, then :q! to exit vim
  exit                         # exit virsh
  exit                         # log out of he
  vncviewer he:<VNCPORT>       # e.g. vncviewer he:0
  Enter the password, log in, and restart: sshd, mysql, iptables, network (if it can't ping)
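
If you first need to find which guests exist on he, a sketch (phi is just the example above):
  sudo virsh list --all    # names and run states of all defined guests
  sudo virsh start phi     # boot it if it shows as shut off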


= Cluster 0 =

Disk space panic:
  sudo /usr/sbin/repquota /raid1 | sort -nrk3 | head   # heaviest users by blocks consumed
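
If quotas don't explain the usage, a sketch for finding the big directories directly (the depth is arbitrary):
  sudo du -xh --max-depth=2 /raid1 | sort -rh | head -20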


   
Save time as a sysadmin on C0:
  use ~teague/Scripts/sshnodes.py to call ~teague/batch/mount-diva2.sh
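
If sshnodes.py isn't handy, the same idea as a plain loop, a sketch (the qhost parsing and node naming are assumptions):
  for n in $(qhost | awk 'NR>3 {print $1}'); do ssh "$n" ~teague/batch/mount-diva2.sh; done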


= So obvious as to no longer be worth documenting =

Clear errors on jobs:
  qstat -u adler | grep Eqw | cut -f 1 -d ' ' | xargs qmod -cj
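
To see why a job landed in Eqw before clearing it, a sketch (12345 is a placeholder job ID):
  qstat -j 12345 | grep -i error    # SGE records the error reason per job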


Start/restart ZINC15:
  source env.csh
  zincserver.restart-backend.sh


= After a vmware1 failure =

  ssh root@vmware1.bkslab.org    (based on C0, twice)
  vim-cmd vmsvc/getallvms        # list the VMs and their numeric IDs
  vim-cmd vmsvc/power.on 1792    # power on the VM with that ID

On dock (as root):
  service httpd start
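
To check whether the guest actually came up, a sketch (1792 is the example VM ID from getallvms above):
  vim-cmd vmsvc/power.getstate 1792   # prints Powered on / Powered off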


Example ZINC15 query from the command line:
  time wget --user gpcr --password xtal -O - "http://zinc15.docking.org/substances.txt:smiles,zinc_id,purchasability?purchasability:gt=10&mwt:le=350&mwt:gt=50&logp:le=3.5&structure.num_rotatable_bonds:le=7&structure:contains=[C;D1]%3D[CD3]C(%3DO)OC&count=all" | tee 18.txt | cat -n

Queue stuck?
  Try qmod -c '*lamed' and qmod -e '*@lamed'   (quote the wildcards so the shell doesn't expand them)
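
To spot which queue instances are in trouble first, a sketch:
  qstat -f -qs E          # list queue instances in the E (error) state
  qstat -f -explain E     # same, with the recorded reason for each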


Clean away old scratch files on nodes before your job starts (adler as example user):
  find /scratch/adler -mindepth 1 -mtime +3 -exec rm -rvf {} \;
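
A safer two-step version, a sketch with the same find expression (preview first, then run the rm form above):
  find /scratch/adler -mindepth 1 -mtime +3 -print   # dry run: list what would be removed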


Update purchasability (run against the ZINC15 database):
  update substance set purchasability = substance_best_purchasability(sub_id) where sub_id in (select sub_id from substance where purchasability is null limit 10000);

Restart ZINC15:
  cd /nfs/soft/www/apps/zinc15/zinc15-env/lib/python2.7/site-packages/zinc/data/models
  source /nfs/soft/www/apps/zinc15/zinc15-env/env.csh
  zincserver.restart-backend.sh
  zincserver.start-backend.sh
  killall -9 gunicorn
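
To confirm the backend really came back, a sketch (the local port is an assumption; check the gunicorn config):
  ps aux | grep [g]unicorn                      # worker processes should be running again
  curl -sI http://localhost:8000/ | head -1     # expect an HTTP 200 status line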


[[Category:Sysadmin]]
[[Category:Idioms]]
