SGE idioms


Latest revision as of 16:50, 3 June 2015

Here is a cookbook of things you can do with SGE.

Explain why a queue is in the error state:

qstat -f -explain E

Investigate the error. Is the disk full? Does the machine need a reboot? Once the cause is fixed, clear the error on all queues on that machine:

qmod -c '*@<machine-name>*' 
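
To see at a glance which queue instances are in the error state before clearing them, the output of `qstat -f` can be filtered. This is only a sketch: the function name `error_queue_instances` is made up for illustration, and it assumes the default `qstat -f` layout, where queue instance lines start with `queue@host` and the states column is the sixth field.

```shell
# Print the queue instances (queue@host) whose states column contains E.
# Reads `qstat -f` output on stdin. Assumes the default column layout:
# queuename qtype resv/used/tot. load_avg arch states
error_queue_instances() {
    # Only queue instance lines have an @ in the first field; job lines
    # and -explain text are skipped.
    awk '$1 ~ /@/ && $6 ~ /E/ { print $1 }'
}
```

Usage: `qstat -f | error_queue_instances`. Each name it prints can be looked up in the `qstat -f -explain E` output to find the reason.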

Who is running jobs on the GPUs?

qstat -q gpu.q -f -u '*'

Find jobs in the Eqw state.

qstat -u '*' | grep Eqw
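
The `grep` above matches Eqw anywhere on the line, including in a job name. A slightly stricter sketch checks the state column only and prints just the job IDs. The function name `eqw_job_ids` is hypothetical, and the field positions assume the default `qstat` layout (job-ID first, state fifth, no spaces in job names).

```shell
# Print the job IDs of all jobs whose state column is exactly Eqw.
# Reads `qstat -u '*'` output on stdin. Assumes the default columns:
# job-ID prior name user state submit/start at queue slots ja-task-ID
eqw_job_ids() {
    awk '$5 == "Eqw" { print $1 }'
}
```

Usage: `qstat -u '*' | eqw_job_ids`. For each ID, `qstat -j <jobid> | grep -i error` should show the recorded error reason.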

Investigate the job. Was its working directory removed? Is there an authentication problem? We know about this problem. For now, just clear the error on a job in the Eqw state:

qmod -cj <jobid>
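
When many jobs are stuck, the two steps can be combined into a loop. This is a minimal sketch, not an established site procedure: the function name `clear_eqw_jobs` is made up, it assumes the default `qstat` column layout (state in the fifth field), and it clears every Eqw job without checking the cause, so run it only after investigating.

```shell
# Clear the error state on every job currently in Eqw.
# Assumes the default qstat columns, with job-ID first and state fifth.
clear_eqw_jobs() {
    qstat -u '*' | awk '$5 == "Eqw" { print $1 }' | while read -r jobid; do
        qmod -cj "$jobid"
    done
}
```

Usage: run `clear_eqw_jobs` as a user allowed to modify the affected jobs (typically an SGE manager or operator).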