SGE idioms: Difference between revisions

From DISI
[[Category:Idioms]]
[[Category:Internal]]
[[Category:SGE]]
[[Category:Cluster]]
[[Category:Tutorials]]

Latest revision as of 16:50, 3 June 2015

Here is a cookbook of things you can do with SGE.

Explain error on queue

qstat -f -explain E
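When many queue instances are listed, it can help to reduce the output to just the affected hosts. The following is a minimal sketch, assuming the usual six-column `qstat -f` layout in which queue-instance lines start with `queue@host` and the last column holds the state flags. `errored_hosts` is a hypothetical helper name, not part of SGE.

```shell
# Sketch: list hosts that have at least one queue instance in the E state.
# Assumes queue-instance lines look like:
#   all.q@node02.example.com  BIP  0/0/8  -NA-  lx-amd64  E
# errored_hosts is a hypothetical helper, not an SGE command.
errored_hosts() {
    awk '$1 ~ /@/ && NF >= 6 && $6 ~ /E/ {
        sub(/^[^@]*@/, "", $1)   # drop the "queue@" prefix, keep the host
        print $1
    }' | sort -u
}

# Usage on a submit host:
#   qstat -f -explain E | errored_hosts
```

Each host is printed once, even if several of its queues are in error.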

Investigate the error: is the disk full, or does the machine need a reboot? Once the cause is fixed, clear the error on all the queues on that machine

qmod -c '*@<machine-name>*' 

Who is running jobs on the GPUs?

qstat -q gpu.q -f -u '*'
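To get a per-user summary rather than the full listing, the job lines can be counted with awk. This is a sketch under the assumption that job lines in the `qstat -f` output start with a numeric job ID, with the user in the fourth column and the state in the fifth. `gpu_job_counts` is a hypothetical helper name.

```shell
# Sketch: count running jobs per user from `qstat -q gpu.q -f -u '*'` output.
# Assumes job lines look like:
#   1234 0.55500 train  alice  r  06/03/2015 16:50:00  1
# Only states containing "r" (running) are counted; pending jobs are skipped.
gpu_job_counts() {
    awk '$1 ~ /^[0-9]+$/ && $5 ~ /r/ { n[$4]++ }
         END { for (u in n) print n[u], u }' | sort -k1,1nr -k2,2
}

# Usage on a submit host:
#   qstat -q gpu.q -f -u '*' | gpu_job_counts
```

The output has one line per user, with the busiest user first.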

Find jobs in the Eqw state.

qstat -u '*' | grep Eqw
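The `grep` above also matches a job name or user name that happens to contain "Eqw". A slightly stricter sketch is to test only the state column, assuming the default `qstat` layout where the job ID is the first column and the state is the fifth. `eqw_job_ids` is a hypothetical helper name.

```shell
# Sketch: print the IDs of jobs whose state column is exactly Eqw.
# Assumes job lines look like:
#   1234 0.55500 myjob  alice  Eqw  06/03/2015 16:50:00  1
# Reads qstat output on stdin; header and separator lines never match.
eqw_job_ids() {
    awk '$5 == "Eqw" { print $1 }' | sort -un
}

# Usage on a submit host:
#   qstat -u '*' | eqw_job_ids
```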

Investigate the job. Was its working directory removed, or is there an authentication problem? This is a known issue. For now, just clear the error on a job in the Eqw state

qmod -cj <jobid>
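If several jobs are stuck, the same command can be applied to a list of job IDs. The sketch below reads job IDs on stdin and, by default, only prints the `qmod -cj` commands it would run. `clear_eqw_jobs` and the `DRY_RUN` variable are hypothetical names introduced here for illustration.

```shell
# Sketch: clear the error state on each job ID read from stdin.
# With DRY_RUN=1 (the default) the commands are only printed;
# set DRY_RUN=0 to actually call qmod.
clear_eqw_jobs() {
    while read -r jobid; do
        [ -n "$jobid" ] || continue          # skip blank lines
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "qmod -cj $jobid"
        else
            qmod -cj "$jobid"
        fi
    done
}

# Usage on a submit host (dry run first, then for real):
#   qstat -u '*' | awk '$5 == "Eqw" { print $1 }' | clear_eqw_jobs
#   qstat -u '*' | awk '$5 == "Eqw" { print $1 }' | DRY_RUN=0 clear_eqw_jobs
```

Defaulting to a dry run leaves room to investigate the listed jobs before clearing them.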