SGE idioms
Latest revision as of 16:50, 3 June 2015
Here is a cookbook of things you can do with SGE.
Explain why a queue is in the error (E) state:
qstat -f -explain E
Investigate the error. Is a disk full? Does the machine need a reboot? Once the cause is dealt with, clear the error on all the queues on that machine:
qmod -c '*@<machine-name>*'
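To see at a glance which queue instances need attention, the output of the explain command above can be filtered. This is a minimal sketch, assuming the usual `qstat -f` layout in which a queue instance line starts with a `queue@host` name and carries its state flags in the sixth column; the helper name `error_queues` is made up for illustration.

```shell
# List queue instances whose state flags include E (error).
# Assumption: default `qstat -f` layout, i.e. columns are
#   queuename qtype resv/used/tot. load_avg arch states
# Reads qstat output on stdin so it can be tried on saved output.
error_queues() {
    awk '$1 ~ /@/ && $6 ~ /E/ { print $1 }'
}

# Usage on the cluster (not run here):
#   qstat -f -explain E | error_queues
```

Each printed name has the form `queue@host`, so the host part is what goes in place of `<machine-name>` in the `qmod -c` command.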
Who is running jobs on the GPUs?
qstat -q gpu.q -f -u '*'
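When the listing is long, a per-user count can be easier to read. The sketch below assumes the usual job-line layout in `qstat -f` output (job ID first, user fourth, state fifth) and counts only jobs whose state is exactly `r`; the function name `gpu_jobs_per_user` is hypothetical.

```shell
# Count running jobs per user from `qstat -f` output on stdin.
# Assumption: job lines look like
#   <job-ID> <prior> <name> <user> <state> <date> <time> <slots>
# Queue lines and separator lines are skipped because their first
# field is not a plain number.
gpu_jobs_per_user() {
    awk '$1 ~ /^[0-9]+$/ && $5 == "r" { n[$4]++ }
         END { for (u in n) print u, n[u] }' | sort
}

# Usage on the cluster (not run here):
#   qstat -q gpu.q -f -u '*' | gpu_jobs_per_user
```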
Find jobs in the Eqw state.
qstat -u '*' | grep Eqw
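The `grep` above also matches a job whose name or user happens to contain "Eqw". A slightly stricter variant, sketched here on the assumption that the state is the fifth whitespace-separated column of the default `qstat` listing, prints only the job IDs; `eqw_job_ids` is a name invented for this example.

```shell
# Print the IDs of jobs whose state column is exactly "Eqw".
# Assumption: default qstat layout, i.e. columns are
#   job-ID prior name user state submit/start-at ...
# Reads qstat output on stdin.
eqw_job_ids() {
    awk '$5 == "Eqw" { print $1 }'
}

# Usage on the cluster (not run here):
#   qstat -u '*' | eqw_job_ids
```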
Investigate the job. Was its directory removed? Is there an authentication problem? We know about this. For now, just clear the error on the job in the Eqw state:
qmod -cj <jobid>
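Before clearing, it can help to read why the job failed. `qstat -j` prints the full job record, and for a job in an error state that record normally includes one or more lines beginning with `error reason`. The helper below, with the made-up name `job_error_reasons`, is a small sketch that keeps only those lines.

```shell
# Keep only the "error reason" lines from `qstat -j` output on stdin.
# Assumption: the scheduler labels these lines "error reason", which
# is the usual wording in the job record of a failed job.
job_error_reasons() {
    grep '^error reason'
}

# Usage on the cluster (not run here); substitute a real job ID:
#   qstat -j <jobid> | job_error_reasons
```

A message about a missing directory or a failed login points at the two causes mentioned above.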