SGE idioms
Latest revision as of 16:50, 3 June 2015
Here is a cookbook of things you can do with SGE.
Explain the error state on a queue
qstat -f -explain E
Investigate the error: is the disk full, or does the machine need a reboot? Then clear the error on the queues on that machine
qmod -c '*@<machine-name>*'
Who is running jobs on the GPUs?
qstat -q gpu.q -f -u '*'
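If the full listing is too long to read at a glance, it can be condensed to one line per user. The following is only a sketch: gpu_jobs_per_user is a hypothetical helper name, and it assumes that job lines in the qstat output start with a numeric job ID, with the user in the fourth column and the state in the fifth. Adjust the columns if your qstat prints something different.

```shell
# Hypothetical helper (not part of SGE): reads `qstat -f` style output on
# stdin and prints "user count" for jobs whose state column is exactly "r".
# Assumes job lines look like: job-ID prior name user state date time slots
gpu_jobs_per_user() {
    awk '$1 ~ /^[0-9]+$/ && $5 == "r" { n[$4]++ }
         END { for (u in n) print u, n[u] }' | sort
}

# Usage, on a host where SGE is installed:
#   qstat -q gpu.q -f -u '*' | gpu_jobs_per_user
```

Only jobs in the plain running state are counted, so pending or suspended jobs are ignored.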
Find jobs in the Eqw state.
qstat -u '*' | grep Eqw
Investigate it: was the directory removed, or is there an authentication problem? This is a known issue. For now, just clear the error on a job in the Eqw state
qmod -cj <jobid>
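When several jobs are stuck, it can help to pull out just the job IDs and look at each one before clearing it. This is a sketch under assumptions: eqw_job_ids is a hypothetical helper name, and it assumes the default qstat listing, where the job ID is the first column and the state is the fifth. The usage comment relies on qstat -j, whose output for a job in an error state typically includes an "error reason" line.

```shell
# Hypothetical helper (not part of SGE): reads `qstat` output on stdin and
# prints the job ID of every job whose state column is exactly "Eqw".
eqw_job_ids() {
    awk '$1 ~ /^[0-9]+$/ && $5 == "Eqw" { print $1 }'
}

# Usage, on a host where SGE is installed:
#   for id in $(qstat -u '*' | eqw_job_ids); do
#       qstat -j "$id" | grep -i 'error reason'
#   done
```

Matching on the state column rather than grepping the whole line avoids false hits on job names that happen to contain "Eqw".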