Difference between revisions of "Gpus"

From DISI

Revision as of 10:52, 1 July 2016

As of June 2016, we have 7 GPUs on the cluster. A separate queue, gpu.q, manages GPU jobs.

To log in interactively to the GPU queue:

qlogin -q gpu.q

Each GPU is a GeForce GTX 980. To list the cards on a node:

/sbin/lspci | grep -i nvidia
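
The lspci listing can also be turned into a count. On many systems each NVIDIA card exposes an audio function as well as the display function, so a plain grep for "nvidia" may match more than one line per card. The sketch below assumes the GPUs are listed under the "VGA compatible controller" or "3D controller" device class; count_nvidia_gpus is a hypothetical helper name, not an installed tool.

```shell
# Sketch: count NVIDIA display devices in lspci output read from stdin.
# Assumption: GPUs appear under the "VGA compatible controller" or
# "3D controller" class; other NVIDIA PCI functions (e.g. audio) are ignored.
count_nvidia_gpus() {
    grep -i nvidia | grep -Ec 'VGA compatible controller|3D controller'
}

# Only query the hardware if lspci is available on this machine.
if [ -x /sbin/lspci ]; then
    echo "NVIDIA GPUs visible on this node: $(/sbin/lspci | count_nvidia_gpus)"
fi
```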

Instructions for getting set up for GPU computation: http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/

NVIDIA drivers are installed in /usr/local/cuda*. To use the 7.5 drivers, make sure these environment variables are set:

export PATH=/usr/local/cuda-7.5/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-7.5/lib64:$LD_LIBRARY_PATH
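
If these lines live in a startup file that gets sourced more than once, the same directories pile up in both variables. A minimal sketch of an idempotent version follows, assuming both variables should point at the same CUDA 7.5 install under /usr/local/cuda-7.5; the cuda_root variable is introduced here only for illustration.

```shell
# Sketch: add the CUDA 7.5 directories only if they are not already present.
cuda_root=/usr/local/cuda-7.5   # assumed install prefix

case ":$PATH:" in
    *":$cuda_root/bin:"*) ;;                      # already on PATH
    *) export PATH="$cuda_root/bin:$PATH" ;;
esac

case ":${LD_LIBRARY_PATH:-}:" in
    *":$cuda_root/lib64:"*) ;;                    # already on the library path
    # ${VAR:+:$VAR} avoids a trailing colon when LD_LIBRARY_PATH starts empty;
    # an empty entry would make the loader search the current directory.
    *) export LD_LIBRARY_PATH="$cuda_root/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
esac
```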

Check that the drivers are installed:

cat /proc/driver/nvidia/version

This should return something like:

NVRM version: NVIDIA UNIX x86_64 Kernel Module  352.79  Wed Jan 13 16:17:53 PST 2016
GCC version:  gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC)
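
For use in job scripts, the same check can be reduced to the driver version number. The sketch below assumes the version file has the layout shown above, with the version number directly after the word "Module"; nvidia_driver_version is a hypothetical helper name, and it accepts an optional file path so it can be tried on a saved copy of the output.

```shell
# Sketch: print the NVIDIA kernel module version, e.g. the number that follows
# "Kernel Module" on the NVRM line. Returns non-zero if the file is unreadable.
nvidia_driver_version() {
    version_file=${1:-/proc/driver/nvidia/version}
    [ -r "$version_file" ] || return 1
    awk '/^NVRM version:/ {
             for (i = 1; i < NF; i++)
                 if ($i == "Module") { print $(i + 1); exit }
         }' "$version_file"
}

if version=$(nvidia_driver_version) && [ -n "$version" ]; then
    echo "NVIDIA driver $version is loaded"
else
    echo "No NVIDIA driver found; is this a gpu.q node?"
fi
```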

Try compiling and running the sample programs:

mkdir -p /scratch/$USER/cuda-7.5_samples
cp -r /usr/local/cuda-7.5/samples/* /scratch/$USER/cuda-7.5_samples/
cd /scratch/$USER/cuda-7.5_samples/
make

Run the deviceQuery sample program:

/nfs/ge/bin/on-one-gpu - /scratch/$USER/cuda-7.5_samples/bin/x86_64/linux/release/deviceQuery


Here is a sample script to run Amber:

/nfs/work/tbalius/MOR/run_amber/run.pmemd_cuda_wraper.csh

Here is an excerpt from the script:

##########
cat << EOF > qsub.amber.csh
#\$ -S /bin/csh
#\$ -cwd
#\$ -q gpu.q
#\$ -o stdout
#\$ -e stderr

# export CUDA_VISIBLE_DEVICES="0,1,2,3" 
# setenv CUDA_VISIBLE_DEVICES "0,1,2,3"
setenv AMBERHOME /nfs/soft/amber/amber14/ 
set amberexe = "/nfs/ge/bin/on-one-gpu - \$AMBERHOME/bin/pmemd.cuda"
##########

Note that we run the executable through on-one-gpu, which manages which GPUs are used.




If your job generates significant output, which is usually the case, write it locally to scratch and then copy the results over the network onto the NFS disk. Writing large amounts of data directly to the NFS disk can cause problems for other users.
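
The write-to-scratch-then-copy pattern can be sketched as a small wrapper. This is an illustration under stated assumptions, not a script installed on the cluster: run_in_scratch and the SCRATCH_ROOT variable are hypothetical names, and the scratch location defaults to the /scratch/$USER layout used for the CUDA samples.

```shell
# Sketch: run a command in a private scratch directory, then copy its output
# to a destination directory (e.g. on NFS) in a single pass.
run_in_scratch() {
    dest_dir=$1
    shift
    scratch_root=${SCRATCH_ROOT:-/scratch/$USER}   # assumed scratch layout
    work_dir=$(mktemp -d "$scratch_root/job.XXXXXX") || return 1

    # Run the job with the scratch directory as its working directory,
    # so all output lands on local disk.
    ( cd "$work_dir" && "$@" )
    status=$?

    # Copy everything over the network once, then free the scratch space.
    if mkdir -p "$dest_dir" && cp -r "$work_dir"/. "$dest_dir"/; then
        rm -rf "$work_dir"
    else
        echo "copy failed; results left in $work_dir" >&2
        return 1
    fi
    return "$status"
}
```

A call might look like `run_in_scratch /path/on/nfs/results /nfs/ge/bin/on-one-gpu - ./my_gpu_program`, where the destination path and program name are placeholders.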