Using PBS

This page gives an overview of Torque, a specific PBS implementation. It's the most popular scheduler I've encountered, and it is present on every cluster I run calculations on. For that reason, I will focus this documentation on this particular PBS derivative.

Programs In PBS

qsub: Submit Jobs

qsub is responsible for submitting jobs to the queue. See the section about submitting jobs for more details.

qdel: Delete Jobs

qdel will delete a submitted job. Use it as follows:

qdel <job_id>

You can get the job ID number from qstat.

qhold: Place a Job on Hold

qhold will place a job on hold so that it will retain its place in line, but it will not be run until you explicitly release the job. Use it as follows:

qhold <job_id>

qrls: Release a Held Job

qrls will release a job that's been placed on hold, telling the scheduler that it can begin that job at any time. Use it as follows:

qrls <job_id>

qstat: Get Job Statistics

qstat allows you to query the scheduler in a number of different ways:

  • qstat -u <user_name>: List all jobs (by <job_id>) owned by the user <user_name>
  • qstat -q <queue_name>: Give statistics regarding number of running jobs, number of queued jobs, etc. associated with a particular queue, <queue_name>
  • qstat -f <job_id>: Print a plethora of information about a particular job, including where it is running, the environment variables it knows about, and what their values are (particularly the environment variables assigned by torque itself)

qalter: Change an Existing Job's Options

qalter allows you to change or add qsub command-line flags to your job after it has been submitted, without having to kill it and re-submit it. It allows you to do things like change the name of your job, its wallclock time, the number of processors it's going to use, etc.

Most schedulers place restrictions on qalter, and some disable it altogether. For example, I have yet to see a system where you can adjust the wallclock time or processor count after the job has already started running, since the scheduler has already allocated those resources. Use it as follows:

qalter [Options] <job_id>

For a list of [Options], see the submission section.
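For instance, renaming a queued job and changing its walltime request might look like this (the job ID and limits here are hypothetical, and whether these particular changes are allowed depends on your site):

qalter -N short_job -l walltime=2:00:00 1453258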

Submitting Jobs

This is probably the most important part of Torque — submitting jobs. The qsub program is used to submit jobs, and has a number of command-line options:

-a <date_time>......... : The earliest the job may be executed
-A <account_string>.... : Account name to use resources from
-C <directive_prefix>.. : String that marks PBS directives in a script
-e <error_file>........ : Error file (with absolute or relative path)
-I .................... : Launch an interactive job
-h .................... : Hold the job at launch
-j oe ................. : Join output and error streams (stdout/stderr)
-l <resource_list>..... : Request specific resources (# of procs, walltime, etc.)
-m [a[b[e]]]........... : Mailing options. a=Mail if aborted by scheduler
                                           b=Mail when job begins
                                           e=Mail when job ends
-M <email_list>........ : Comma-delimited list of email addresses to send notifications to
-N <job_name>.......... : Name of the job
-o <output_file>....... : Output file (with absolute or relative path)
-q <queue_name>........ : Name of the queue to submit the job to
-S <shell_path>........ : Which shell to use to interpret the given commands
-W <attribute_list>.... : List of attributes -- advanced option, see later sections

The -C option specifies the prefix string that marks embedded command-line options in the submitted script. The default on most systems I've used is "#PBS". Thus, every line that starts with #PBS is interpreted as a line of PBS options. For instance, the following two ways of submitting a job are identical:

qsub -N my_job -l walltime=10:00:00 -l nodes=2:ppn=8:gpus=4 -A TG-ABCDEFG -S /bin/bash <script_file>

and

qsub <script_file>

when <script_file> starts like

#PBS -N my_job
#PBS -l walltime=10:00:00
#PBS -l nodes=2:ppn=8:gpus=4
#PBS -A TG-ABCDEFG
#PBS -S /bin/bash

<rest of script body>

as long as <rest of script body> matches the body of the <script_file> in the first command.

Note the multiple uses of "-l". Resource lists can either be comma-delimited in a single entry, or they can be split up into multiple entries. Also, be sure to check the cluster documentation for how to request resources — the format of the processor requests (and limitations on processor requests) vary from system to system.
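For example, the following two ways of requesting the same resources are equivalent:

#PBS -l walltime=10:00:00,nodes=2:ppn=8

and

#PBS -l walltime=10:00:00
#PBS -l nodes=2:ppn=8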

Using qsub

The qsub program is used as follows:

qsub [Options] [script_file]

The script_file argument is optional. If it is not provided, the script is read directly from standard input. I actually take advantage of this very frequently. For instance, if you have a large number of very similar simulations to start (such as the windows of an umbrella sampling calculation, or a set of simulations at slightly different pH values for a titration curve), you can create a single script file with replaceable tokens (like REPLACE_PH), then use a quick for loop with sed to swap each token for the actual value and pipe the result directly to qsub, rather than creating a separate script for each job.

This has the advantage that you only have to change the skeleton job file once, and it will propagate to all of the jobs you submit from it. For example, a common command I use is

for ph in 2 3 4 5 6 7; do
   cd pH_$ph
   # fill in the pH value and pipe the finished script straight to qsub
   sed -e "s@REPLACE_PH@$ph@g" < ../skeleton_job_file.sh | qsub [Options]
   cd ..
done

Submitting Consecutive Jobs

Many times you'll find yourself wanting to break your simulation up into small chunks and run bits at a time, using restart files from the previous chunk to start the next. In this case, many people will submit their first job, wait until it finishes, submit the next one, and so on. This approach has a real drawback: every time a job is submitted, it starts at the "back" of the queue. You knew you were going to submit this job a long time ago, and it could have been sitting in line gaining priority. (You lose priority the same way if you just have each job run qsub next_job at the end, and some schedulers block that behavior anyway.)

One way you might think to fix this is to submit all of the later jobs using -h, or to use qhold right after each one is submitted to place a hold on it. The drawback here is that you must manually release each job using qrls, which requires you to pay close attention to the status of your jobs (and even then, a job may sit on hold needlessly while you don't have access to the cluster, for instance if the preceding job finishes while you are sleeping).

PBS offers a nice solution to this — dependencies. PBS allows you to submit a job that will be held until certain criteria are met, then automatically released once they are (or deleted if they cannot be met).

To do this, you must use an extra command-line option

qsub -W depend=<dependency_list> [Options] [script_file]

Your options for dependency lists are:

  • afterany:<job_id> — This will hold the job until job_id finishes, whether successfully or in error, and then release it for execution
  • afterok:<job_id> — This will hold the job until job_id finishes. If job_id finishes successfully, the submitted job is instantly released for execution. Otherwise, it is deleted.
  • afternotok:<job_id> — This will hold the job until job_id finishes. If job_id fails, the submitted job is instantly released for execution. Otherwise, it is deleted.

For example, to start my_script.sh, which has all of the PBS directives in the file as shown above, after the job 1453258 finishes, use the command

qsub -W depend=afterok:1453258 my_script.sh
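Because qsub prints the ID of the job it just submitted to standard output, you can submit a whole chain of dependent jobs in one sitting. Here is a minimal sketch, assuming every chunk is run by the same my_script.sh:

# submit the first chunk and capture its job ID
jobid=$(qsub my_script.sh)

# each later chunk is released only if the previous one succeeds
for i in 2 3 4 5; do
   jobid=$(qsub -W depend=afterok:$jobid my_script.sh)
done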

Environment Variables Set By PBS

To make things easier on the user, PBS also creates several environment variables for your convenience in the shell session launched when your job starts. You can take advantage of these just like any other environment variables you define yourself. Note that, by default, your PBS job sees the environment variables you would get in a brand new shell, but not any that you defined in the shell session you submitted the job from (unless you pass -V to qsub, which exports your current environment to the job).

The PBS-specific environment variables I think are the most useful are:

  • PBS_O_WORKDIR : This is the directory that your job was submitted from. Most of my jobs have a cd $PBS_O_WORKDIR command as the first command so every command takes place in the directory I submitted the job from.
  • PBS_NODEFILE : This is the absolute path of a file containing the list of hostnames assigned to your job. It is in the proper format for most MPI launch wrappers (e.g. mpiexec, mpirun, and their equivalents). Most MPI implementations allow you to specify a machine file (or will detect PBS_NODEFILE automatically), which sets the number of MPI processes to the number of processors listed in that file and assigns those processes to the listed hosts.
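As a quick illustration, a job script using both variables might begin like the following sketch (the job name and resource requests are just placeholders):

#PBS -N env_demo
#PBS -l walltime=0:10:00
#PBS -l nodes=1:ppn=8

# run everything from the directory the job was submitted from
cd $PBS_O_WORKDIR

# show which processors the scheduler assigned to this job
echo "Assigned processors:"
cat $PBS_NODEFILE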

There are others, but I haven't found them particularly useful before. You can see them in the qsub man page.

Running MPI jobs in PBS

When you are running an MPI program, you often launch that program with syntax like:

mpirun -np 4 <program> [args]

to launch <program> with 4 processes on the same machine. This typically only makes sense if you have at least 4 processors (or processing cores) available.

When using PBS, you typically want to run on every processor you requested. This is where PBS_NODEFILE (mentioned in the previous section) comes in handy. PBS_NODEFILE is a temporary file, created as soon as your job starts, that tells your job where it can run, with a separate line for each processor you were allocated (if you requested 8 cores on a single node, that node is listed 8 times).

Therefore, you can determine the number of processors you have available via

nprocs=`wc -l < $PBS_NODEFILE`

which counts the number of lines in PBS_NODEFILE.

You can then use the mpirun command via:

mpirun -np $nprocs <program> [options]

As a better option, though, most mpirun or mpiexec programs will take an arbitrary machine file that tells them where to run (instead of -np 4). Therefore, you should use a command like:

mpirun -machinefile $PBS_NODEFILE <program> [options]

or

mpirun -hostfile $PBS_NODEFILE <program> [options]

Some MPI implementations are also compiled with Torque/PBS support (ask your sysadmin about this). If this is the case, then mpiexec will already know to look in PBS_NODEFILE, and all you have to type is:

mpiexec <program> [options]

and it will already run 'correctly'.
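Putting the pieces together, a complete MPI job script might look like the following sketch (the job name, resource requests, and program name are placeholders; check your cluster's documentation for the exact resource syntax and the machine-file flag your MPI expects):

#!/bin/bash
#PBS -N mpi_job
#PBS -l walltime=10:00:00
#PBS -l nodes=2:ppn=8
#PBS -j oe

# run from the submission directory
cd $PBS_O_WORKDIR

# launch one MPI process per allocated processor
mpirun -machinefile $PBS_NODEFILE ./my_program > program.log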
