CX1 is intended for general purpose workloads, with an emphasis on high throughput for large numbers of relatively small jobs. Ideally jobs should be sized to fit within a single compute node. Jobs spanning multiple nodes are possible, but be sure to read the notes below first.
Submitting a job
Jobs are submitted using "qsub". All jobs require a resource specification that indicates the size of the compute resources required and the anticipated runtime. This specification goes at the top of the job script.
It states N, the number of nodes; X and Y, the number of cpus and the amount of memory per node; and HH, the expected runtime of the job in hours.
There is no need to specify a queue, simply define the job in terms of the resources it requires and the system will run it as soon as suitable resources become available.
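A resource specification of this kind can be sketched as follows. The walltime line follows the form given later on this page; the `select` line is an assumption based on standard PBS Pro syntax, and the values (1 node, 8 cpus, 16 GB, 24 hours) and the payload are purely illustrative.

```shell
#!/bin/bash
# Resource request: 1 node, 8 cpus and 16 GB of memory on that node, 24 hours.
# Assumed PBS Pro form: -lselect=N:ncpus=X:mem=Ygb and -lwalltime=HH:0:0
#PBS -lselect=1:ncpus=8:mem=16gb
#PBS -lwalltime=24:0:0

# Placeholder payload: record which node the job landed on.
job_host=$(uname -n)
echo "Job running on ${job_host}"
```

Saved under a file name of your choice, for example `myjob.sh` (hypothetical), the script would be submitted with `qsub myjob.sh`.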
The following classes of jobs are supported:
|Class||Number of nodes N||ncpus/node||Max mem/node||Max walltime/hr||Max number of running jobs per user|
|throughput||1||1-8||96GB||up to 72hr||unlimited for jobs <=24hr in length|
|general||1 - 16||32||62GB or 124GB||up to 72hr||unlimited for jobs <=24hr in length|
|singlenode||1||48||124GB||up to 24hr||10|
|multinode||3 - 16||12||46GB||up to 48hr||unlimited|
|debug||1||1-8||96GB||up to 30 mins||1|
|large memory||1||multiples of 10||multiples of 120GB||48 hr||unlimited|
|GPU||1-2||1-8||1 - 32GB||48hr (72hr for P1000 only)|
|long||1||1-8||96GB||72 - 1000 hr||1|
- The throughput class is where you should place jobs if you have a very large number of independent jobs. If this describes your work, please read the section below on array jobs. Jobs in this category should generally be configured to use just a single cpu. Use more only if you know your program is capable of using them, and then only to reduce the runtime to fit into the walltime limits.
- The general class is for jobs that can occupy a whole node. Aim to place your jobs here if you know that the program you are using is able to run in parallel on all of the cpus. Jobs in this class may use more than a single node, but be aware that the network interconnecting these nodes is very poor, making the class suitable only for multi-node jobs that perform little communication between processes. If this describes your jobs, consider refactoring them as array jobs. Read more about configuring MPI jobs.
- The multinode class is for jobs, typically MPI ones, that can run on multiple nodes and that perform substantial communication between processes. If you aren't sure whether this describes your jobs, first try running on multiple nodes in the general class and see how the performance changes as you move from 1 to 2 nodes. If you wish to run jobs larger than this class accommodates, consider moving to CX2. Read more about configuring MPI jobs.
- The singlenode class represents the highest performance available on a single node. This class is aimed at quick turnaround of parallel jobs that, by limitation of their design, are incapable of multinode execution (for example Gaussian or Gromacs). Note that job lengths are strictly limited here, and no extensions past 24hrs are permitted. If you need longer, consider whether your jobs would be better placed in general.
- The large memory class accommodates jobs that need a very large amount of memory and cannot use memory distributed over multiple nodes. If your jobs need more memory than this class accommodates, consider moving to AX4.
- Debug class jobs are intended for quick turnaround of test jobs. Do not size real work jobs to fit here; it is for testing only.
- GPU class jobs are for programs that are explicitly designed to use GPU accelerators. Read more about GPU jobs.
- The long job class is provided as a convenience for rare occasions when you must run a very long duration job. Please note that we cannot guarantee that very long jobs can always run to completion. Only use this class if the program you need to run is not capable of being restarted.
Notes on choosing job resources
The general principle when sizing jobs on CX1 is to keep them as small as possible: the larger the resource request you make, the longer the job may spend queuing while it waits for matching resources to become free.
Running a large number of jobs
You are limited to having 100 jobs queued at once. If you need to run many jobs that are all similar, differing only by input, please read about array jobs.
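As a rough illustration of the idea, an array job runs one jobscript many times, with each subjob picking its own input. The sketch below assumes PBS Pro's `-J` directive and `PBS_ARRAY_INDEX` variable, which this page does not spell out; the `select` syntax, the range 1-10 and the input file naming are likewise assumptions chosen for the example.

```shell
#!/bin/bash
#PBS -lselect=1:ncpus=1:mem=1gb
#PBS -lwalltime=24:0:0
# Assumed PBS Pro directive: run this script as subjobs numbered 1 to 10.
#PBS -J 1-10

# PBS Pro sets PBS_ARRAY_INDEX in each subjob; default to 1 so the
# script can also be tried outside the queue.
index=${PBS_ARRAY_INDEX:-1}

# Hypothetical naming scheme: one input file per subjob.
input_file="input_${index}.dat"
echo "Subjob ${index} would process ${input_file}"
```

Each subjob here requests a single cpu, in line with the guidance for the throughput class.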
Selecting job duration
The maximum run-time of a job must be specified using:
- #PBS -lwalltime=HH:0:0
Sensible values for the length in hours HH are 24, 48 and 72; there's little benefit to be gained by selecting intermediate values. Do note that the shorter the request, the sooner the job is likely to start. We strongly encourage 24 hour jobs.
The job will end as soon as the jobscript completes, so there's no inefficiency in running jobs shorter than the walltime request.
Do note that the attained performance of a given job can be quite variable, particularly in the throughput class.
We limit the maximum runtime to 72 hours to help ensure fair use of the machine. Occasionally you may find that a running job nearing its walltime limit would benefit from an extension to allow it to complete. This is something you can do yourself via https://selfservice.rcs.imperial.ac.uk, but please note the following:
- Jobs started with a wall-time request of 24h and multinode class jobs cannot be extended
- We make no guarantees that extensions of longer jobs will always be possible
- Extensions can only be made in 24h increments within 12h of the estimated completion time
Before relying on this mechanism, reflect on the impact that the failure of a very long running job would have on your work. Wherever possible, configure jobs to save checkpoint files frequently so that they can be restarted.
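How checkpointing is done depends entirely on the program being run, but the restart logic in a jobscript often has the shape sketched below. The checkpoint file name, the step count and the "work" itself are hypothetical placeholders rather than anything CX1 provides.

```shell
#!/bin/bash
# Hypothetical checkpoint file holding the number of the last completed step.
checkpoint_file="checkpoint.txt"
total_steps=5

# Resume after the last completed step if a checkpoint exists.
if [ -f "$checkpoint_file" ]; then
    last_done=$(cat "$checkpoint_file")
else
    last_done=0
fi

step=$((last_done + 1))
while [ "$step" -le "$total_steps" ]; do
    # ... one unit of real work would go here ...
    echo "$step" > "$checkpoint_file"   # record progress after each step
    step=$((step + 1))
done
echo "Completed through step $(cat "$checkpoint_file")"
```

If the job is killed part-way through, resubmitting the same script picks up after the last recorded step instead of starting from the beginning.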
Selecting memory size
CX1 has nodes with up to 256GB of memory, but the majority have 128GB or less. If you don't know how much to use, start with "1gb", which maximises the opportunities for the job to run. If the job uses more memory than requested it will be terminated by PBS and you'll see a message in the .e file telling you so. Simply increase the request and try again. A similar termination will occur if your job's average processor usage exceeds the requested value of ncpus.
Alternatively, you may measure the memory use of a sample job with the memusage program.
Selecting the number of cpus
It is important to appreciate that increasing the number of cpus allocated to a job does not guarantee that the program run in the job will run any faster or even be able to use them at all. In general, you will need to know something about the program you are running (or the libraries you are using, if writing your own code) to know whether it is capable of parallel execution. If you are unsure, consult its documentation, read the application notes in our documentation, or attend one of our walk-in clinics.
If you are running array jobs, only use ncpus>1 to keep the individual subjobs within the walltime restrictions.
What if a job reaches its walltime limit?
If the job is still running when its walltime is reached it will be terminated, and you'll see a note to that effect in the .e log file. At this point the TMPDIR is deleted and anything stored there will be lost.
You can guard against this loss of data by prefixing the long-running program in your jobscript with "timeout xxh". This will ensure that it runs for no more than xx hours, providing an opportunity for the jobscript to complete gracefully. You will need to set the timeout period xx to something slightly less than the requested walltime.
- cp $WORK/my/input/files $TMPDIR
- # run for at most 23.5 hours, giving 30 mins grace
- timeout 23.5h $HOME/my_long_running_programme
- cp * $WORK/my/output/files
Bear in mind that this is only useful if the long-running programme in question has been producing useful intermediate files.
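A jobscript can also tell whether the program finished by itself or was stopped by the time limit, since GNU `timeout` exits with status 124 when it has to stop the command. The sketch below uses a 1 second limit and `sleep` as a stand-in for the real program so that it can be tried quickly; in a real jobscript the limit would be something like 23.5h and the command your own program.

```shell
#!/bin/bash
# Stand-in for the long-running program: sleeps longer than the limit.
limit="1s"

# Capture the exit status without aborting the script on a non-zero value.
status=0
timeout "$limit" sleep 5 || status=$?

# GNU timeout exits with status 124 when it had to stop the command.
if [ "$status" -eq 124 ]; then
    echo "Time limit reached; copy back any intermediate files"
else
    echo "Program finished with exit status ${status}"
fi
```

The copy back to $WORK would then follow in either branch, as in the example above.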