Batch Directives
Tip
For a full list of directives, see Slurm's official documentation.
The first section of a batch script (after the shebang) always contains the Slurm Directives, which specify the resource requests for your job. The scheduler parses these in order to allocate CPUs, memory, walltime, etc. to your job request.
We would appreciate it if you took a moment to review our Standard Practices to ensure fair access and use of compute resources for all users.
Minimum Viable Batch Directives
There is a certain set of directives that must be present in all batch scripts in order for the scheduler to have enough information to run your job. There is no one-size-fits-all solution, and there are some options that are either redundant or should not be used simultaneously. You should always review and adjust your batch directives for your particular scenario.
At a minimum, batch scripts should have:
Job Name
#SBATCH --job-name=hello_world
Partition
#SBATCH --partition=standard
Account (1)
- Account does not need to be specified when using windfall
#SBATCH --account=your_group
Number of Nodes
#SBATCH --nodes=1
Core Count or Total Memory (1)
- Only one of these should be specified. See CPUs and Memory for details.
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10
or
#SBATCH --mem=50gb
Time Limit
#SBATCH --time=01:00:00
The above options are examples for a single node job in the standard partition. Your needs may vary. Use the sections below to determine which options are appropriate for your job.
See the Examples and Explanations section at the end of the page for detailed examples with descriptions.
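As a rough sketch, the directives above can be assembled into a complete script like the one below. The echo payload and the fallback values after ":-" are illustrative assumptions rather than part of any required template, and your_group is a placeholder for your own group name.

```shell
#!/bin/bash
#SBATCH --job-name=hello_world
#SBATCH --partition=standard
#SBATCH --account=your_group
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10
#SBATCH --time=01:00:00

# Slurm exports these variables inside a running job. The fallbacks after
# ":-" are illustrative so the script can also be run outside Slurm.
job_id=${SLURM_JOB_ID:-none}
cpus=${SLURM_CPUS_PER_TASK:-1}

# Placeholder payload: report where the job landed
summary="job=${job_id} cpus=${cpus} host=$(uname -n)"
echo "$summary"
```

The memory request is left out here because the core count is given, following the one-or-the-other rule above.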
Allocations and Partitions
The partitions, or queues, on the UArizona HPC determine the priority of your jobs and the resources available to them. With the exception of Windfall, these consume your monthly allocation. See our allocations documentation for more detailed information on each. The syntax to request each partition is shown in the table below.
Buy-in users must use the --qos directive
If you're a member of a buy-in group and are trying to use your high priority hours, ensure you are including a --qos directive. When this directive is missing, you will receive the error:
sbatch: error: QOSGrpSubmitJobsLimit
sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
| Partition | Request Syntax | Comments |
|---|---|---|
| Standard | | Request a CPU-only node using standard hours. |
| Standard GPU | | Request GPU resources using standard hours. See the GPUs section below for details on the gres directive. |
| Windfall | | Unlimited access. Preemptible. Do not include an --account flag when requesting this partition. |
| Windfall GPU | | Request GPU resources using windfall. See the GPUs section below for details on the gres directive. Do not include an --account flag when requesting this partition. |
| High Priority | | Request a CPU-only node with high priority resources. Only available to buy-in groups. |
| High Priority GPU | | Request GPU resources with high priority hours. Only available to buy-in groups. See the GPUs section below for details on the gres directive. |
| Qualified | | Available to groups with an active special project. |
| Qualified GPU | | Request GPU resources with qualified hours. Available to groups with an active special project. See the GPUs section below for details on the gres directive. |
Time Limits
The syntax for requesting time for your job is HHH:MM:SS or DD-HH:MM:SS. The maximum amount of time that can be requested for a batch job is 10 days. More details are available in Job Limits.
#SBATCH --time=HHH:MM:SS
or
#SBATCH --time=DD-HH:MM:SS
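To sanity-check a time request before submitting, the two formats above can be converted to seconds and compared with the 10-day batch limit. This is only a sketch: the helper names time_to_seconds and within_limit are hypothetical, and it handles just the two formats shown on this page.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm): convert HH:MM:SS or D-HH:MM:SS to seconds.
# awk splits on "-" and ":" so three fields mean no day component, four mean days first.
time_to_seconds() {
    echo "$1" | awk -F'[-:]' '
        NF == 3 { print $1 * 3600 + $2 * 60 + $3 }
        NF == 4 { print $1 * 86400 + $2 * 3600 + $3 * 60 + $4 }
    '
}

# 10-day maximum for batch jobs, as stated above
MAX_SECONDS=$((10 * 24 * 3600))

# Succeeds when the requested time does not exceed the limit
within_limit() {
    [ "$(time_to_seconds "$1")" -le "$MAX_SECONDS" ]
}

for request in 01:00:00 2-12:00:00 11-00:00:00; do
    if within_limit "$request"; then
        echo "$request is within the limit"
    else
        echo "$request exceeds the limit"
    fi
done
```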
CPUs and Memory
The number of CPUs (1) for a job can either be specified by the user or left to the Scheduler to determine. Generally, the user should specify the number of CPUs per node or the total memory per node, but not both. Exactly one of these should always be specified. Both over-specification and under-specification can result in errors or unexpected behavior.
- Note that the terms "CPU" and "core" are used interchangeably in the context of scheduling jobs, even though they do not have the same meaning in a hardware context.
When the number of CPUs (per node) is specified, the total amount of memory will be determined by the Scheduler using the product of the number of CPUs and the fixed quantity of memory per CPU for the hardware in use. To find the amount of memory per CPU by cluster and node type, see our Compute Resources page.
When the total memory (per node) is specified, the number of CPUs will be determined by the Scheduler using the ratio of the total memory requested to the value of memory per CPU for the given hardware.
Example: Single-Node, CPU-specified, Puma Standard Node
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=30
Total memory is determined to be 150 GB based on 30 CPUs and 5 GB of memory per CPU on this node type. The user should not specify total memory.
Example: Single-Node, Memory-specified, Puma Standard Node
#SBATCH --nodes=1
#SBATCH --mem=200gb
The number of CPUs allocated to this job is determined to be 40 based on 200 GB of total memory and 5 GB of memory per CPU. User should not specify number of CPUs.
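The arithmetic behind the two examples above can be sketched as follows. The function names are hypothetical, the 5 GB per CPU figure is the Puma standard node value used in the examples, and rounding up when the memory is not an exact multiple is an assumption rather than confirmed scheduler behavior.

```shell
#!/bin/bash
# Memory per CPU in GB for a Puma standard node (value taken from the examples above)
MEM_PER_CPU_GB=5

# Total memory (GB) implied by a CPU count
mem_for_cpus() {
    echo $(( $1 * MEM_PER_CPU_GB ))
}

# CPU count implied by a total memory request (GB).
# Rounding up for non-multiples is an assumption, not confirmed scheduler behavior.
cpus_for_mem() {
    echo $(( ($1 + MEM_PER_CPU_GB - 1) / MEM_PER_CPU_GB ))
}

echo "30 CPUs -> $(mem_for_cpus 30) GB"
echo "200 GB -> $(cpus_for_mem 200) CPUs"
```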
Memory Per CPU
The value of mem-per-cpu is determined automatically by the Scheduler based on the cluster and node type and generally does not need to be specified by the user. See our Compute Resources page for a lookup table. Since this value varies across clusters and node types, specifying it manually may lead to unexpected behaviors, such as standard jobs being placed on high memory nodes, which both increases wait time for the user and reduces availability of limited resources.
Multi-Node Jobs
The same considerations described above apply to multi-node jobs, but the --mem flag determines the total memory per node.
The batch directives to specify CPUs are different for multi-node jobs than for single-node jobs.
To specify the total number of CPUs across nodes, use
#SBATCH --nodes=<n_nodes>
#SBATCH --ntasks=<n_tasks>
To specify the number of CPUs per node, use
#SBATCH --nodes=<n_nodes>
#SBATCH --ntasks-per-node=<n_tasks>
Single vs. Multi-Node Programs
For your job to make use of more than one node, your application must support a framework such as MPI that can coordinate work across nodes.
If your application is not MPI-enabled, always set --nodes=1.
High Memory Nodes
Please note that there are only three public and two buy-in high memory nodes on Puma, and only one high memory node on Ocelote, compared to hundreds of standard nodes on each cluster. Wait times are typically much longer on these nodes as compared to standard nodes.
Before requesting a high memory node, please take the time to test your job on standard nodes, increasing the requested memory as necessary. Additional memory can be allocated to standard nodes by increasing the value of the --mem flag or the number of CPUs (see previous sections).
There generally is no benefit to running a job on a high memory node unless it requires more than 470 GB of memory, which is the total amount of memory available on a Puma standard node.
If your job requires less than 470 GB of memory to run, please adjust your batch directives to run on a standard node with an appropriate amount of memory. See CPUs and Memory for details. It may take some testing to find an optimal value.
If your work is MPI-enabled, using multiple standard nodes may be a viable option to increase the total memory. Careful testing is needed to ensure that this will be beneficial to your workflow.
To request a high memory node, you will need the additional flag --constraint=hi_mem. It is recommended to include the exact directives below to avoid unexpected behavior.
| Cluster | Directives |
|---|---|
| Ocelote | |
| Puma | |
Automatic assignment to high memory nodes
If a value of mem-per-cpu is requested that is higher than the value available on a given cluster, the Scheduler will automatically migrate the job to a high memory node, even if you did not explicitly request this.
The most common case of this would be migrating a job from Ocelote, which has 6 GB per CPU, to Puma, which has 5 GB per CPU.
Since the Scheduler is able to detect and assign this value automatically, it is recommended to remove requests for mem-per-cpu and specify either the total memory or number of CPUs. See the CPUs and Memory section for details.
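The situation described above can be illustrated with a small check. This is a sketch only: the function names are hypothetical, and the per-CPU values are the two quoted in this section (5 GB on Puma, 6 GB on Ocelote).

```shell
#!/bin/bash
# Standard memory per CPU in GB, using the values quoted in this section
default_mem_per_cpu() {
    case $1 in
        puma) echo 5 ;;
        ocelote) echo 6 ;;
        *) return 1 ;;
    esac
}

# Succeeds when a requested mem-per-cpu (integer GB) is higher than the
# cluster's standard value, the case in which a job would be moved to a
# high memory node.
would_move_to_hi_mem() {
    limit=$(default_mem_per_cpu "$1") || return 2
    [ "$2" -gt "$limit" ]
}

# A script written for Ocelote (6 GB per CPU) and reused unchanged on Puma
if would_move_to_hi_mem puma 6; then
    echo "6 GB per CPU on Puma: high memory node"
fi
```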
GPUs
GPU partitions must be used
GPU jobs will need to use GPU-specific partitions. See the partitions section at the top of this page for details.
GPU options are per node
When using --gres=gpu:N, keep in mind that the total number of GPUs the job is allocated is N per node.
GPUs are limited resources. Before requesting a GPU, please ensure that the program you intend to run is GPU-enabled and properly configured to utilize the GPU.
GPUs are an optional resource that may be requested with the --gres directive. For an overview of the specific GPU resources available on each cluster, see our resources page.
| Cluster | Directive | Target |
|---|---|---|
| Puma | | Request a single GPU. This will either target one Volta GPU (v100) or one A100 MIG slice, depending on availability. Only one GPU should be selected with this method to avoid being allocated multiple MIG slices. |
| | | Target one A100 MIG slice. |
| | | Request N V100 GPUs, where 1≤N≤4. |
| Ocelote | | Request N GPUs, where 1≤N≤2. This will target either one or two Pascals (p100s). |
Job Arrays
Array jobs in Slurm allow users to submit multiple similar tasks as a single job. Each task within the array can have its own unique input parameters, making it ideal for running batch jobs with varied inputs or executing repetitive tasks efficiently. The flag for submitting array jobs is:
#SBATCH --array=<N>-<M>
<N> and <M> are integers.
For detailed information on job arrays, see our job array tutorial.
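A minimal sketch of how an array task might pick its own input is shown below. Slurm sets SLURM_ARRAY_TASK_ID for each task; the input_N.txt naming scheme and the fallback value of 1 are assumptions made for illustration, and the usual account, partition, and time directives would still be needed in a real script.

```shell
#!/bin/bash
#SBATCH --array=1-3

# SLURM_ARRAY_TASK_ID is set by Slurm for each array task.
# The fallback of 1 is illustrative so the script also runs outside Slurm.
task_id=${SLURM_ARRAY_TASK_ID:-1}

# Hypothetical naming scheme: one input file per array index
input_file="input_${task_id}.txt"

echo "Task ${task_id} would process ${input_file}"
```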
Job Dependencies
Slurm job dependencies allow users to submit a series of jobs that depend on each other using the flag and options:
--dependency=<type:jobid[:jobid][,type:jobid[:jobid]]>
For example, say job B depends on the successful completion of job A. Job B can be submitted as a dependency of job A using the following method:
[netid@junonia ~]$ sbatch A.slurm
Submitted batch job 1939000
[netid@junonia ~]$ sbatch --dependency=afterok:1939000 B.slurm
Slurm will hold job B until job A completes. Here afterok is the dependency <type>; it ensures that job B runs only if job A completes successfully. The different options for <type> are shown below:
| Dependency Type | Meaning |
|---|---|
| after | Job can begin after the specified job(s) have started. |
| afterany | Job can begin after the specified job(s) have terminated. Job(s) will start regardless of whether the specified jobs failed or ran successfully. |
| afterok | Job can begin after the specified job(s) have completed successfully. If the specified job(s) fail, the dependency will never run. |
| afternotok | Job can begin after the specified job(s) have failed. If the specified job(s) complete successfully, the dependency will never run. |
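When chaining jobs from a script, the job ID has to be captured from the sbatch output. The sketch below parses the confirmation line shown earlier and builds the dependent submission; extract_job_id is a hypothetical helper, and the command is only printed so the example runs without a scheduler. On a real system, sbatch's --parsable option, which prints the job ID on its own, may be a simpler route.

```shell
#!/bin/bash
# Hypothetical helper: pull the job ID (last field) out of sbatch's confirmation line
extract_job_id() {
    echo "$1" | awk '{ print $NF }'
}

# On a cluster this text would come from: submit_output=$(sbatch A.slurm)
submit_output="Submitted batch job 1939000"
job_a=$(extract_job_id "$submit_output")

# Build the command for job B; it is only printed here, not executed
dep_cmd="sbatch --dependency=afterok:${job_a} B.slurm"
echo "$dep_cmd"
```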
Output Filenames
The default output filename for a Slurm job is slurm-<jobid>.out. If desired, this can be customized using the directives
#SBATCH -o output_filename.out
#SBATCH -e output_filename.err
Filenames take patterns that allow for job information substitution. A list of filename patterns is shown below.
| Variable | Meaning | Example Slurm Directive(s) | Sample Output |
|---|---|---|---|
| %A | A job array's main job ID | | 12345.out |
| %a | A job array's index number | | 12345_1.out, 12345_2.out |
| %J | Job ID plus stepid | | 12345.out |
| %j | Job ID | | 12345.out |
| %N | Hostname of the first compute node allocated to the job | | r1u11n1.out |
| %u | Username | | netid.out |
| %x | Job name | | JobName.out |
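As an illustration of how these patterns combine, the fragment below names the output files after the job name and job ID. The preview_filename helper is hypothetical: it only mimics the %x and %j substitutions locally so the resulting name can be previewed, and it is not how Slurm itself performs the expansion.

```shell
#!/bin/bash
#SBATCH --job-name=hello_world
#SBATCH -o %x-%j.out
#SBATCH -e %x-%j.err

# Hypothetical helper: mimic Slurm's %x (job name) and %j (job ID) substitution
preview_filename() {
    # $1 = pattern, $2 = job name, $3 = job ID
    echo "$1" | sed -e "s/%x/$2/g" -e "s/%j/$3/g"
}

# Prints hello_world-12345.out
preview_filename "%x-%j.out" hello_world 12345
```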
Additional Directives
| Command | Purpose |
|---|---|
| | Optional: Specify a name for your job. This will not automatically affect the output filename. |
| | Optional: Specify output filename(s). If -e is missing, stdout and stderr will be combined. |
| | Optional: Append your job's output to the specified output filename(s). |
| | Optional: Request email notifications. Beware of mail bombing yourself. |
| | Optional: Specify email address. If this is missing, notifications will go to your UArizona email address by default. |
| | Optional: Export a comma-delimited list of environment variables to a job. |
| | Optional: Export your working environment to your job. This is the default. |
| | Optional: Do not export working environment to your job. |
Examples and Explanations
The examples below are complete sections of Slurm directives that will produce valid requests. Other directives can be added (like output files), but they are not strictly necessary to submit a valid request. For simplicity, the Puma cluster is assumed when discussing memory and GPU resources. Note that these examples do not include the shebang #!/bin/bash statement, which should be at the top of every Slurm script. Also, note that the order of directives does not matter.
#SBATCH --job-name=hello_world
#SBATCH --account=your_group
#SBATCH --partition=standard
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=01:00:00
This example requests one CPU on one node for one hour. Easy!
#SBATCH --job-name=hello_world
#SBATCH --account=your_group
#SBATCH --partition=standard
#SBATCH --nodes=1
#SBATCH --ntasks=10
#SBATCH --time=01:00:00
10 CPUs are now requested. The default value of mem-per-cpu is assumed, therefore giving this job 50 GB of total memory. Specifying this value by including #SBATCH --mem-per-cpu=5gb will not change the behavior of the above request.
The example below will produce a request equivalent to the one above:
#SBATCH --job-name=hello_world
#SBATCH --account=your_group
#SBATCH --partition=standard
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem=50gb
#SBATCH --time=01:00:00
NEW! July 31, 2024 Partitions update
Beginning July 31, GPU jobs must use a GPU partition. See the partitions section at the top of this page for details.
#SBATCH --job-name=hello_world
#SBATCH --account=your_group
#SBATCH --partition=gpu_standard
#SBATCH --nodes=1
#SBATCH --ntasks=10
#SBATCH --time=01:00:00
#SBATCH --gres=gpu:1
Note the gres=gpu:1 option and gpu_standard partition.
When requesting a multi-node job, up to 94 --ntasks-per-node can be requested on Puma. The numbers below are chosen for illustrative purposes and can be replaced with your choice, up to system limitations. Note that there is no advantage to requesting multiple nodes when the total number of CPUs needed is less than or equal to the number of CPUs on one node.
#SBATCH --job-name=Multi-Node-MPI-Job
#SBATCH --account=your_group
#SBATCH --partition=standard
#SBATCH --ntasks=30
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=10
#SBATCH --time=01:00:00
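To judge whether a multi-node request is warranted at all, the smallest node count for a given number of tasks can be estimated from the 94-task-per-node figure quoted above. The function name nodes_needed is hypothetical and the calculation is only a sketch; it ignores memory and other limits.

```shell
#!/bin/bash
# Maximum tasks per node on Puma, as stated above
MAX_TASKS_PER_NODE=94

# Smallest node count that can hold the requested number of tasks (ceiling division)
nodes_needed() {
    echo $(( ($1 + MAX_TASKS_PER_NODE - 1) / MAX_TASKS_PER_NODE ))
}

for tasks in 30 94 95 200; do
    echo "$tasks tasks -> $(nodes_needed "$tasks") node(s)"
done
```

By this estimate, the 30-task example above would fit on a single node, which is why its numbers are described as illustrative.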
When requesting a high memory node, include both the --mem-per-cpu and --constraint directives.
#SBATCH --job-name=High-Mem-Job
#SBATCH --account=your_group
#SBATCH --partition=standard
#SBATCH --nodes=1
#SBATCH --ntasks=94
#SBATCH --mem-per-cpu=32gb
#SBATCH --constraint=hi_mem
#SBATCH --time=01:00:00
Groups and users are subject to limitations on resource usage. For more information, see job limits.