Running jobs¶
Note
This page covers the specifics of the IQ HPC Platform. If you are not familiar with job submission and management commands (such as sbatch, salloc, squeue), read Running jobs from the Alliance technical documentation first or check our page about learning HPC.
Login nodes¶
Use the login node (ip09) to prepare your jobs. It is, however, forbidden to run jobs directly on this node! Cluster login nodes do not have the necessary computing power to run jobs. In addition, running a job on a login node can slow it down considerably, which negatively impacts all connected researchers. All jobs must be submitted to the scheduler using the appropriate commands: sbatch, salloc, srun.
Input/output¶
Use the storage at /net/nfs-iq/data to read and write research data in your jobs. This location offers better performance than your home directory.
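For example, a job script might read its input from and write its results to this storage; a minimal sketch (the project subdirectory, program name, and file names are placeholders; adapt them to your own data):
#!/bin/bash
#SBATCH --job-name=my-job

# Hypothetical project directory on the fast NFS storage
DATADIR=/net/nfs-iq/data/my-project

# Read input from and write output to the fast storage rather than $HOME
cd "$DATADIR"
my_program --input input.dat --output results/output.dat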
Public nodes¶
Jobs must be submitted to the c-iq partition, which is the default. No option is therefore necessary, but the partition can nevertheless be specified explicitly with the --partition option or its short form -p if desired. For example, in a job script:
#!/bin/bash
#SBATCH --job-name=my-job
#SBATCH --partition=c-iq
...
Maximum job duration is seven days.
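The partition and other options can also be given on the sbatch command line instead of in the script; for instance (my-job.sh is a placeholder script name and the time limit is illustrative):
[alice@ip09 ~]$ sbatch -p c-iq --time=2-00:00:00 my-job.sh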
GPU jobs¶
The IQ HPC Platform offers one GPU compute node with two Nvidia A40 GPUs, which can be requested with the --gpus-per-node, --gpus-per-task, and --gres options.
For example, to use one GPU in an interactive job:
[alice@ip09 ~]$ salloc --gres=gpu
To use both GPUs in a job script:
#!/bin/bash
#SBATCH --job-name=my-job
#SBATCH --partition=c-iq
#SBATCH --gpus-per-node=nvidia_a40:2
...
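If your job launches one task per GPU, the --gpus-per-task option mentioned above can be used instead; a minimal sketch (the task count is illustrative):
#!/bin/bash
#SBATCH --job-name=my-job
#SBATCH --partition=c-iq
#SBATCH --ntasks=2
#SBATCH --gpus-per-task=1
...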
Contributed nodes¶
To run a job on one or more contributed nodes to which you have access, request the corresponding partition with --partition (or its short form -p). Refer to the contributed nodes table. For instance, use --partition=c-apc to submit a job to David Sénéchal’s APC nodes. Maximum job duration varies depending on the partition and is noted in the node table.
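For example, a job script targeting these nodes could start as follows (the time limit shown is illustrative; pick a value allowed for that partition):
#!/bin/bash
#SBATCH --job-name=my-job
#SBATCH --partition=c-apc
#SBATCH --time=2-00:00:00
...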
Job management¶
The squeue command lists all jobs in the scheduler, including those of all users. Use sq to list only your own jobs. (This last command is also available on Alliance clusters.)
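The sq shortcut is roughly equivalent to filtering squeue on your own user name, which you can also do directly:
[alice@ip09 ~]$ squeue -u $USER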
Monitoring active jobs¶
When one of your jobs starts, it is important to verify that it uses the allocated resources properly. For instance, if a job has access to 4 CPU cores and 80 GB of memory, is it really using these 4 cores at 100%, and is memory usage in that order of magnitude?
To verify, use ssh to connect to a compute node allocated to your job. There, run htop, which gives an overview of CPU and memory usage. In the following example, alice uses the output of sq to identify node cp1433 before connecting to it. htop shows 4 processes running at 100% and belonging to Alice, which matches the 4 CPUs allocated to her job.
[alice@ip09 ~]$ sq
JOBID USER ACCOUNT NAME ST TIME_LEFT NODES CPUS GRES MIN_MEM NODELIST (REASON)
5623630 alice def-alice md-job.sh R 14:56 1 4 (null) 256M cp1433 (None)
[alice@ip09 ~]$ ssh cp1433
Last login: Wed Aug 21 11:16:34 2024 from ip09.m
[alice@cp1433-mp2 ~]$ htop
0[||||||||100.0%] 8[ 0.0%] 16[ 0.0%] 24[ 0.0%]
1[||||||||100.0%] 9[ 0.0%] 17[| 0.7%] 25[ 0.0%]
2[||||||||100.0%] 10[ 0.0%] 18[ 0.0%] 26[ 0.0%]
3[||||||||100.0%] 11[ 0.0%] 19[ 0.0%] 27[ 0.0%]
4[ 0.0%] 12[ 0.0%] 20[ 0.0%] 28[ 0.0%]
5[ 0.0%] 13[ 0.0%] 21[ 0.0%] 29[ 0.0%]
6[ 0.0%] 14[ 0.0%] 22[ 0.0%] 30[ 0.0%]
7[ 0.0%] 15[ 0.0%] 23[| 0.7%] 31[ 0.0%]
Mem[||| 6.82G/252G] Tasks: 63, 174 thr; 5 running
Swp[ 0K/0K] Load average: 2.40 0.71 1.22
Uptime: 1 day, 20:53:58
PID USER PRI NI VIRT RES SHR S CPU%▽MEM% TIME+ Command
35160 alice 20 0 457M 97680 19588 R 99. 0.0 0:51.67 /cvmfs/soft.computecanada.
35161 alice 20 0 454M 96376 19248 R 99. 0.0 0:51.93 /cvmfs/soft.computecanada.
35162 alice 20 0 454M 95832 19248 R 99. 0.0 0:51.83 /cvmfs/soft.computecanada.
35163 alice 20 0 446M 93644 19252 R 99.3 0.0 0:51.82 /cvmfs/soft.computecanada.
35449 alice 20 0 58960 4812 3044 R 0.7 0.0 0:00.08 htop
1 root 20 0 122M 4116 2636 S 0.0 0.0 0:47.60 /usr/lib/systemd/systemd -
1041 root 20 0 39060 8500 8172 S 0.0 0.0 0:01.65 /usr/lib/systemd/systemd-j
1074 root 20 0 45472 1840 1352 S 0.0 0.0 0:11.67 /usr/lib/systemd/systemd-u
1318 root 20 0 48920 1328 1012 S 0.0 0.0 0:00.00 /usr/sbin/rdma-ndd --syste
1393 root 16 -4 55532 860 456 S 0.0 0.0 0:00.37 /sbin/auditd
1394 root 16 -4 55532 860 456 S 0.0 0.0 0:00.00 /sbin/auditd
1395 root 12 -8 84556 888 740 S 0.0 0.0 0:00.39 /sbin/audispd
F1Help F2Setup F3SearchF4FilterF5Tree F6SortByF7Nice -F8Nice +F9Kill F10Quit
GPU jobs¶
For GPU jobs, you must also check that they use the allocated GPU(s). To do so, connect to the compute node and use the nvidia-smi command, which lists GPUs and the programs using them. For example:
[alice@ip09 ~]$ ssh cp3705
Last login: Wed Aug 21 13:47:44 2024 from ip09.m
[alice@cp3705-mp2 ~]$ nvidia-smi
Wed Aug 21 13:52:41 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A40 Off | 00000000:65:00.0 Off | 0 |
| 0% 30C P0 81W / 300W | 370MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA A40 Off | 00000000:CA:00.0 Off | 0 |
| 0% 29C P0 70W / 300W | 276MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 14734 C gmx_mpi 362MiB |
| 1 N/A N/A 14734 C gmx_mpi 268MiB |
+-----------------------------------------------------------------------------------------+
We notice that the gmx_mpi process (PID 14734) is using both GPUs.
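A single reading only shows instantaneous utilization; to follow GPU usage over time, you can refresh nvidia-smi periodically, for example:
[alice@cp3705-mp2 ~]$ watch -n 5 nvidia-smi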
Statistics for finished jobs¶
The seff command shows statistics about finished jobs, including their CPU and memory efficiency. For example:
[alice@ip15-mp2 ~]$ seff 5623631
Job ID: 5623631
Cluster: mp2
User/Group: alice/alice
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 4
CPU Utilized: 01:00:09
CPU Efficiency: 99.59% of 01:00:24 core-walltime
Job Wall-clock time: 00:15:06
Memory Utilized: 353.91 MB (estimated maximum)
Memory Efficiency: 34.56% of 1.00 GB (256.00 MB/core)
Typically, CPU efficiency should be close to 100%. A lower efficiency indicates that CPU time is wasted, possibly because the job is not using all allocated resources. If the efficiency of one of your jobs is under 70%, you should not submit other similar jobs before fixing this problem.
Memory efficiency should be at least 50%. If one of your jobs is under this threshold, reduce the amount of requested memory for similar jobs. (If you ask for the default amount of memory, 256M per CPU core, ignore memory efficiency since your absolute usage is very low anyway.)
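For instance, if a job requested 80 GB of memory but seff reports that it used only about 20 GB, similar jobs could request less while keeping some headroom (the values below are illustrative):
#!/bin/bash
#SBATCH --job-name=my-job
#SBATCH --cpus-per-task=4
#SBATCH --mem=24G
...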
By monitoring job efficiency, you not only ensure that your jobs run faster: you also allow a greater number of jobs to run simultaneously, which reduces the wait time for all researchers.