Monitoring Jobs¶
Note
Avoid running multiple instances of watch squeue or watch sqs. This can overload the scheduler, which is a shared system resource. If you must use watch, use watch -n 60 and stop the process when finished.
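If you only need to know when one job has finished, a bounded polling loop is a lighter alternative to leaving watch running. The sketch below is one possible approach rather than a site-provided tool: the function name wait_for_job and the 60-second default are illustrative assumptions.

```shell
# Hypothetical helper: poll one job at a fixed interval and return once
# squeue no longer lists it (finished, cancelled, or purged from the queue).
wait_for_job() {
    local jobid=$1
    local interval=${2:-60}   # seconds between checks; 60 follows the note above
    # -h drops the header and -o %T prints only the job state, so an empty
    # result means the job is no longer in the queue.
    while [ -n "$(squeue -h -j "$jobid" -o %T 2>/dev/null)" ]; do
        sleep "$interval"
    done
    echo "Job $jobid is no longer in the queue"
}

# Usage: wait_for_job 123456        # checks once per minute
```

The loop exits on its own, so no process is left polling the scheduler after the job is gone.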
Using squeue¶
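The squeue command provides real-time job queue information directly from the Slurm scheduler. It is helpful for checking the current state of jobs, such as PENDING or RUNNING. Jobs that have finished (for example, COMPLETED) remain visible in squeue only briefly; use sacct to look them up afterwards.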
squeue --me # Shows your jobs
squeue -u $USER # Equivalent to --me
squeue --me -t R # Only running jobs
squeue --me -t PD # Only pending jobs
squeue -j 1234,1235 # Filter by job IDs
To show jobs by account:
squeue -A your_project_name
To view job steps:
squeue --steps 1001.0
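The filters above can be combined with squeue's output-format options to summarize the queue in a single call. The following is a minimal sketch; the function name job_state_counts is hypothetical, and it assumes the standard -h (no header) and -o %T (state only) options.

```shell
# Hypothetical helper: count your jobs per state from a single squeue call.
job_state_counts() {
    # -h suppresses the header; -o %T prints one state name per job.
    squeue --me -h -o %T | sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# Usage: job_state_counts
```

Each output line has the form STATE: count, one line per state that currently has jobs.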
Using sacct¶
The sacct command retrieves accounting information about active and completed jobs.
Basic usage:
sacct
Customize output by specifying fields:
sacct --format=JobID,JobName,State,Start,Elapsed
Filter jobs by date:
sacct -S 2026-04-01 -E 2026-04-26
Display only failed jobs:
sacct -X --format=User,JobName,State -s F --start=2026-04-01 --end=now
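For scripting, sacct output is easier to process without the header and with a fixed delimiter. The sketch below wraps the failed-jobs query in a function; the name failed_jobs_since and the choice of fields are assumptions made for illustration.

```shell
# Hypothetical helper: list failed jobs since a given date,
# one "JobID JobName ExitCode" line per job.
failed_jobs_since() {
    local start=$1
    # -X: one line per job allocation; -n: no header; -P: pipe-delimited output.
    sacct -X -n -P -s F --start="$start" --end=now \
          --format=JobID,JobName,ExitCode |
        awk -F'|' 'NF == 3 { printf "%s %s %s\n", $1, $2, $3 }'
}

# Usage: failed_jobs_since 2026-04-01
```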
Using sstat¶
Use sstat to report resource usage for jobs that are currently running. sstat reports on job steps rather than on the job as a whole, so for a batch script append .batch to the job ID:
sstat -j 123456.batch -o JobID,MaxRSS
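To avoid typing the step suffix each time, the job ID can be turned into a batch-step ID inside a small function. This is a sketch under assumptions: batch_step_usage is a hypothetical name, and the output wording is illustrative.

```shell
# Hypothetical helper: report peak memory for the batch step of a running job.
batch_step_usage() {
    local jobid=$1
    # Append .batch so sstat targets the batch script's step.
    # -n: no header; -P: pipe-delimited output.
    sstat -j "${jobid}.batch" -n -P -o JobID,MaxRSS |
        awk -F'|' '{ print "step " $1 " peak RSS " $2 }'
}

# Usage: batch_step_usage 123456
```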
Email Notifications¶
To receive notifications when your job begins, ends, or fails, add the following directives to your Slurm job script:
#SBATCH --mail-type=begin,end,fail
#SBATCH --mail-user=your@email.com
Modifying or Canceling Jobs¶
To cancel a job:
scancel 123456
To cancel multiple jobs:
scancel 123456 123457
To cancel all jobs submitted by your user:
scancel -u $USER
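Cancelling everything is often more than is needed. scancel also accepts a state filter, so a narrower variant can cancel only pending jobs and leave running ones untouched. The sketch below adds a dry-run switch; the function name cancel_pending and the DRY_RUN variable are hypothetical conventions, not Slurm features.

```shell
# Hypothetical helper: cancel only your pending jobs.
# Set DRY_RUN=1 to list the affected jobs instead of cancelling them.
cancel_pending() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Preview: job ID and name of each pending job, no header.
        squeue -u "$USER" -h -t PD -o "%i %j"
    else
        # -t PENDING restricts the cancellation to jobs still waiting to start.
        scancel -u "$USER" -t PENDING
    fi
}

# Usage: DRY_RUN=1 cancel_pending   # preview
#        cancel_pending             # cancel for real
```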
To update a job’s time limit (regular users can typically only reduce the limit; increasing it requires an administrator):
scontrol update jobid=123456 timelimit=02:00:00
Holding, Releasing, and Requeuing Jobs¶
Place a job on hold (prevent scheduling):
scontrol hold 123456
Release a held job:
scontrol release 123456
Requeue a job (e.g., after failure or timeout):
scontrol requeue 123456
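After a hold or release, the job's pending reason in squeue can confirm that the change took effect. The sketch below assumes that held jobs report the reason JobHeldUser or JobHeldAdmin, which may vary with Slurm version and site configuration; the function name hold_status is hypothetical.

```shell
# Hypothetical helper: report whether a queued job is currently held.
hold_status() {
    local jobid=$1
    local reason
    # -o %r prints the reason the job is in its current state.
    reason=$(squeue -h -j "$jobid" -o %r 2>/dev/null)
    case "$reason" in
        JobHeldUser|JobHeldAdmin) echo "held ($reason)" ;;   # assumed reason strings
        "") echo "not in queue" ;;
        *) echo "not held ($reason)" ;;
    esac
}

# Usage: scontrol hold 123456 && hold_status 123456
```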