
Job Management Guide

Master job management in S9S with this comprehensive guide covering submission, monitoring, and advanced job operations.

Job Overview

S9S provides a powerful interface for managing SLURM jobs with features that go beyond traditional command-line tools:

  • Real-time Monitoring: Live job status updates
  • Batch Operations: Manage multiple jobs simultaneously
  • Advanced Filtering: Find jobs quickly
  • Direct Output Access: View logs without leaving S9S
  • Job Templates: Reusable job configurations
  • Dependency Management: Visual dependency tracking

Submitting Jobs

Quick Submit

Press s in Jobs view to open the submission wizard:

  1. Choose Method:

    • New job from scratch (Custom Job)
    • From template (select a pre-configured template)
  2. Configure Resources:

    Job Name: my_analysis
    Partition: compute
    Nodes: 2
    Tasks per Node: 28
    Memory: 64GB
    Time Limit: 24:00:00
  3. Set Script:

    #!/bin/bash
    
    module load python/3.9
    python analyze.py
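
Put together, the three steps above correspond to a batch script like the following (a sketch of what the wizard assembles from the form values; see Job Script Preview below for the exact generated output):

```bash
#!/bin/bash
#SBATCH --job-name=my_analysis
#SBATCH --partition=compute
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=28
#SBATCH --mem=64G
#SBATCH --time=24:00:00

module load python/3.9
python analyze.py
```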

Job Submission Workflow

[Job Submission Demo: job submission wizard with step-by-step configuration]

The submission process guides you through all necessary options with helpful defaults and validation.

Submission Wizard Fields

The wizard supports 86 sbatch fields across the full SLURM OpenAPI spec. Fields are organized into three visibility tiers so the form stays manageable while still exposing every option when needed.

Always visible -- these 7 core fields appear on every new job form:

| Config Key | JSON Key | sbatch Flag | Description |
|---|---|---|---|
| name | name | --job-name | Job name |
| script | script | (script body) | Batch script content |
| partition | partition | --partition | Target partition |
| timeLimit | time_limit | --time | Wall-clock time limit (HH:MM:SS or D-HH:MM:SS) |
| nodes | nodes | --nodes | Number of nodes |
| cpus | cpus | --cpus-per-task | CPUs per task |
| memory | memory | --mem | Memory per node (e.g., 4G, 1024M) |
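
Assuming the submission payload mirrors the JSON keys above (a sketch of the shape, not necessarily the exact wire format), a minimal job using only the core fields would look like:

```json
{
  "name": "my_analysis",
  "partition": "compute",
  "time_limit": "24:00:00",
  "nodes": 2,
  "cpus": 28,
  "memory": "64G",
  "script": "#!/bin/bash\nmodule load python/3.9\npython analyze.py"
}
```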

Visible by default -- these 8 fields are shown on the form unless explicitly hidden:

| Config Key | JSON Key | sbatch Flag | Description |
|---|---|---|---|
| gpus | gpus | --gres=gpu:N | Number of GPUs |
| qos | qos | --qos | Quality of service |
| account | account | --account | Charge account |
| workingDir | working_directory | --chdir | Working directory |
| outputFile | output_file | --output | Stdout file path |
| errorFile | error_file | --error | Stderr file path |
| emailNotify | email_notify | --mail-type | Enable email notifications |
| email | email | --mail-user | Notification email address |

Hidden by default -- these 71 fields become visible when a template sets a value for them, when they are removed from the hiddenFields list in config, or when a per-template hiddenFields override brings them into view.

Job arrays & dependencies:

| Config Key | JSON Key | sbatch Flag | Description |
|---|---|---|---|
| arraySpec | array | --array | Array job index spec (e.g., 1-100%10) |
| dependencies | dependencies | --dependency | Job dependencies (list of job IDs, submitted as afterok:id1:id2) |
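
The dependency formatting can be sketched in a few lines of shell (the function name is illustrative, not part of S9S; it just shows how a list of job IDs becomes the afterok string described above):

```shell
# Hypothetical sketch: turn a list of job IDs into the afterok dependency
# flag that S9S submits on your behalf.
format_dependencies() {
  local dep="afterok"
  for id in "$@"; do
    dep="${dep}:${id}"
  done
  echo "--dependency=${dep}"
}

format_dependencies 12345 12346   # → --dependency=afterok:12345:12346
```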

Resource controls:

| Config Key | JSON Key | sbatch Flag | Description |
|---|---|---|---|
| exclusive | exclusive | --exclusive | Exclusive node access |
| requeue | requeue | --requeue | Requeue on failure |
| gres | gres | --gres | Generic resources (e.g., gpu:a100:2) |
| constraints | constraints | --constraint | Required node features |
| ntasks | ntasks | --ntasks | Total number of tasks |
| ntasksPerNode | ntasks_per_node | --ntasks-per-node | Tasks per node |
| memoryPerCPU | memory_per_cpu | --mem-per-cpu | Memory per CPU (e.g., 4G) |
| minimumCPUs | minimum_cpus | (API only) | Minimum total CPUs |
| minimumCPUsPerNode | minimum_cpus_per_node | --mincpus | Minimum CPUs per node |
| maximumNodes | maximum_nodes | --nodes (max) | Maximum node count |
| maximumCPUs | maximum_cpus | (API only) | Maximum CPU count |
| tmpDiskPerNode | tmp_disk_per_node | --tmp | Temporary disk per node (MB) |
| overcommit | overcommit | --overcommit | Overcommit resources |
| contiguous | contiguous | --contiguous | Require contiguous nodes |

Scheduling:

| Config Key | JSON Key | sbatch Flag | Description |
|---|---|---|---|
| hold | hold | --hold | Submit in held state |
| priority | priority | --priority | Job priority |
| nice | nice | --nice | Priority adjustment |
| beginTime | begin_time | --begin | Deferred start (see Begin Time Formats below) |
| deadline | deadline | --deadline | Latest acceptable start time |
| immediate | immediate | --immediate | Fail if resources not available now |
| timeMinimum | time_minimum | --time-min | Minimum time for backfill scheduling |

Placement & topology:

| Config Key | JSON Key | sbatch Flag | Description |
|---|---|---|---|
| distribution | distribution | --distribution | Task distribution (block, cyclic, etc.) |
| threadsPerCore | threads_per_core | --threads-per-core | Threads per core |
| tasksPerCore | tasks_per_core | --ntasks-per-core | Tasks per core |
| tasksPerSocket | tasks_per_socket | --ntasks-per-socket | Tasks per socket |
| socketsPerNode | sockets_per_node | --sockets-per-node | Sockets per node |
| cpuBinding | cpu_binding | --cpu-bind | CPU binding method |
| cpuBindingFlags | cpu_binding_flags | --cpu-bind (flags) | CPU binding flags (VERBOSE, etc.) |
| memoryBinding | memory_binding | --mem-bind | Memory NUMA binding |
| memoryBindingType | memory_binding_type | --mem-bind (type) | Memory binding type (LOCAL, RANK) |
| requiredNodes | required_nodes | --nodelist | Required specific nodes |
| excludeNodes | exclude_nodes | --exclude | Excluded nodes |

TRES (Trackable Resources):

| Config Key | JSON Key | sbatch Flag | Description |
|---|---|---|---|
| cpusPerTRES | cpus_per_tres | --cpus-per-gpu | CPUs per GPU/TRES |
| memoryPerTRES | memory_per_tres | --mem-per-gpu | Memory per GPU/TRES |
| ntasksPerTRES | ntasks_per_tres | --ntasks-per-gpu | Tasks per GPU/TRES |
| tresPerTask | tres_per_task | --tres-per-task | TRES per task |
| tresPerSocket | tres_per_socket | --tres-per-socket | TRES per socket |
| tresPerJob | tres_per_job | --tres-per-job | TRES per job |
| tresBind | tres_bind | --tres-bind | TRES binding (e.g., gres/gpu:closest) |
| tresFreq | tres_freq | --tres-freq | TRES frequency control |

Signals & notifications:

| Config Key | JSON Key | sbatch Flag | Description |
|---|---|---|---|
| signal | signal | --signal | Pre-termination signal (e.g., B:USR1@300) |
| killOnNodeFail | kill_on_node_fail | --no-kill | Kill job on node failure |
| waitAllNodes | wait_all_nodes | --wait-all-nodes | Wait for all nodes to boot |

Accounting & admin:

| Config Key | JSON Key | sbatch Flag | Description |
|---|---|---|---|
| reservation | reservation | --reservation | Reservation name |
| licenses | licenses | --licenses | Required licenses |
| wckey | wckey | --wckey | Workload characterization key |
| comment | comment | --comment | Job comment |
| prefer | prefer | --prefer | Preferred (not required) features |

I/O & environment:

| Config Key | JSON Key | sbatch Flag | Description |
|---|---|---|---|
| standardInput | standard_input | --input | Stdin file path |
| openMode | open_mode | --open-mode | Output file mode (append or truncate) |
| container | container | --container | OCI container path |

Advanced:

| Config Key | JSON Key | sbatch Flag | Description |
|---|---|---|---|
| cpuFrequency | cpu_frequency | --cpu-freq | CPU frequency (low, medium, high, or KHz) |
| network | network | --network | Network specs |
| x11 | x11 | --x11 | X11 forwarding (batch, first, last, all) |
| burstBuffer | burst_buffer | --bb | Burst buffer specification |
| batchFeatures | batch_features | --batch | Batch node features |
| coreSpecification | core_specification | --core-spec | Reserved system cores |
| threadSpecification | thread_specification | --thread-spec | Reserved system threads |
| argv | argv | (script arguments) | Script arguments (space-separated) |
| flags | flags | --spread-job, etc. | Job flags (comma-separated) |
| profile | profile | --profile | Profiling (ENERGY, NETWORK, TASK) |

Cluster federation:

| Config Key | JSON Key | sbatch Flag | Description |
|---|---|---|---|
| requiredSwitches | required_switches | --switches | Required network switch count |
| waitForSwitch | wait_for_switch | --switches (timeout) | Switch wait timeout (seconds) |
| clusterConstraint | cluster_constraint | --cluster-constraint | Federation cluster constraint |
| clusters | clusters | --clusters | Target clusters (federation) |

Begin Time Formats

The beginTime field accepts all formats supported by sbatch --begin:

| Format | Example | Description |
|---|---|---|
| Named time | now, today, tomorrow | Current time, midnight today, midnight tomorrow |
| Named hour | midnight, noon, teatime | 00:00, 12:00, 16:00 (next occurrence) |
| Relative | now+1hour, now+30minutes | Offset from current time |
| Relative (seconds) | now+3600 | Bare number = seconds |
| ISO date | 2024-12-31 | Midnight on date |
| ISO datetime | 2024-12-31T14:30 | Specific date and time |
| US date | 12/31/24, 123124 | US date format |
| Time of day | 16:00, 4:00PM | Next occurrence of time |

Named times: midnight (00:00), noon (12:00), elevenses (11:00), fika (15:00), teatime (16:00).
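
The relative formats all reduce to "now plus an offset in seconds", which you can reproduce yourself when scripting around submissions (a sketch; the helper name is made up and GNU date is assumed for the epoch arithmetic):

```shell
# Sketch: compute the absolute epoch timestamp that a relative --begin
# value such as now+1hour resolves to.
relative_begin() {
  echo $(( $(date +%s) + $1 ))
}

begin=$(relative_begin 3600)   # the equivalent of --begin=now+1hour
```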

Template-Based Submission

To submit a job from a template, press s to open the submission wizard, then select "From template" and pick a template from the list. The selected template pre-fills the form with its configured defaults and controls which fields are visible.

You can also list available templates from the shell:

s9s templates list

Config-Driven Customization

All submission defaults and field visibility can be configured under views.jobs.submission in your config file (~/.s9s/config.yaml):

views:
  jobs:
    submission:
      # Global defaults applied to every new job
      formDefaults:
        partition: "compute"
        timeLimit: "04:00:00"
        nodes: 1
        cpus: 4
        memory: "8G"
        workingDir: "/scratch/%u"  # %u = username (SLURM substitution)
        outputFile: "slurm_%j.out"
        errorFile: "slurm_%j.err"

      # Fields to hide globally from the form
      hiddenFields:
        - arraySpec
        - exclusive
        - requeue

      # Restrict dropdown values (filters cluster-fetched values)
      fieldOptions:
        partition: ["compute", "gpu", "highmem"]
        qos: ["normal", "high"]
        account: ["research-a", "research-b"]

      # Control which template sources are loaded (default: all three)
      # Options: "builtin", "config", "saved"
      templateSources: ["builtin", "config", "saved"]

      # Define custom config templates (see Job Templates section below)
      templates:
        - name: "GPU Training Job"
          description: "PyTorch training on GPU partition"
          defaults:
            partition: "gpu"
            timeLimit: "24:00:00"
            cpus: 8
            memory: "32G"
            gpus: 2
            script: |
              #!/bin/bash
              module load cuda pytorch
              python train.py
          hiddenFields: ["arraySpec"]

Monitoring Jobs

Job States

S9S color-codes job states for quick identification:

| State | Color | Description |
|---|---|---|
| PENDING | Yellow | Waiting for resources |
| RUNNING | Green | Currently executing |
| COMPLETED | Cyan | Finished successfully |
| FAILED | Red | Exited with error |
| CANCELLED | Gray | Cancelled by user/admin (SLURM uses British spelling) |
| TIMEOUT | White | Exceeded time limit |
| SUSPENDED | Orange | Temporarily suspended |

Job Details

Press Enter on any job to view details:

  • Summary: ID, name, user, submission time
  • Resources: Nodes, CPUs, memory, GPUs
  • Timing: Start, elapsed, remaining time
  • Performance: CPU/memory efficiency
  • Output: Stdout/stderr file paths
  • Dependencies: Parent/child jobs

Note that d opens the job's dependencies, not the details view; use Enter for details.

Live Output Monitoring

View job output in real-time:

  1. Select job and press o
  2. Choose output type:
    • Standard output (stdout)
    • Standard error (stderr)
  3. Options:
    • f - Follow/tail output
    • s - Switch between stdout/stderr
    • Esc - Exit viewer

For more details, see the Jobs View Guide.

Job Operations

Single Job Actions

| Key | Action | Description |
|---|---|---|
| c/C | Cancel | Cancel job (with confirmation) |
| H | Hold | Prevent job from starting |
| r | Release | Release held job |
| R | Refresh | Refresh the jobs list |
| :requeue JOBID | Requeue | Resubmit failed job (use command mode) |
| d/D | Dependencies | View job dependencies |
| p/P | Toggle Pending | Toggle pending state filter |
| e/E | Export | Open export dialog |
| m/M | Auto-refresh | Toggle auto-refresh |

Batch Operations

Select multiple jobs with Space, then press b:

  1. Selection Methods:

    • Manual: Space on each job
    • Toggle multi-select: v or V
    • By filter: /PENDING then select visible jobs
  2. Batch Actions:

    • Cancel selected
    • Hold/Release selected
    • Requeue selected
    • Delete selected
    • Set Priority
    • Export output

Advanced Operations

Job Arrays

Array jobs are created by setting the arraySpec field in the submission wizard (for example, 1-100%10). Once submitted, individual array tasks appear in the jobs list and can be managed with standard job operations (cancel, hold, release) using the task's full job ID.

Dependencies

Job dependencies are set in the submission wizard via the dependencies field. Enter a comma-separated list of job IDs; S9S submits them as afterok:id1:id2 automatically. Dependency information is displayed in the job details view.

See #115 for planned command-mode enhancements to array and dependency management.

Advanced Filtering

Filter Syntax

S9S supports two filtering modes:

Quick Filter (/) -- plain text search across all visible columns. Type /gpu to find items containing "gpu". The only special prefix is p: for partition filtering.

Global Search (Ctrl+F) -- opens cross-resource search (available in all data views). The advanced filter bar supports field-specific field=value syntax with operators:

# Advanced filter examples
state=RUNNING                           # Running jobs
user=alice                              # Alice's jobs
state=PENDING user=bob                  # Bob's pending jobs (AND logic)
name~analysis                           # Jobs containing "analysis"
name=~"analysis.*2023"                  # Regex match
memory>4G cpus>=8                       # Resource comparisons
state!=FAILED                           # Not failed
state in (RUNNING,PENDING)              # In list
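
The operator syntax above can be illustrated with a small token parser (a hypothetical sketch, not S9S internals; it checks operators in the order !=, >=, =~, ~, =, and covers only those five for brevity):

```shell
# Illustrative parser for one field/operator/value token from the
# advanced filter bar. Plain text falls through as a quick-filter term.
parse_token() {
  case "$1" in
    *"!="*)  echo "field=${1%%!=*} op=ne value=${1#*!=}" ;;
    *">="*)  echo "field=${1%%>=*} op=ge value=${1#*>=}" ;;
    *"=~"*)  echo "field=${1%%=~*} op=regex value=${1#*=~}" ;;
    *"~"*)   echo "field=${1%%~*} op=contains value=${1#*~}" ;;
    *"="*)   echo "field=${1%%=*} op=eq value=${1#*=}" ;;
    *)       echo "text=$1" ;;
  esac
}

parse_token "state=RUNNING"    # → field=state op=eq value=RUNNING
parse_token "name~analysis"    # → field=name op=contains value=analysis
parse_token "state!=FAILED"    # → field=state op=ne value=FAILED
```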

Saved Filters

Saved filters are not yet implemented. Use the / quick filter for on-the-fly filtering. See #115 for planned saved-filter support.

Job Performance

Efficiency Metrics

S9S calculates job efficiency:

  • CPU Efficiency: Actual vs allocated CPU usage
  • Memory Efficiency: Peak vs allocated memory
  • GPU Utilization: GPU usage percentage
  • I/O Performance: Read/write statistics

View metrics by selecting a job and pressing Enter to see the job details view, which includes efficiency information when available.

Job Templates

Template System Overview

S9S uses a three-tier merge system to assemble the list of available templates. When two or more sources define a template with the same name, the higher-priority source wins.

| Priority | Source | Location |
|---|---|---|
| 1 (highest) | User-saved templates | ~/.s9s/templates/*.json |
| 2 | Config YAML templates | views.jobs.submission.templates in config |
| 3 (lowest) | Built-in templates | Hardcoded in S9S |

Built-in Templates

S9S ships with 8 built-in templates covering common job patterns:

| Template | Description |
|---|---|
| Basic Batch Job | Simple single-node batch job |
| MPI Parallel Job | Parallel job using MPI across multiple nodes |
| GPU Job | Job requiring GPU resources |
| Array Job | Array job for processing multiple similar tasks |
| Interactive Job | Interactive session for development and testing |
| Long-Running Job | Extended wall-time job |
| High Memory Job | Job requesting large memory allocation |
| Development/Debug Job | Short debug session with verbose output |

Config YAML Templates

Define custom templates in your config file under views.jobs.submission.templates. Each template can set default values for any form field and optionally hide irrelevant fields:

views:
  jobs:
    submission:
      templates:
        - name: "GPU Training Job"
          description: "PyTorch training on GPU partition"
          defaults:
            partition: "gpu"
            timeLimit: "24:00:00"
            cpus: 8
            memory: "32G"
            gpus: 2
            script: |
              #!/bin/bash
              module load cuda pytorch
              python train.py
          hiddenFields: ["arraySpec"]

        - name: "Genomics Pipeline"
          description: "High-memory genomics analysis"
          defaults:
            partition: "highmem"
            timeLimit: "48:00:00"
            memory: "256G"
            cpus: 32
          hiddenFields: ["gpus", "arraySpec"]

User-Saved Templates

User-saved templates are stored as individual JSON files in ~/.s9s/templates/ and have the highest priority in the merge order.

Saving from the Wizard

After configuring a job in the submission wizard, use the "Save as Template" flow to save the current form state as a new template in ~/.s9s/templates/.

Template JSON Format

Each saved template is a JSON file with the following structure:

{
  "name": "My Custom Template",
  "description": "Description of this template",
  "job_submission": {
    "name": "my_job",
    "partition": "compute",
    "time_limit": "04:00:00",
    "nodes": 2,
    "cpus": 8,
    "memory": "16G",
    "working_directory": "/scratch/%u",
    "output_file": "job_%j.out",
    "error_file": "job_%j.err",
    "script": "#!/bin/bash\nmodule load python\npython run.py"
  }
}

Naming conventions: Saved JSON templates use snake_case field names (matching Go struct JSON tags): time_limit, output_file, error_file, working_directory. Config YAML templates use camelCase: timeLimit, outputFile, errorFile, workingDir. Using the wrong convention will silently ignore the field. Script arguments (argv) are split on whitespace in both formats — quoted arguments with spaces are not supported.
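
For most keys the translation between the two conventions is mechanical, which a one-liner can sketch (a hypothetical helper, GNU sed assumed; note that the mapping is not purely mechanical everywhere: workingDir maps to working_directory, not working_dir):

```shell
# Hypothetical helper: convert a config-YAML camelCase key to the
# snake_case form expected in saved JSON templates.
camel_to_snake() {
  printf '%s\n' "$1" | sed -E 's/([A-Z])/_\l\1/g'
}

camel_to_snake timeLimit    # → time_limit
camel_to_snake outputFile   # → output_file
```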

Job Script Preview

Before submitting, press Preview to see the complete sbatch script that will be generated from your form values. The preview shows all #SBATCH directives derived from the form fields followed by your script body.

| Key | Action |
|---|---|
| ESC | Close preview |
| Ctrl+Y | Copy script to clipboard (via OSC 52) |

The clipboard copy produces a clean plain-text script with no color formatting, ready to paste into a file or terminal. OSC 52 clipboard support works in most modern terminals (iTerm2, kitty, tmux, Windows Terminal, Alacritty).
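
The OSC 52 mechanism itself is simple: the text is base64-encoded and wrapped in a terminal escape sequence (the sequence shown is the standard one; that S9S emits it exactly this way is an assumption):

```shell
# Sketch of an OSC 52 clipboard write: ESC ] 52 ; c ; <base64 payload> BEL.
# The terminal, not the shell, performs the actual clipboard update.
osc52_copy() {
  printf '\033]52;c;%s\a' "$(printf '%s' "$1" | base64 | tr -d '\n')"
}
```

Piping the output through `cat -v` shows the raw escape sequence, which is handy when checking whether a terminal or tmux passthrough is mangling it.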

CLI Commands

Manage templates from the command line:

# List all templates from all sources with source indicator (builtin/config/saved)
s9s templates list

# Export all built-in and config templates to ~/.s9s/templates/ as editable JSON
s9s templates export

# Export a single template by name
s9s templates export "GPU Job"

# Overwrite existing files during export
s9s templates export --force

# Export to a custom directory
s9s templates export --dir /path/to/templates

Template Workflow

A typical workflow for customizing templates:

  1. Export the built-ins to get editable copies:

    s9s templates export

    This writes all 8 built-in templates (plus any config templates) to ~/.s9s/templates/ as JSON files. Existing files are skipped unless --force is used.

  2. Edit the JSON files in ~/.s9s/templates/ to match your environment (change partitions, modules, default resources, etc.). The exported files use the same format that JobTemplateManager loads — changes take effect on the next wizard open without restarting s9s.

  3. Verify your templates are loaded:

    s9s templates list

    Edited templates show as saved source and override any built-in with the same name.

  4. Optionally restrict sources so only your edited templates appear in the wizard:

    views:
      jobs:
        submission:
          templateSources: ["saved"]
  5. Use templates in the wizard — press s to open the submission wizard, then pick from the template selector. Your custom templates appear with the values you set.

Controlling Template Sources

By default all three sources are loaded. Use the templateSources config option to control which sources appear:

views:
  jobs:
    submission:
      # Show only user-saved and config templates (hide built-ins)
      templateSources: ["config", "saved"]

Valid values: "builtin", "config", "saved".

Job Workflows

Job Chains

To create a chain of dependent jobs, submit each job with its dependencies set in the submission wizard's dependencies field. For example, submit a preprocessing job first, then submit an analysis job with the preprocessing job's ID in the dependencies field, and so on. S9S automatically formats these as afterok dependencies.

Job chains and recurring job scheduling via command mode are not yet available. See #115 for planned workflow enhancements.
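
For reference, the raw sbatch equivalent of a two-step chain looks like this (a sketch with sbatch stubbed out so it runs anywhere; on a real cluster, drop the stub and use the actual command):

```shell
# Two-step chain with raw sbatch; the stub stands in for a real cluster
# and always reports job ID 1001.
sbatch() { echo "1001"; }

pre_id=$(sbatch --parsable preprocess.sh)   # --parsable prints only the job ID
echo "analysis depends on job ${pre_id}"
sbatch --parsable --dependency=afterok:${pre_id} analyze.sh >/dev/null
```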

Job Reporting

Export Job Data

Export the current view by pressing e, which writes the visible data to a file (availability depends on the view). Command-mode export and report generation are not yet available. See #115 for planned reporting enhancements.

Tips & Best Practices

Efficiency Tips

  1. Use templates for repetitive jobs
  2. Set up filters for your common queries
  3. Monitor efficiency to optimize resource requests
  4. Use batch operations for multiple similar jobs
  5. Enable notifications for long-running jobs

Common Workflows

Debug Failed Jobs

# Use p/P to toggle pending filter, or / for text search
/FAILED                # Quick filter for failed jobs
Enter                  # View job details
o                      # Check output/errors
:requeue JOBID         # Requeue if needed (command mode)

Monitor GPU Jobs

/gpu                           # Quick filter for GPU-related jobs
Enter                          # View job details

Bulk Cancel User Jobs

/username              # Filter by user text
V                      # Toggle multi-select mode
Space                  # Select individual jobs
b                      # Batch operations
c                      # Cancel selected

Troubleshooting

Common Issues

Job Stuck in PENDING

  • Check reason code in job details
  • View partition limits
  • Check dependencies
  • Verify resource availability

Low Efficiency

  • Review resource requests
  • Check for I/O bottlenecks
  • Verify correct partition
  • Consider job profiling

Output Not Found

  • Verify output paths in job script
  • Check working directory
  • Ensure write permissions
  • Look for redirected output

Next Steps