# Job Management Guide
Master job management in S9S with this comprehensive guide covering submission, monitoring, and advanced job operations.
## 📋 Job Overview
S9S provides a powerful interface for managing SLURM jobs with features that go beyond traditional command-line tools:
- Real-time Monitoring: Live job status updates
- Batch Operations: Manage multiple jobs simultaneously
- Advanced Filtering: Find jobs quickly
- Direct Output Access: View logs without leaving S9S
- Job Templates: Reusable job configurations
- Dependency Management: Visual dependency tracking
## 🚀 Submitting Jobs

### Quick Submit

Press `s` to open the submission dialog:

1. **Choose Method**:
   - New job from scratch
   - From template
   - Copy existing job
   - Import script file

2. **Configure Resources**:

   ```
   Job Name:       my_analysis
   Partition:      compute
   Nodes:          2
   Tasks per Node: 28
   Memory:         64GB
   Time Limit:     24:00:00
   ```

3. **Set Script**:

   ```bash
   #!/bin/bash
   #SBATCH --job-name=my_analysis
   #SBATCH --output=output_%j.log

   module load python/3.9
   python analyze.py
   ```
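For comparison, the same request submitted directly with `sbatch` would look roughly like this (the script filename is illustrative; the flag values mirror the wizard fields above):

```bash
# Equivalent direct submission with plain sbatch
sbatch --job-name=my_analysis \
       --partition=compute \
       --nodes=2 \
       --ntasks-per-node=28 \
       --mem=64G \
       --time=24:00:00 \
       analyze_job.sh
```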
### Template-Based Submission

Use templates for common job types:

```
# List available templates
:templates list

# Submit from template
:submit template gpu-training

# Create new template
:template save current-job my-template
```
### Command Line Submit

```
# Submit with S9S command mode
:submit --partition=gpu --nodes=4 --time=2:00:00 myscript.sh

# Submit with dependencies
:submit --dependency=afterok:12345 --array=1-100 array_job.sh
```
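The `--dependency` and `--array` options follow standard SLURM semantics: `afterok:12345` starts the job only if job 12345 finished successfully, and `--array=1-100` creates 100 tasks. The same submission with plain `sbatch`:

```bash
# afterok: run only after job 12345 exits with code 0
# (use afterany to run regardless of its exit status)
sbatch --dependency=afterok:12345 --array=1-100 array_job.sh
```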
## 📊 Monitoring Jobs

### Job States
S9S color-codes job states for quick identification:
| State | Color | Description |
|-------|-------|-------------|
| PENDING | Yellow | Waiting for resources |
| RUNNING | Green | Currently executing |
| COMPLETED | Blue | Finished successfully |
| FAILED | Red | Exited with error |
| CANCELLED | Gray | Cancelled by user/admin |
| TIMEOUT | Orange | Exceeded time limit |
| SUSPENDED | Purple | Temporarily suspended |
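These states come straight from SLURM, so you can cross-check them outside S9S with the standard commands:

```bash
# Current queue states for your jobs (ID, name, state, elapsed time)
squeue -u $USER --format="%.10i %.20j %.10T %.10M"

# States and exit codes of finished jobs come from accounting
sacct -j 12345 --format=JobID,State,ExitCode
```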
### Job Details

Press `Enter` or `d` on a job to view detailed information:
- Summary: ID, name, user, submission time
- Resources: Nodes, CPUs, memory, GPUs
- Timing: Start, elapsed, remaining time
- Performance: CPU/memory efficiency
- Output: Stdout/stderr file paths
- Dependencies: Parent/child jobs
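The same fields are available from SLURM directly if you need them in a script:

```bash
# Full live record: resources, timing, output paths, dependencies
scontrol show job 12345

# Accounting view, which also works after the job has finished
sacct -j 12345 --long
```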
### Live Output Monitoring
View job output in real-time:
1. Select a job and press `o`
2. Choose output type:
   - Standard output
   - Error output
   - Both (split view)
3. Options:
   - `f` - Follow/tail output
   - `/` - Search in output
   - `s` - Save to file
   - `Esc` - Exit viewer
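Under the hood these are ordinary files, so outside S9S the follow view is equivalent to tailing the output path set in the job script (here `output_%j.log`, where `%j` expands to the job ID):

```bash
# Follow the job's stdout as it is written
tail -f output_12345.log
```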
## 🔧 Job Operations

### Single Job Actions

| Key | Action | Description |
|-----|--------|-------------|
|  | Cancel | Cancel job (with confirmation) |
|  | Force Cancel | Cancel without confirmation |
|  | Hold | Prevent job from starting |
|  | Release | Release held job |
| `R` | Requeue | Resubmit failed job |
|  | Priority | Modify job priority |
|  | Edit | Modify pending job |
|  | Move | Move to different partition |
### Batch Operations

Select multiple jobs with `Space`, then press `b` to open batch operations:

1. **Selection Methods**:
   - Manual: `Space` on each job
   - All visible: `V`
   - By filter: `/state:PENDING` then `V`
   - By pattern: `:select pattern "analysis_*"`

2. **Batch Actions**:
   - Cancel selected
   - Hold/Release selected
   - Change priority
   - Move partition
   - Add dependency
   - Export data
### Advanced Operations

#### Job Arrays

Manage array jobs efficiently:

```
# View array summary
:array-summary 12345

# Expand all array tasks
:expand-array 12345

# Cancel specific tasks
:cancel 12345_[1-10,20,30-40]

# Hold array subset
:hold 12345_[50-100]
```
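These map onto standard SLURM array operations, so the rough equivalents outside S9S are:

```bash
# Cancel specific array tasks
scancel 12345_[1-10,20,30-40]

# Hold the whole array job
scontrol hold 12345

# Limit how many array tasks run at once
scontrol update JobId=12345 ArrayTaskThrottle=10
```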
#### Dependencies

Visualize and manage job dependencies:

```
# View dependency tree
:deps tree 12345

# Add dependency
:deps add 12346 --after 12345

# Remove dependency
:deps remove 12346

# View dependency graph
:deps graph --format=dot | dot -Tpng > deps.png
```
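For reference, dependencies on pending jobs can also be edited with plain SLURM commands:

```bash
# Add a dependency at submit time
sbatch --dependency=afterok:12345 next_step.sh

# Add or change the dependency of an already-pending job
scontrol update JobId=12346 Dependency=afterok:12345

# Clear a job's dependencies
scontrol update JobId=12346 Dependency=
```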
## 🔍 Advanced Filtering

### Filter Syntax

S9S supports powerful job filtering:

```
# Basic filters
/RUNNING                   # Running jobs
/gpu                       # Jobs with 'gpu' in name
/user:alice                # Alice's jobs

# State filters
/state:PENDING             # Pending jobs
/state:RUNNING,COMPLETED   # Multiple states
/state:!FAILED             # Not failed

# Resource filters
/nodes:>4                  # More than 4 nodes
/memory:>=32GB             # 32GB or more memory
/gpus:>0                   # GPU jobs

# Time filters
/runtime:>1h               # Running over 1 hour
/runtime:30m-2h            # Between 30 min and 2 h
/submitted:<1d             # Submitted within 1 day
/started:today             # Started today

# Complex filters
/user:bob state:RUNNING partition:gpu   # Bob's GPU jobs
/name:~"analysis.*2023" nodes:>10       # Regex + resource
```
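Most of these filters have rough `squeue` equivalents if you want the same view from a plain shell:

```bash
squeue -t PENDING                 # /state:PENDING
squeue -u alice                   # /user:alice
squeue -u bob -t RUNNING -p gpu   # /user:bob state:RUNNING partition:gpu
squeue --name=my_analysis         # filter by exact job name
```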
### Saved Filters

Save frequently used filters:

```
# Save current filter
:filter save gpu-queue "/partition:gpu state:PENDING"

# Load saved filter
:filter load gpu-queue

# List saved filters
:filter list

# Delete filter
:filter delete old-filter
```
## 📈 Job Performance

### Efficiency Metrics
S9S calculates job efficiency:
- CPU Efficiency: Actual vs allocated CPU usage
- Memory Efficiency: Peak vs allocated memory
- GPU Utilization: GPU usage percentage
- I/O Performance: Read/write statistics
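CPU efficiency here is the usual accounting ratio, TotalCPU / (Elapsed × AllocCPUS). If SLURM accounting is enabled on your cluster, you can sanity-check the numbers yourself:

```bash
# Inputs for the efficiency ratio, straight from accounting
sacct -j 12345 --format=JobID,Elapsed,TotalCPU,AllocCPUS,MaxRSS,ReqMem

# seff (a SLURM contrib tool, where installed) prints the ratios directly
seff 12345
```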
View metrics:

1. Select a job and press `i`
2. Navigate to the "Performance" tab
3. View graphs and statistics
### Performance Alerts

Set up alerts for inefficient jobs:

```yaml
# In config.yaml
alerts:
  lowEfficiency:
    threshold: 0.5
    metric: cpu
    action: notify
  highMemory:
    threshold: 90%
    metric: memory
    action: email
```
## 📝 Job Templates

### Creating Templates

Save job configurations as templates:

1. Configure the job in the submission wizard
2. Instead of submitting, press `Ctrl+S`
3. Name your template
4. The template is saved to `~/.s9s/templates/`
### Using Templates

```
# List templates
:template list

# View template
:template show gpu-analysis

# Submit from template
:template submit gpu-analysis

# Edit template
:template edit gpu-analysis

# Share template
:template export gpu-analysis > gpu-template.yaml
```
### Template Variables

Templates support variables:

```yaml
# ~/.s9s/templates/parametric.yaml
name: "parametric_${INDEX}"
script: |
  #!/bin/bash
  #SBATCH --array=1-${ARRAY_SIZE}
  #SBATCH --mem=${MEMORY}GB
  python process.py --index=$SLURM_ARRAY_TASK_ID
variables:
  ARRAY_SIZE: 100
  MEMORY: 32
```
## 🔄 Job Workflows

### Job Chains

Create dependent job workflows:

```
# Submit job chain
:chain submit \
  --job1 preprocess.sh \
  --job2 analyze.sh --after job1 \
  --job3 cleanup.sh --after job2

# View chain status
:chain status my-workflow

# Cancel entire chain
:chain cancel my-workflow
```
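The same chain can be wired by hand with plain `sbatch`, capturing each job ID with `--parsable`:

```bash
# Each step starts only if the previous one exits successfully
jid1=$(sbatch --parsable preprocess.sh)
jid2=$(sbatch --parsable --dependency=afterok:${jid1} analyze.sh)
sbatch --dependency=afterok:${jid2} cleanup.sh
```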
### Recurring Jobs

Set up recurring job submissions:

```
# Daily job
:schedule add daily-backup \
  --script backup.sh \
  --time "02:00" \
  --repeat daily

# Weekly analysis
:schedule add weekly-report \
  --script report.sh \
  --day monday \
  --time "09:00" \
  --repeat weekly
```
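If your cluster runs SLURM 20.02 or newer, `scrontab` offers a native alternative using crontab syntax; a minimal sketch:

```bash
# Edit your SLURM crontab with: scrontab -e
0 2 * * * backup.sh    # daily at 02:00
0 9 * * 1 report.sh    # Mondays at 09:00
```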
## 📊 Job Reporting

### Export Job Data

Export job information for analysis:

```
# Export current view
:export csv jobs.csv

# Export with filters
:export json --filter "state:COMPLETED user:${USER}" my-jobs.json

# Export with specific columns
:export markdown --columns "JobID,Name,State,Runtime,Efficiency" report.md
```
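For raw data outside S9S, `sacct` can produce a similar export (pipe-delimited rather than comma-separated; the start date is illustrative):

```bash
sacct -u $USER --starttime=2024-01-01 -P \
      --format=JobID,JobName,State,Elapsed,AllocCPUS > my-jobs.txt
```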
### Generate Reports

Create job reports:

```
# User summary report
:report user-summary --period month

# Efficiency report
:report efficiency --threshold 0.7

# Failed jobs analysis
:report failures --period week --format html > failures.html
```
## 💡 Tips & Best Practices

### Efficiency Tips
- Use templates for repetitive jobs
- Set up filters for your common queries
- Monitor efficiency to optimize resource requests
- Use batch operations for multiple similar jobs
- Enable notifications for long-running jobs
### Common Workflows

#### Debug Failed Jobs

```
/state:FAILED   # Filter failed jobs
Enter           # View job details
o               # Check output/errors
R               # Requeue if needed
```

#### Monitor GPU Usage

```
/partition:gpu state:RUNNING   # Filter GPU jobs
i                              # View job info
Tab → Performance              # Check GPU utilization
```

#### Bulk Cancel User Jobs

```
/user:username   # Filter by user
V                # Select all visible
b                # Batch operations
c                # Cancel selected
```
## 🆘 Troubleshooting

### Common Issues

#### Job Stuck in PENDING
- Check reason code in job details
- View partition limits
- Check dependencies
- Verify resource availability
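The reason code in the first step is SLURM's own, so it can also be read directly:

```bash
# %r shows the pending reason (e.g. Resources, Priority, Dependency)
squeue -j 12345 --format="%.10i %.10T %r"

scontrol show job 12345 | grep -E "Reason|Dependency"
```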
#### Low Efficiency
- Review resource requests
- Check for I/O bottlenecks
- Verify correct partition
- Consider job profiling
#### Output Not Found
- Verify output paths in job script
- Check working directory
- Ensure write permissions
- Look for redirected output
## 🚀 Next Steps
- Learn about Batch Operations
- Explore Performance Monitoring
- Set up Job Notifications
- Master Advanced Filtering