
Performance View

The Performance view provides real-time cluster-wide metrics and utilization statistics, giving you a high-level overview of your SLURM cluster's health and resource usage.

[Demo: Performance view showing cluster-wide job, node, and resource metrics with auto-refresh]

Overview

The Performance view displays three main metric categories:

  • Jobs: Total job counts and distribution
  • Nodes: Node availability and status
  • Resources: Cluster-wide CPU and Memory utilization

This view is designed for quick cluster health assessment and capacity planning.

Access

Press 0 or navigate to "Performance" from the view switcher.

Display Sections

Jobs Metrics

Shows cluster-wide job statistics:

  • Total: All jobs in the system
  • Running: Currently executing jobs (green)
  • Pending: Jobs waiting in queue (blue)

Use Cases:

  • Monitor queue depth
  • Identify bottlenecks (high pending count)
  • Track overall cluster load

Nodes Metrics

Shows node availability across the cluster:

  • Total: All configured nodes
  • Active: Nodes running jobs (green)
  • Idle: Available nodes with no jobs (blue)
  • Down: Offline or unavailable nodes (red)

Use Cases:

  • Identify hardware issues (down nodes)
  • Check capacity (idle nodes available)
  • Monitor resource utilization

Resources Metrics

Shows aggregate cluster utilization:

  • CPU: Cluster-wide CPU usage percentage
  • Memory: Cluster-wide memory usage percentage
  • Visual bars with color-coded thresholds:
    • 🟢 Green: 0-75% (healthy)
    • 🟡 Yellow: 75-90% (high)
    • 🔴 Red: 90-100% (critical)
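
The color bands above can be expressed as a simple threshold function. This is an illustrative sketch, not s9s's actual code; the function name and the handling of the exact boundary values (e.g. whether 75% counts as green or yellow) are assumptions, since the source only gives the ranges:

```go
package main

import "fmt"

// utilizationColor maps a utilization percentage to the color bands
// described above. Boundary handling is an assumption: values at
// exactly 75% and 90% fall into the higher band here.
func utilizationColor(pct float64) string {
	switch {
	case pct < 75:
		return "green" // healthy
	case pct < 90:
		return "yellow" // high
	default:
		return "red" // critical
	}
}

func main() {
	for _, p := range []float64{45, 82, 95} {
		fmt.Printf("%.0f%% -> %s\n", p, utilizationColor(p))
	}
}
```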

Use Cases:

  • Capacity planning
  • Identify resource saturation
  • Performance trending

Keyboard Shortcuts

Key    Action
R      Toggle auto-refresh on/off
F5     Manual refresh

Auto-Refresh

The Performance view automatically refreshes every 5 seconds by default when auto-refresh is enabled.

  • Enable/Disable: Press R to toggle
  • Manual Refresh: Press F5 to update immediately
  • Status Indicator: Control bar shows auto-refresh state

Interpretation Guide

Healthy Cluster Signs

  • ✅ Low pending job count relative to running jobs
  • ✅ Few or no down nodes
  • ✅ CPU/Memory utilization in green/yellow range
  • ✅ Some idle nodes available for burst capacity

Warning Signs

  • ⚠️ High pending-to-running job ratio (potential bottleneck)
  • ⚠️ Multiple down nodes (hardware issues)
  • ⚠️ Sustained red resource utilization (capacity limit reached)
  • ⚠️ Zero idle nodes (no burst capacity)

Critical Issues

  • 🚨 More pending than running jobs (severe bottleneck)
  • 🚨 Majority of nodes down (cluster failure)
  • 🚨 100% resource utilization sustained (oversubscribed)
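
The three bands above can be folded into a single heuristic. The sketch below applies the interpretation guide to the numbers shown in the view; the type, function name, and exact cutoffs (e.g. the 0.5 pending-to-running ratio) are assumptions for illustration, not s9s behavior:

```go
package main

import "fmt"

// ClusterMetrics mirrors the numbers shown in the Performance view.
type ClusterMetrics struct {
	Running, Pending      int
	TotalNodes, DownNodes int
	CPUPct, MemPct        float64
}

// healthStatus applies the interpretation guide as a heuristic.
// Cutoffs are illustrative assumptions.
func healthStatus(m ClusterMetrics) string {
	switch {
	case m.Pending > m.Running, // severe bottleneck
		m.DownNodes*2 > m.TotalNodes, // majority of nodes down
		m.CPUPct >= 100, m.MemPct >= 100: // oversubscribed
		return "critical"
	case m.DownNodes > 1, // multiple down nodes
		m.CPUPct >= 90, m.MemPct >= 90, // red utilization
		m.Running > 0 && float64(m.Pending)/float64(m.Running) > 0.5:
		return "warning"
	default:
		return "healthy"
	}
}

func main() {
	// Scenario 1 from below: good balance, no down nodes.
	fmt.Println(healthStatus(ClusterMetrics{
		Running: 30, Pending: 15, TotalNodes: 20, CPUPct: 45, MemPct: 52,
	})) // healthy
}
```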

Example Scenarios

Scenario 1: Healthy Cluster

Jobs:           Nodes:          Resources:
Total: 45       Total: 20       CPU: 45%  ████░░░░
Running: 30     Active: 12      Mem: 52%  █████░░░
Pending: 15     Idle: 8
                Down: 0

Analysis: Good balance, capacity available, no issues.

Scenario 2: Queue Bottleneck

Jobs:           Nodes:          Resources:
Total: 120      Total: 20       CPU: 85%  ████████
Running: 20     Active: 20      Mem: 89%  ████████
Pending: 100    Idle: 0
                Down: 0

Analysis: All nodes busy, large queue, near capacity. Consider:

  • Adding more nodes
  • Reviewing job priorities
  • Checking for inefficient jobs

Scenario 3: Hardware Issues

Jobs:           Nodes:          Resources:
Total: 25       Total: 20       CPU: 92%  █████████
Running: 22     Active: 14      Mem: 88%  ████████
Pending: 3      Idle: 0
                Down: 6

Analysis: 30% of nodes are down and the remaining nodes are overloaded. Action required:

  • Investigate the down nodes immediately
  • Treat the high utilization as a symptom of reduced capacity, not extra demand

Integration with Other Views

The Performance view provides a high-level overview. Drill down for details:

  • Jobs view (1): See specific job details and queue analysis
  • Nodes view (2): Investigate individual node status and down nodes
  • Partitions view (3): Check partition-specific utilization
  • Dashboard view (8): See health checks and detailed metrics

Tips

  1. Monitor During Peak Hours: Check Performance view during typical peak usage times to understand baseline
  2. Trend Analysis: Note patterns over time (daily/weekly cycles)
  3. Capacity Planning: If consistently high utilization, plan for expansion
  4. Quick Health Check: Performance view is perfect for quick "is everything okay?" checks

Metrics Source

All metrics are pulled from the SLURM cluster via sinfo, squeue, and cluster statistics APIs. The view shows real-time data from your actual cluster, updated every 5 seconds.
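
The node counts can be reproduced outside the TUI with `sinfo -h -o "%T %D"`, which prints one `<state> <node-count>` pair per line. The sketch below tallies such output; the parsing code and the sample string are illustrative, not s9s's implementation:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSinfo tallies node counts from `sinfo -h -o "%T %D"` output,
// which prints one "<state> <count>" pair per line.
func parseSinfo(out string) map[string]int {
	counts := map[string]int{}
	for _, line := range strings.Split(strings.TrimSpace(out), "\n") {
		fields := strings.Fields(line)
		if len(fields) != 2 {
			continue
		}
		n, err := strconv.Atoi(fields[1])
		if err != nil {
			continue
		}
		counts[fields[0]] += n
	}
	return counts
}

func main() {
	// Sample output for a 20-node cluster (made up for illustration).
	sample := "allocated 12\nidle 8\n"
	counts := parseSinfo(sample)
	fmt.Println("active:", counts["allocated"], "idle:", counts["idle"])
}
```

Job counts map to squeue in the same spirit, e.g. filtering by state with `squeue -h -t RUNNING` and `squeue -h -t PENDING`.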


Developer Note: App Diagnostics

For s9s developers, there's an App Diagnostics view that monitors the s9s CLI application itself (memory, goroutines, internal operations). This is hidden by default and can be enabled with:

# ~/.s9s/config.yaml
features:
  appDiagnostics: true

This is useful for debugging s9s performance issues, not for cluster monitoring.