API Documentation
Complete REST API reference for S9S, enabling programmatic access to cluster management, job operations, and system information.
🌐 API Overview
The S9S REST API provides:
- RESTful endpoints for all S9S functionality
- JSON request/response format
- Authentication via tokens, OAuth2, or certificates
- Rate limiting and caching
- WebSocket support for real-time updates
- OpenAPI 3.0 specification
Base URL
https://your-s9s-instance.com/api/v1
API Versions
- v1: Current stable version
- v2: Beta version (if available)
🔐 Authentication
Token Authentication
curl -H "Authorization: Bearer YOUR_TOKEN" \
  https://api.example.com/v1/jobs
API Key Authentication
curl -H "X-API-Key: YOUR_API_KEY" \
  https://api.example.com/v1/jobs
OAuth 2.0
# Get access token
curl -X POST https://api.example.com/oauth/token \
  -H "Content-Type: application/json" \
  -d '{
    "client_id": "your_client_id",
    "client_secret": "your_client_secret",
    "grant_type": "client_credentials"
  }'

# Use access token
curl -H "Authorization: Bearer ACCESS_TOKEN" \
  https://api.example.com/v1/jobs
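The header-based schemes above can be wrapped in a small helper. This is a minimal sketch, not part of any official S9S client: the function name `auth_headers` is hypothetical, and it assumes an OAuth 2.0 access token is sent with the same Bearer header as a static token, as the curl examples suggest.

```python
def auth_headers(token=None, api_key=None):
    """Return request headers for exactly one of the two header schemes."""
    if (token is None) == (api_key is None):
        raise ValueError("pass exactly one of token or api_key")
    if token is not None:
        # Bearer scheme: static tokens and OAuth 2.0 access tokens.
        return {"Authorization": f"Bearer {token}"}
    # API key scheme: key travels in the X-API-Key header.
    return {"X-API-Key": api_key}


print(auth_headers(token="YOUR_TOKEN"))
print(auth_headers(api_key="YOUR_API_KEY"))
```

Rejecting calls that pass both or neither credential keeps a request from silently going out unauthenticated.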
📋 Jobs API
List Jobs
GET /v1/jobs
Parameters:
- state (string): Filter by job state
- user (string): Filter by username
- partition (string): Filter by partition
- limit (integer): Maximum results (default: 100)
- offset (integer): Pagination offset
- sort (string): Sort field (default: submit_time)
- order (string): Sort order - asc/desc (default: desc)
Example Request:
curl "https://api.example.com/v1/jobs?state=RUNNING&user=alice&limit=50"
Example Response:
{ "jobs": [ { "job_id": "12345", "job_name": "simulation_run", "user": "alice", "account": "research", "partition": "gpu", "state": "RUNNING", "submit_time": "2023-12-15T10:30:00Z", "start_time": "2023-12-15T10:32:15Z", "end_time": null, "elapsed_time": "2:15:30", "time_limit": "4:00:00", "nodes": ["node001", "node002"], "node_count": 2, "cpu_count": 32, "memory": "128GB", "gpus": 4, "priority": 1000, "qos": "normal", "working_directory": "/home/alice/simulations", "standard_output": "/home/alice/logs/job_12345.out", "standard_error": "/home/alice/logs/job_12345.err", "exit_code": null, "reason": null } ], "total": 156, "limit": 50, "offset": 0, "has_more": true }
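The `limit`, `offset`, and `has_more` fields suggest a simple paging loop. The following is a rough sketch under stated assumptions: `jobs_url` and `iter_jobs` are hypothetical helper names, and `fetch_page` stands in for whatever function performs the HTTP GET and returns the decoded JSON response.

```python
from urllib.parse import urlencode


def jobs_url(base_url, **filters):
    """Build a List Jobs URL, dropping filters whose value is None."""
    query = {k: v for k, v in filters.items() if v is not None}
    url = f"{base_url}/v1/jobs"
    return f"{url}?{urlencode(query)}" if query else url


def iter_jobs(fetch_page, limit=100):
    """Yield every job, advancing offset until has_more is false.

    fetch_page(limit, offset) is caller-supplied (hypothetical): it issues
    the request and returns the decoded response dictionary.
    """
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        jobs = page["jobs"]
        yield from jobs
        # Stop on the last page, or on an empty page to avoid looping forever.
        if not jobs or not page.get("has_more"):
            break
        offset += len(jobs)


print(jobs_url("https://api.example.com", state="RUNNING", user="alice", limit=50))
```

Advancing by the number of jobs actually returned, rather than by `limit`, keeps the loop correct if the server returns fewer results than requested.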
Get Job Details
GET /v1/jobs/{job_id}
Example Request:
curl "https://api.example.com/v1/jobs/12345"
Example Response:
{ "job_id": "12345", "job_name": "simulation_run", "user": "alice", "account": "research", "partition": "gpu", "state": "RUNNING", "submit_time": "2023-12-15T10:30:00Z", "start_time": "2023-12-15T10:32:15Z", "end_time": null, "elapsed_time": "2:15:30", "time_limit": "4:00:00", "nodes": ["node001", "node002"], "node_count": 2, "cpu_count": 32, "memory": "128GB", "gpus": 4, "priority": 1000, "qos": "normal", "working_directory": "/home/alice/simulations", "command": "/home/alice/scripts/run_simulation.sh", "environment": { "PATH": "/usr/local/bin:/usr/bin:/bin", "CUDA_VISIBLE_DEVICES": "0,1,2,3" }, "resources": { "cpu_efficiency": 0.85, "memory_efficiency": 0.67, "gpu_utilization": 0.92, "max_memory_used": "89GB", "max_cpu_time": "127.5h" }, "dependencies": [], "array_job_id": null, "array_task_id": null }
Submit Job
POST /v1/jobs
Request Body:
{ "job_name": "my_simulation", "script": "/home/alice/scripts/simulation.sh", "partition": "gpu", "nodes": 2, "cpus_per_node": 16, "memory": "64GB", "gpus": 2, "time_limit": "4:00:00", "account": "research", "qos": "normal", "working_directory": "/home/alice/simulations", "standard_output": "/home/alice/logs/%j.out", "standard_error": "/home/alice/logs/%j.err", "environment": { "CUDA_VISIBLE_DEVICES": "0,1" }, "dependencies": [ { "type": "afterok", "job_id": "12344" } ] }
Example Request:
curl -X POST "https://api.example.com/v1/jobs" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer TOKEN" \
  -d '{
    "job_name": "test_job",
    "script": "/home/alice/test.sh",
    "nodes": 1,
    "cpus_per_node": 4,
    "time_limit": "1:00:00"
  }'
Example Response:
{ "job_id": "12346", "state": "PENDING", "message": "Job submitted successfully" }
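For callers not using curl, the same submission can be assembled with the Python standard library. This sketch only builds the request and leaves sending to the caller; `build_submit_request` is a hypothetical helper name, and Bearer authentication is assumed as in the curl example.

```python
import json
import urllib.request


def build_submit_request(base_url, token, job):
    """Return an unsent urllib Request for POST /v1/jobs."""
    return urllib.request.Request(
        url=f"{base_url}/v1/jobs",
        data=json.dumps(job).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


req = build_submit_request(
    "https://api.example.com",
    "TOKEN",
    {
        "job_name": "test_job",
        "script": "/home/alice/test.sh",
        "nodes": 1,
        "cpus_per_node": 4,
        "time_limit": "1:00:00",
    },
)
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req), then json-decode the response body.
```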
Cancel Job
DELETE /v1/jobs/{job_id}
Parameters:
- signal (string): Signal to send (default: SIGTERM)
- force (boolean): Force cancellation
Example Request:
curl -X DELETE "https://api.example.com/v1/jobs/12345?force=true"
Example Response:
{ "job_id": "12345", "state": "CANCELLED", "message": "Job cancelled successfully" }
Hold/Release Job
PUT /v1/jobs/{job_id}/hold
PUT /v1/jobs/{job_id}/release
Request Body (Hold):
{ "reason": "Debugging required" }
Example Request:
curl -X PUT "https://api.example.com/v1/jobs/12345/hold" \
  -H "Content-Type: application/json" \
  -d '{"reason": "Need to debug"}'
Update Job Priority
PUT /v1/jobs/{job_id}/priority
Request Body:
{ "priority": 1500 }
Get Job Output
GET /v1/jobs/{job_id}/output
GET /v1/jobs/{job_id}/error
Parameters:
- lines (integer): Number of lines to retrieve
- follow (boolean): Follow output (WebSocket)
🖥️ Nodes API
List Nodes
GET /v1/nodes
Parameters:
- state (string): Filter by node state
- partition (string): Filter by partition
- features (string): Filter by features
- available (boolean): Filter available nodes only
Example Response:
{ "nodes": [ { "node_name": "node001", "state": "MIXED", "partitions": ["gpu", "normal"], "cpu_total": 32, "cpu_allocated": 16, "cpu_load": 14.2, "memory_total": "128GB", "memory_allocated": "64GB", "memory_free": "58GB", "gpus_total": 4, "gpus_allocated": 2, "features": ["gpu", "cuda", "infiniband"], "architecture": "x86_64", "os": "Linux 5.4.0-74-generic", "boot_time": "2023-12-01T08:30:00Z", "last_busy": "2023-12-15T12:45:30Z", "jobs": ["12345", "12347"], "reason": null } ], "total": 256 }
Get Node Details
GET /v1/nodes/{node_name}
Drain/Resume Node
PUT /v1/nodes/{node_name}/drain
PUT /v1/nodes/{node_name}/resume
Request Body (Drain):
{ "reason": "Hardware maintenance", "timeout": "2h" }
Update Node State
PUT /v1/nodes/{node_name}/state
Request Body:
{ "state": "DOWN", "reason": "Hardware failure" }
👥 Users API
List Users
GET /v1/users
Example Response:
{ "users": [ { "username": "alice", "default_account": "research", "admin_level": "none", "jobs_running": 15, "jobs_pending": 3, "cpu_time_used": "1247h", "associations": [ { "account": "research", "partition": "gpu", "qos": ["normal", "high"] } ] } ] }
Get User Details
GET /v1/users/{username}
📊 Partitions API
List Partitions
GET /v1/partitions
Example Response:
{ "partitions": [ { "name": "gpu", "state": "UP", "nodes_total": 32, "nodes_idle": 8, "nodes_allocated": 20, "nodes_mixed": 4, "cpus_total": 1024, "cpus_allocated": 756, "default_time": "2:00:00", "max_time": "7-00:00:00", "max_nodes": 16, "priority": 1000, "qos": "normal", "features": ["gpu", "cuda"] } ] }
📈 Metrics API
Cluster Statistics
GET /v1/metrics/cluster
Example Response:
{ "timestamp": "2023-12-15T14:30:00Z", "cluster_utilization": { "cpu_percent": 78.5, "memory_percent": 65.2, "gpu_percent": 84.7 }, "job_counts": { "RUNNING": 1247, "PENDING": 156, "COMPLETED": 45672, "FAILED": 234, "CANCELLED": 89 }, "node_counts": { "IDLE": 45, "MIXED": 89, "ALLOCATED": 112, "DOWN": 3, "DRAIN": 2 }, "queue_stats": { "avg_wait_time": "12m 34s", "max_wait_time": "2h 15m", "throughput": 145.7 } }
Performance Metrics
GET /v1/metrics/performance
Parameters:
- start_time (string): Start time (ISO 8601)
- end_time (string): End time (ISO 8601)
- interval (string): Aggregation interval (1m, 5m, 1h, 1d)
Example Response:
{ "metrics": [ { "timestamp": "2023-12-15T14:00:00Z", "cpu_utilization": 76.8, "memory_utilization": 63.5, "gpu_utilization": 82.1, "jobs_submitted": 45, "jobs_completed": 38, "avg_queue_time": 720 } ], "interval": "1h", "count": 24 }
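Building the time-range query by hand is easy to get wrong, so a small helper may be useful. This is a sketch only: `performance_url` is a hypothetical name, and it assumes the server accepts UTC timestamps in the `YYYY-MM-DDTHH:MM:SSZ` form used in the example responses.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

# Interval values listed for the interval parameter.
ALLOWED_INTERVALS = {"1m", "5m", "1h", "1d"}


def performance_url(base_url, end, hours, interval="1h"):
    """Build a Performance Metrics URL covering the `hours` before `end`.

    `end` must be a timezone-aware datetime; it is converted to UTC.
    """
    if interval not in ALLOWED_INTERVALS:
        raise ValueError(f"unsupported interval: {interval}")
    if end.tzinfo is None:
        raise ValueError("end must be timezone-aware")
    end = end.astimezone(timezone.utc)
    start = end - timedelta(hours=hours)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    query = urlencode({
        "start_time": start.strftime(fmt),
        "end_time": end.strftime(fmt),
        "interval": interval,
    })
    return f"{base_url}/v1/metrics/performance?{query}"


url = performance_url(
    "https://api.example.com",
    datetime(2023, 12, 15, 14, 0, tzinfo=timezone.utc),
    hours=24,
)
print(url)
```

With `hours=24` and `interval="1h"`, the request covers 24 hourly buckets, in line with the `count` of 24 in the example response.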
Job Efficiency
GET /v1/metrics/efficiency
Parameters:
- user (string): Filter by user
- partition (string): Filter by partition
- min_runtime (string): Minimum runtime threshold
🔍 Search API
Global Search
GET /v1/search
Parameters:
- q (string): Search query
- type (string): Resource type (jobs, nodes, users)
- limit (integer): Maximum results
Example Request:
curl "https://api.example.com/v1/search?q=alice%20gpu&limit=20"
Example Response:
{ "results": [ { "type": "job", "id": "12345", "title": "alice's GPU simulation", "description": "Running on gpu partition", "url": "/v1/jobs/12345", "score": 0.95 }, { "type": "node", "id": "gpu001", "title": "GPU Node 001", "description": "4x NVIDIA A100 GPUs", "url": "/v1/nodes/gpu001", "score": 0.82 } ], "total": 15, "query_time": 0.045 }
📤 Export API
Export Data
POST /v1/export
Request Body:
{ "type": "jobs", "format": "csv", "filter": { "state": "COMPLETED", "user": "alice", "start_time": "2023-12-01T00:00:00Z", "end_time": "2023-12-31T23:59:59Z" }, "fields": ["job_id", "job_name", "user", "state", "runtime"], "options": { "include_headers": true, "delimiter": ",", "filename": "alice_december_jobs.csv" } }
Example Response:
{ "export_id": "exp_abc123", "status": "processing", "download_url": null, "created_at": "2023-12-15T14:30:00Z", "expires_at": "2023-12-16T14:30:00Z" }
Get Export Status
GET /v1/exports/{export_id}
Example Response:
{ "export_id": "exp_abc123", "status": "completed", "download_url": "https://api.example.com/v1/exports/exp_abc123/download", "file_size": 2048576, "record_count": 1247, "created_at": "2023-12-15T14:30:00Z", "completed_at": "2023-12-15T14:32:15Z", "expires_at": "2023-12-16T14:30:00Z" }
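Exports are asynchronous: the create call returns status "processing", and the status endpoint later reports "completed" with a `download_url`. A polling loop might look like the sketch below. `wait_for_export` is a hypothetical helper, `get_status` stands in for a function that performs GET /v1/exports/{export_id}, and treating any status other than the two shown in this reference as a failure is an assumption.

```python
import time


def wait_for_export(get_status, export_id, poll_seconds=2.0, max_polls=30,
                    sleep=time.sleep):
    """Poll until the export is completed and return its download_url.

    get_status(export_id) is caller-supplied (hypothetical): it returns the
    decoded JSON from the export status endpoint.
    """
    for _ in range(max_polls):
        status = get_status(export_id)
        state = status["status"]
        if state == "completed":
            return status["download_url"]
        if state != "processing":
            # Assumption: anything else means the export will not finish.
            raise RuntimeError(f"export {export_id} ended with status {state!r}")
        sleep(poll_seconds)
    raise TimeoutError(f"export {export_id} not ready after {max_polls} polls")
```

Since the responses include `expires_at`, a caller would presumably want to download the file soon after this function returns.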
Download Export
GET /v1/exports/{export_id}/download
📊 Reports API
Generate Report
POST /v1/reports
Request Body:
{ "type": "utilization", "format": "pdf", "parameters": { "period": "month", "start_date": "2023-12-01", "end_date": "2023-12-31", "partitions": ["gpu", "cpu"], "include_charts": true }, "delivery": { "method": "email", "email": "[email protected]" } }
List Reports
GET /v1/reports
🔄 WebSocket API
Real-time Updates
const ws = new WebSocket('wss://api.example.com/v1/ws');

// Subscribe to job updates once the connection is open
ws.onopen = function() {
  ws.send(JSON.stringify({
    "action": "subscribe",
    "channel": "jobs",
    "filter": {
      "user": "alice",
      "state": "RUNNING"
    }
  }));
};

// Receive updates
ws.onmessage = function(event) {
  const update = JSON.parse(event.data);
  console.log('Job update:', update);
};
Update Message Format:
{ "channel": "jobs", "action": "update", "resource_id": "12345", "timestamp": "2023-12-15T14:35:00Z", "data": { "job_id": "12345", "state": "COMPLETED", "end_time": "2023-12-15T14:34:45Z", "exit_code": 0 } }
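A consumer typically merges each update into local state. The sketch below handles only the message shape shown above and does not open a connection; `apply_update` is a hypothetical helper, and ignoring other channels and actions is an illustrative choice.

```python
import json


def apply_update(jobs, raw_message):
    """Merge one "jobs" channel update into a dict keyed by job_id."""
    msg = json.loads(raw_message)
    if msg.get("channel") != "jobs" or msg.get("action") != "update":
        return jobs  # ignore other channels and actions
    # Create the entry if this job has not been seen yet, then overlay
    # the changed fields carried in "data".
    jobs.setdefault(msg["resource_id"], {}).update(msg["data"])
    return jobs


state = {"12345": {"job_id": "12345", "state": "RUNNING"}}
message = json.dumps({
    "channel": "jobs",
    "action": "update",
    "resource_id": "12345",
    "timestamp": "2023-12-15T14:35:00Z",
    "data": {"job_id": "12345", "state": "COMPLETED", "exit_code": 0},
})
apply_update(state, message)
print(state["12345"]["state"])
```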
Available Channels
- jobs - Job state changes
- nodes - Node state changes
- metrics - Performance metrics
- alerts - System alerts
- queue - Queue statistics
📋 Batch Operations API
Bulk Job Operations
POST /v1/jobs/bulk
Request Body:
{ "action": "cancel", "job_ids": ["12345", "12346", "12347"], "options": { "force": true, "reason": "Emergency maintenance" } }
Example Response:
{ "batch_id": "batch_xyz789", "results": [ { "job_id": "12345", "status": "success", "message": "Job cancelled" }, { "job_id": "12346", "status": "error", "message": "Job already completed" }, { "job_id": "12347", "status": "success", "message": "Job cancelled" } ], "summary": { "total": 3, "successful": 2, "failed": 1 } }
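Because a bulk request can partly succeed, callers will usually want to inspect the per-job results rather than rely on the HTTP status alone. A minimal sketch, using a hypothetical helper name `failed_jobs`:

```python
def failed_jobs(bulk_response):
    """Return {job_id: message} for every result whose status is "error"."""
    return {
        result["job_id"]: result["message"]
        for result in bulk_response["results"]
        if result["status"] == "error"
    }


response = {
    "batch_id": "batch_xyz789",
    "results": [
        {"job_id": "12345", "status": "success", "message": "Job cancelled"},
        {"job_id": "12346", "status": "error", "message": "Job already completed"},
    ],
}
print(failed_jobs(response))
```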
❌ Error Handling
Error Response Format
{ "error": { "code": "RESOURCE_NOT_FOUND", "message": "Job 99999 not found", "details": { "resource_type": "job", "resource_id": "99999" }, "request_id": "req_abc123", "timestamp": "2023-12-15T14:30:00Z" } }
HTTP Status Codes
- 200 OK - Success
- 201 Created - Resource created
- 400 Bad Request - Invalid request
- 401 Unauthorized - Authentication required
- 403 Forbidden - Insufficient permissions
- 404 Not Found - Resource not found
- 409 Conflict - Resource conflict
- 429 Too Many Requests - Rate limit exceeded
- 500 Internal Server Error - Server error
- 503 Service Unavailable - Service unavailable
Common Error Codes
- INVALID_REQUEST - Malformed request
- AUTHENTICATION_FAILED - Invalid credentials
- PERMISSION_DENIED - Insufficient permissions
- RESOURCE_NOT_FOUND - Resource doesn't exist
- RESOURCE_CONFLICT - Resource state conflict
- RATE_LIMIT_EXCEEDED - Too many requests
- VALIDATION_ERROR - Request validation failed
- SLURM_ERROR - SLURM backend error
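Client code can turn the error envelope into a typed exception so that callers branch on `code` instead of parsing message strings. The class and function names below are hypothetical and do not come from a published S9S SDK. The sketch assumes every error body follows the Error Response Format shown in this section.

```python
import json


class S9SApiError(Exception):
    """Hypothetical exception carrying the fields of an error response."""

    def __init__(self, http_status, code, message, request_id=None, details=None):
        super().__init__(f"{http_status} {code}: {message}")
        self.http_status = http_status
        self.code = code
        self.message = message
        self.request_id = request_id
        self.details = details or {}


def error_from_response(http_status, body):
    """Build an S9SApiError from an HTTP status and a JSON error body."""
    err = json.loads(body)["error"]
    return S9SApiError(
        http_status,
        err["code"],
        err["message"],
        request_id=err.get("request_id"),
        details=err.get("details"),
    )


body = json.dumps({"error": {
    "code": "RESOURCE_NOT_FOUND",
    "message": "Job 99999 not found",
    "details": {"resource_type": "job", "resource_id": "99999"},
    "request_id": "req_abc123",
}})
exc = error_from_response(404, body)
print(exc)  # 404 RESOURCE_NOT_FOUND: Job 99999 not found
```

Keeping `request_id` on the exception makes it easy to include in logs or support requests.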
🚀 Rate Limiting
Rate Limit Headers
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 897
X-RateLimit-Reset: 1640995200
X-RateLimit-Window: 3600
Rate Limits
- Free Tier: 1,000 requests/hour
- Pro Tier: 10,000 requests/hour
- Enterprise: Custom limits
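A client can use the rate limit headers to pause before it hits a 429 response. The sketch below rests on an assumption this reference does not state: that `X-RateLimit-Reset` is a Unix timestamp in seconds, which the sample value 1640995200 suggests. `seconds_until_reset` is a hypothetical helper name.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before the next request, or 0.0 if quota remains."""
    now = time.time() if now is None else now
    # A missing Remaining header is treated as "quota available".
    if int(headers.get("X-RateLimit-Remaining", "1")) > 0:
        return 0.0
    # Assumption: Reset is a Unix timestamp (seconds since the epoch).
    return max(0.0, float(headers["X-RateLimit-Reset"]) - now)


exhausted = {
    "X-RateLimit-Limit": "1000",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1640995200",
}
print(seconds_until_reset(exhausted, now=1640995100))
```

Passing `now` explicitly keeps the function deterministic for testing; in normal use it defaults to the current time.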
📖 OpenAPI Specification
Download Specification
GET /v1/openapi.json
GET /v1/openapi.yaml
Interactive Documentation
- Swagger UI: https://api.example.com/docs
- ReDoc: https://api.example.com/redoc
🛠️ SDK Examples
Python SDK
from s9s_client import S9SClient

# Initialize client
client = S9SClient(
    base_url="https://api.example.com",
    token="your_token_here"
)

# List jobs
jobs = client.jobs.list(state="RUNNING", user="alice")

# Submit job
job = client.jobs.submit(
    name="test_job",
    script="/home/alice/test.sh",
    nodes=2,
    cpus_per_node=16
)

# Get job details
job_details = client.jobs.get(job.job_id)

# Cancel job
client.jobs.cancel(job.job_id)
JavaScript SDK
import { S9SClient } from '@s9s/client';

const client = new S9SClient({
  baseUrl: 'https://api.example.com',
  token: 'your_token_here'
});

// List jobs
const jobs = await client.jobs.list({
  state: 'RUNNING',
  user: 'alice'
});

// Submit job
const job = await client.jobs.submit({
  name: 'test_job',
  script: '/home/alice/test.sh',
  nodes: 2,
  cpusPerNode: 16
});
Go SDK
package main

import (
	"fmt"
	"log"

	s9s "github.com/s9s/go-client"
)

func main() {
	client := s9s.NewClient(&s9s.Config{
		BaseURL: "https://api.example.com",
		Token:   "your_token_here",
	})

	// List jobs
	jobs, err := client.Jobs.List(&s9s.JobListOptions{
		State: "RUNNING",
		User:  "alice",
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(jobs)

	// Submit job
	job, err := client.Jobs.Submit(&s9s.JobSubmitRequest{
		Name:        "test_job",
		Script:      "/home/alice/test.sh",
		Nodes:       2,
		CPUsPerNode: 16,
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(job)
}
🔧 Testing
Test Environment
# Base URL for testing
export S9S_API_URL="https://api-test.example.com"
export S9S_API_TOKEN="test_token_123"
cURL Examples
# Test authentication
curl -H "Authorization: Bearer $S9S_API_TOKEN" \
  "$S9S_API_URL/v1/status"

# List jobs
curl -H "Authorization: Bearer $S9S_API_TOKEN" \
  "$S9S_API_URL/v1/jobs?limit=10"

# Submit test job
curl -X POST \
  -H "Authorization: Bearer $S9S_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"job_name":"test","script":"/bin/sleep 60","nodes":1}' \
  "$S9S_API_URL/v1/jobs"
🚀 Next Steps
- Download SDKs from GitHub
- Try the interactive API explorer
- Read integration guides for popular tools
- Join the developer community