Add job runner system with API and database models #125

cmyers-mieweb · 2025-11-18T13:40:56Z

Issue: #119

Jobs / job-runner Feature

This document summarizes the new Jobs feature, how it works, how to deploy it, and how to test it. It's intended to be pasted into the PR description or included in the create-a-container docs.

Overview

This change introduces an asynchronous job system for the create-a-container service:

New Sequelize models + migrations:
- Jobs table: stores queued commands and status (pending, running, success, failure, cancelled).
- JobStatuses table: stores timestamped output logs for each job.
job-runner.js: a small service that runs with the same DB/config as the API server. It claims pending jobs, executes the configured command in a subprocess, streams stdout/stderr into JobStatuses, and updates job status on exit.
API endpoints under /api/jobs:
- POST /api/jobs — enqueue a job (admins only).
- GET /api/jobs/:id — job metadata (id, command, status, timestamps).
- GET /api/jobs/:id/status — returns log rows; supports sinceId and limit query params for incremental polling.
job-runner.service — systemd unit file (added to repo) to run the runner as a system service.
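
The claim, execute, and record cycle described above can be sketched as follows. This is a minimal illustration, not the code in job-runner.js: the in-memory arrays stand in for the Jobs and JobStatuses tables, and the helper names (claimNextPending, appendOutput, runJob) are hypothetical.

```javascript
// Sketch only: in-memory stand-ins for the Jobs and JobStatuses tables.
const { spawn } = require('node:child_process');

const jobs = [];        // rows: { id, command, status }
const jobStatuses = []; // rows: { id, jobId, output, createdAt }

// Claim the first pending job by flipping it to 'running'.
// Against a real database this should be one conditional UPDATE
// (... WHERE id = ? AND status = 'pending') so two runners cannot claim the same row.
function claimNextPending() {
  const job = jobs.find((j) => j.status === 'pending');
  if (!job) return null;
  job.status = 'running';
  return job;
}

// Append one chunk of output as a timestamped log row.
function appendOutput(jobId, chunk) {
  jobStatuses.push({
    id: jobStatuses.length + 1,
    jobId,
    output: chunk.toString(),
    createdAt: new Date(),
  });
}

// Run the job's command in a subprocess and resolve with the final status.
function runJob(job, cwd = process.cwd()) {
  return new Promise((resolve) => {
    const finish = (status) => {
      job.status = status;
      resolve(status);
    };
    const child = spawn(job.command, { shell: true, cwd });
    child.stdout.on('data', (chunk) => appendOutput(job.id, chunk));
    child.stderr.on('data', (chunk) => appendOutput(job.id, chunk));
    child.on('error', (err) => {
      appendOutput(job.id, `spawn error: ${err.message}`);
      finish('failure');
    });
    // Exit code 0 maps to 'success'; anything else maps to 'failure'.
    child.on('close', (code) => finish(code === 0 ? 'success' : 'failure'));
  });
}
```

A polling loop would call claimNextPending() every JOB_RUNNER_POLL_MS milliseconds and pass any claimed job to runJob().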

Files changed / added

models/job.js — Sequelize Job model
models/jobstatus.js — Sequelize JobStatus model
migrations/20251117120000-create-jobs.js — migration for Jobs
migrations/20251117120001-create-jobstatuses.js — migration for JobStatuses
job-runner.js — the runner service
job-runner.service — example systemd unit
routers/jobs.js — new API endpoints; POST restricted to admins
server.js — mounts /api/jobs

Security & Access Control

POST /api/jobs is restricted to admin users via the existing requireAdmin middleware. The other job endpoints require authentication (requireAuth) and are readable by any authenticated user, including non-admins.
Important: The POST endpoint currently accepts a raw command string. Do NOT expose this to untrusted users. Enqueue jobs only from trusted server-side code or an admin UI.

For long-term security, we can change POST /api/jobs to accept task + params instead of raw commands, and map tasks to safe server-side scripts.
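
One possible shape for the task + params approach is sketched below. Everything here is an assumption for illustration: the TASKS table, the create-container task name, the hostname parameter, and the --hostname flag are hypothetical; only the script path is taken from the curl example in this description.

```javascript
// Hypothetical allow-list: task names, parameters, and flags are illustrative only.
const TASKS = {
  'create-container': {
    file: '/opt/container-creator/create-container-wrapper.sh',
    // Each allowed parameter maps to a pattern its value must match.
    params: { hostname: /^[a-z0-9-]{1,63}$/ },
  },
};

const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Turn { task, params } into a file plus argv array,
// rejecting any task or parameter that is not on the allow-list.
function buildCommand(task, params = {}) {
  if (!has(TASKS, task)) throw new Error(`unknown task: ${task}`);
  const spec = TASKS[task];
  const args = [];
  for (const [name, value] of Object.entries(params)) {
    if (!has(spec.params, name)) throw new Error(`unknown parameter: ${name}`);
    if (typeof value !== 'string' || !spec.params[name].test(value)) {
      throw new Error(`invalid value for ${name}`);
    }
    args.push(`--${name}`, value);
  }
  return { file: spec.file, args };
}
```

The result can be passed to child_process.execFile(file, args), which does not start a shell by default, so parameter values are never interpreted as shell syntax.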

Database migration

Run migrations from the create-a-container directory:

cd create-a-container
npm run db:migrate

This creates the Jobs and JobStatuses tables. Ensure your DB user has the CREATE and ALTER privileges.

Running the job-runner

The runner works with the same environment as server.js. Example manual startup (from create-a-container):

# install deps if not already
npm install
# run in foreground
npm run job-runner

To run as a systemd service on the host (recommended for production):

Copy the repo onto the target host (ensure its contents are in /opt/container-creator, or adjust the paths in the unit file).
Copy the unit file into /etc/systemd/system and optionally create an environment file:

sudo cp create-a-container/job-runner.service /etc/systemd/system/job-runner.service
# Optional: create /etc/default/container-creator with DB and env vars
sudo systemctl daemon-reload
sudo systemctl enable --now job-runner.service
sudo systemctl status job-runner.service
sudo journalctl -u job-runner.service -f

If you provide environment variables via /etc/default/container-creator, add EnvironmentFile=/etc/default/container-creator to the [Service] section of the unit file, then reload systemd and restart the service.

Important env variables

job-runner.js respects these environment variables:

JOB_RUNNER_POLL_MS — poll interval in ms (default 2000)
JOB_RUNNER_CWD — working directory for spawned jobs (defaults to service cwd)

The runner also uses your DB config from config/config.js (which in turn reads .env). Ensure the DB env vars (MYSQL_HOST, MYSQL_USER, MYSQL_PASSWORD, MYSQL_DATABASE) are set.
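
As a rough illustration of how these variables could be read, the sketch below applies the documented defaults. The function names are hypothetical, and falling back to 2000 ms on an invalid JOB_RUNNER_POLL_MS value is an assumption rather than confirmed runner behaviour.

```javascript
// Read runner settings from an env-like object, falling back to the documented defaults.
function loadRunnerConfig(env = process.env, serviceCwd = process.cwd()) {
  const parsed = Number.parseInt(env.JOB_RUNNER_POLL_MS ?? '', 10);
  return {
    // 2000 ms is the documented default; invalid values fall back to it (assumption).
    pollMs: Number.isFinite(parsed) && parsed > 0 ? parsed : 2000,
    // Defaults to the service's working directory when JOB_RUNNER_CWD is unset.
    cwd: env.JOB_RUNNER_CWD || serviceCwd,
  };
}

// The DB variables named above; list any that are unset so startup can fail early.
const REQUIRED_DB_VARS = ['MYSQL_HOST', 'MYSQL_USER', 'MYSQL_PASSWORD', 'MYSQL_DATABASE'];

function missingDbVars(env = process.env) {
  return REQUIRED_DB_VARS.filter((name) => !env[name]);
}
```

Calling missingDbVars() at startup and exiting when the result is non-empty gives a clearer error than a failed DB connection later.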

How to enqueue a job (admin)

From the UI (recommended): sign in as an admin and use the admin UI that enqueues jobs server-side.

Using curl with session cookie (example):

Log in (this is the same web login endpoint used by the UI). We'll capture cookies to cookies.txt.

curl -c cookies.txt -X POST -H "Content-Type: application/x-www-form-urlencoded" \
  -d "username=admin&password=SECRET" \
  https://your-host/login

Enqueue a job (admin-only):

curl -b cookies.txt -X POST -H "Content-Type: application/json" \
  -d '{"command":"/opt/container-creator/create-container-wrapper.sh --some-args"}' \
  https://your-host/api/jobs

Response: { "id": 123, "status": "pending" }
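
For trusted server-side callers, the same request could be issued from Node instead of curl. This is a sketch under assumptions: the helper names are hypothetical, cookieHeader is whatever session cookie string the login step returned, and the global fetch requires Node 18 or later.

```javascript
// Build the URL and fetch() options for enqueueing a job.
function buildEnqueueRequest(baseUrl, cookieHeader, command) {
  return {
    url: new URL('/api/jobs', baseUrl).toString(),
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', Cookie: cookieHeader },
      body: JSON.stringify({ command }),
    },
  };
}

// Send the request using the global fetch available in Node 18+.
async function enqueueJob(baseUrl, cookieHeader, command) {
  const { url, options } = buildEnqueueRequest(baseUrl, cookieHeader, command);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`enqueue failed: HTTP ${res.status}`);
  return res.json(); // expected shape per the response above: { id, status }
}
```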

Fetching job logs

Poll for job statuses (incremental polling):

# initial fetch
curl -b cookies.txt https://your-host/api/jobs/123/status

# later fetch only new rows (use last seen id)
curl -b cookies.txt "https://your-host/api/jobs/123/status?sinceId=45"

The API returns an array of objects: { id, output, createdAt }. Use sinceId to avoid re-downloading old logs.
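
The sinceId and limit semantics can be illustrated with a plain function over an array of log rows. This is a sketch, not the router code: selectStatusRows is a hypothetical name, and the default limit of 100 is an assumption.

```javascript
// Return the log rows for one job whose id is greater than sinceId,
// oldest first, capped at limit rows.
function selectStatusRows(rows, jobId, { sinceId = 0, limit = 100 } = {}) {
  return rows
    .filter((row) => row.jobId === jobId && row.id > sinceId)
    .sort((a, b) => a.id - b.id)
    .slice(0, limit)
    // Expose only the fields the API is described as returning.
    .map(({ id, output, createdAt }) => ({ id, output, createdAt }));
}
```

A client that remembers the largest id it has received and sends it back as sinceId never downloads the same row twice.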

Frontend streaming / status page

The existing frontend views/status.html currently talks to an in-memory jobs object. With the new persistent job system you should:

Update the status page to poll GET /api/jobs/:id/status with sinceId and append new output lines.
Optionally implement an SSE (Server-Sent Events) endpoint that streams new JobStatus rows as they are created. The current implementation supports only polling with incremental reads.
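
The polling approach for the status page could keep a small piece of client-side state, as sketched below. The helper names are hypothetical, and the sketch assumes each response lists rows in ascending id order.

```javascript
// Track the last seen row id so each poll only asks for newer rows.
function createLogTail(jobId) {
  return { jobId, lastId: 0, lines: [] };
}

// Build the path for the next poll; sinceId is omitted on the first request.
function nextStatusPath(tail) {
  const query = tail.lastId > 0 ? `?sinceId=${tail.lastId}` : '';
  return `/api/jobs/${tail.jobId}/status${query}`;
}

// Merge a response (array of { id, output, createdAt }) into the tail.
function applyRows(tail, rows) {
  for (const row of rows) {
    if (row.id <= tail.lastId) continue; // skip rows that were already appended
    tail.lines.push(row.output);
    tail.lastId = row.id;
  }
  return tail;
}
```

In the browser, a timer would fetch nextStatusPath(tail), pass the parsed JSON to applyRows, and render any new entries in tail.lines.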

Testing plan

Run migrations locally and start the server and the job-runner in the foreground.
Sign in as an admin, then enqueue a job either through the admin UI or with the curl login and POST commands described above.
Verify that a Jobs row is created with status pending.
Confirm job-runner picks up the job (watch journalctl or runner stdout) and job status becomes running.
Verify JobStatuses rows appear and contain stdout/stderr chunks.
Verify job status changes to success or failure on exit.
Verify GET /api/jobs/:id/status returns the accumulated log rows.
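
If these checks are automated, the expected status sequence can be encoded in a small helper. The sketch below is an assumption about the lifecycle: the pending, running, success, and failure edges follow the steps above, while the edges into cancelled are guesses, since cancellation is not described here.

```javascript
// Status transitions implied by the steps above; the 'cancelled' edges are an assumption.
const TRANSITIONS = {
  pending: ['running', 'cancelled'],
  running: ['success', 'failure', 'cancelled'],
  success: [],
  failure: [],
  cancelled: [],
};

// True when a job may move directly from one status to another.
function canTransition(from, to) {
  return (TRANSITIONS[from] || []).includes(to);
}

// Check that an observed sequence of statuses starts at 'pending'
// and only follows allowed transitions.
function isValidLifecycle(statuses) {
  if (statuses[0] !== 'pending') return false;
  return statuses.every((s, i) => i === 0 || canTransition(statuses[i - 1], s));
}
```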

Rollback

To remove the feature, revert this pull request and undo the two migrations in reverse order (JobStatuses first, then Jobs):

npm run db:migrate:undo -- --name 20251117120001-create-jobstatuses.js
npm run db:migrate:undo -- --name 20251117120000-create-jobs.js

(Adjust migration undo commands according to your migration tooling.)

Future work / improvements

Replace raw command strings with task identifiers + params to avoid arbitrary shell execution.
Implement concurrency control and worker pool (configurable MAX_WORKERS).
Add timeouts and retry policies for long-running jobs.
Switch log storage to file-backed logs with DB pointer for very large outputs.
Add SSE/WebSocket streaming endpoint for real-time frontend log updates.

Introduces a job runner service that polls for pending jobs, executes commands, and records output/status. Adds Sequelize models and migrations for Jobs and JobStatuses, a jobs API router for job management, and integrates the router into the server. Also includes a systemd service file and updates package.json scripts.

create-a-container/job-runner.js

create-a-container/systemd/job-runner.service

create-a-container/job-runner.service

create-a-container/migrations/20251117120001-create-jobstatuses.js