"jobhist" Database Integration #31

Merged
vanderwb merged 10 commits into main from jobhist_db
Mar 19, 2026

@benkirk (Collaborator) commented Mar 2, 2026

This PR introduces the ability for qhist to optionally query job records from the hpc-usage-queries jobhist database module instead of relying exclusively on flat-file PBS log scanning.

To support this, an optional jobhist-db dependency has been added to pyproject.toml that installs the required database integrations. Inside the main application, the core output and formatting logic has been refactored into a reusable emit_formatted_jobs() helper function. The tool now dynamically checks if the database module is installed and available for the target machine; if so, it fetches the records directly using the new database API. If the database is unavailable or the import fails, qhist will gracefully fall back to the original log file scanning behavior, emitting a warning.
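
The optional-import check described above could look roughly like the sketch below. This is an illustration only, not the code in this PR: the helper name `load_db_api` and the placeholder module name are assumptions, and the real import path of the jobhist module is not shown here.

```python
import importlib
import warnings


def load_db_api(module_name="jobhist_placeholder"):
    """Return the optional DB module, or None when it is not installed.

    `module_name` is a placeholder; the real module path is an assumption.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Warn once and let the caller fall back to PBS log scanning.
        warnings.warn(
            f"{module_name} is not installed; falling back to PBS log scanning"
        )
        return None
```

A caller would treat a `None` return as "use the log scanner", so a missing optional dependency never becomes a hard failure.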

Additionally, the PR adds a .env.example file that documents the available database configuration settings (supporting both SQLite and PostgreSQL backends) and updates .gitignore so that local .env credentials stay out of source control. The target machine can now be explicitly overridden via the QHIST_MACHINE environment variable.
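
As a rough illustration of the QHIST_MACHINE override, a resolver might behave like the sketch below. The function name `resolve_machine`, the precedence of the environment over the configured value, and the treatment of an empty value as unset are all assumptions, not behavior confirmed by this PR.

```python
import os


def resolve_machine(config_machine, environ=None):
    """Prefer QHIST_MACHINE over the configured machine (assumed precedence)."""
    environ = os.environ if environ is None else environ
    # An empty or whitespace-only value is treated as "not set" (assumption).
    override = environ.get("QHIST_MACHINE", "").strip()
    return override or config_machine
```

Passing a mapping as `environ` keeps the helper easy to test without touching the real process environment.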

Finally, this update includes minor variable scoping fixes (nonlocal bindings for statistics tracking) to support the refactored formatting loop.
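
The scoping issue can be sketched as follows. This is a simplified stand-in for the real formatting loop: `summarize`, the `walltime` field, and the job dictionaries are hypothetical, and only the pattern (initialize before the nested function, then bind with `nonlocal`) reflects what the PR describes.

```python
def summarize(jobs, average=False):
    # Initialize unconditionally so the nonlocal names are always bound,
    # whether or not averaging was requested.
    total_walltime = 0.0
    num_jobs = 0

    def emit_jobs(jobs_iter):
        # Rebind the enclosing function's counters instead of creating locals.
        nonlocal total_walltime, num_jobs
        for job in jobs_iter:
            if average:
                total_walltime += job["walltime"]
                num_jobs += 1
            else:
                print(job["id"])

    emit_jobs(iter(jobs))
    if average and num_jobs:
        return total_walltime / num_jobs
    return None
```

If the two counters were assigned only inside an `if average:` branch, the nested function would fail at runtime whenever that branch was skipped and the names were then read.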

Open issues or questions:

  • A .env file is not strictly required, but some configuration passthrough is needed to set up the Postgres server connection.
  • The machine could optionally be exposed through a CLI argument. This is not done currently.

benkirk added 6 commits March 2, 2026 12:34
…atabase.

Refactors the main loop into an emit_jobs() function that handles
output dispatching, and conditionally checks for the presence of the DB
using the machine attribute from the configuration.

Key aspects of the change:
   1. Scope/Binding Issues Avoided:
      Because averages and num_jobs were previously only initialized
      dynamically under the if args.average: block, they would cause an
      UnboundLocalError or NameError in emit_jobs() if not properly
      scoped. I've ensured these are initialized before emit_jobs is
      defined so the nonlocal directive bindings are stable regardless of
      the CLI arguments.

   2. Simplified Output Dispatching:
      The repetitive job output logic is now cleanly housed inside
      emit_jobs(jobs_iter).

   3. Database Integration with Scan Fallback:
      If config.machine is set and db_available(machine) evaluates to
      True, it fetches records from the DB using
      db_get_records. Otherwise, it logs a warning (if machine is
      present) and predictably falls back to scanning the PBS logs.
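
A minimal sketch of the dispatch in item 3, with the three callables passed in so it runs standalone. The names db_available and db_get_records come from the commit message, but their signatures here, along with `select_jobs` and `scan_logs`, are assumptions made for illustration.

```python
import warnings


def select_jobs(machine, db_available, db_get_records, scan_logs):
    """Return records from the DB when possible, else from the PBS log scan."""
    if machine and db_available(machine):
        return db_get_records(machine)
    if machine:
        # A machine was configured but has no usable database: warn, then scan.
        warnings.warn(f"no database available for {machine}; scanning PBS logs")
    return scan_logs()
```

With no machine configured, the sketch goes straight to the log scan without a warning, matching the "if machine is present" condition above.
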
@vanderwb vanderwb mentioned this pull request Mar 2, 2026
@vanderwb vanderwb marked this pull request as ready for review March 19, 2026 19:45
@vanderwb (Collaborator) commented

Looks good to me. Having machine selection doable via an option would be a nice extension, but it's moot until the DB server is robust and in production.

@vanderwb vanderwb merged commit 9400a0d into main Mar 19, 2026
3 checks passed
@vanderwb vanderwb deleted the jobhist_db branch March 19, 2026 20:13