
Improve background job reliability: Redis resilience + file-based fallbacks#8

Open
brbrainerd wants to merge 3 commits into IliasHad:main from brbrainerd:main

Conversation

@brbrainerd brbrainerd commented Dec 6, 2025

Summary

Improves reliability for long-running video indexing jobs and enables GPU acceleration.

Changes

Job Processing Reliability

  • Redis: Exponential backoff retry (100ms→30s), auto-reconnect on timeout/reset errors, 30s connection + 60s command timeout, TCP keepalive
  • BullMQ: 4-hour lock duration, 30-minute stalled interval, maxStalledCount=3
  • WebSocket: Ping/pong heartbeat, auto-reconnect, 1-hour ping timeout in Python service
  • File Fallbacks: Poll every 30s for completion files when WebSocket callbacks fail
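
The settings above can be sketched roughly as follows. This is an illustrative shape, not the PR's actual code: the option objects mirror what would be passed to an ioredis client and a BullMQ `Worker`, but names of local variables are mine.

```typescript
// Exponential backoff: delay doubles from 100 ms, capped at 30 s.
const retryStrategy = (times: number): number =>
  Math.min(100 * 2 ** times, 30_000);

// Options of the kind passed to an ioredis constructor.
const redisOptions = {
  retryStrategy,
  connectTimeout: 30_000, // 30 s connection timeout
  commandTimeout: 60_000, // 60 s command timeout
  keepAlive: 10_000,      // enable TCP keepalive probes
  // Auto-reconnect when the error looks like a timeout or connection reset.
  reconnectOnError: (err: Error): boolean =>
    /ETIMEDOUT|ECONNRESET/.test(err.message),
};

// Options of the kind passed to a BullMQ Worker.
const workerOptions = {
  lockDuration: 4 * 60 * 60 * 1000, // 4-hour job lock
  stalledInterval: 30 * 60 * 1000,  // check for stalled jobs every 30 min
  maxStalledCount: 3,               // give up after 3 stall recoveries
};
```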

GPU Acceleration

  • CUDA 12.4 PyTorch (torch>=2.5.0, torchvision>=0.20.0)
  • GPU_COUNT env variable (-1 for all GPUs, 0 to disable)
  • Added face_recognition + dlib dependencies
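
The `GPU_COUNT` convention (-1 for all GPUs, 0 to disable) could be resolved with a helper along these lines; the function name and clamping behavior are illustrative, not taken from the PR:

```typescript
// Resolve GPU_COUNT against the number of GPUs actually present:
//   -1 -> use all available, 0 (or unset/malformed) -> disable,
//   n > 0 -> use at most n.
function resolveGpuCount(envValue: string | undefined, available: number): number {
  const requested = Number(envValue ?? "0");
  if (!Number.isInteger(requested)) return 0; // malformed value: disable GPU
  if (requested === -1) return available;     // all GPUs
  if (requested <= 0) return 0;               // explicitly disabled
  return Math.min(requested, available);      // capped at what exists
}
```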

Watcher Hardening

  • Depth limit (top-level only) to avoid heavy recursion
  • Ignore patterns: Syncthing markers, temp/partial files, dotfiles
  • Audio extension support for audio-specific folders
  • ignorePermissionErrors, awaitWriteFinish for stability
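
The ignore patterns above might look like the following sketch (the regexes are illustrative; the PR's actual patterns may differ). In practice these would feed chokidar's `ignored` option alongside `depth: 0`, `ignorePermissionErrors: true`, and `awaitWriteFinish`:

```typescript
// Paths the watcher should skip entirely.
const ignoredPatterns: RegExp[] = [
  /(^|[/\\])\./,                      // dotfiles (covers .DS_Store, .stfolder, etc.)
  /\.syncthing\..*\.tmp$/,            // Syncthing in-transfer markers
  /\.(tmp|partial|part|crdownload)$/, // temp / partially written files
];

const isIgnored = (path: string): boolean =>
  ignoredPatterns.some((re) => re.test(path));
```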

Bug Fixes

  • Fixed analysis result path mismatch (/app/apps/background-jobs/analysis_results/)
  • Added start.ps1/stop.ps1 to gitignore (contain local GPU config)

Why?

Long videos caused stalls due to WebSocket timeouts and Redis ETIMEDOUT on Docker Desktop. File-based fallbacks ensure completion regardless of callback delivery.
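
A minimal sketch of that fallback, with illustrative names (the completion-file path and function names are assumptions, not code from this PR): the job processor polls for a completion file the Python service writes when it finishes, so the job can resolve even if the WebSocket callback is lost.

```typescript
import { existsSync, readFileSync } from "node:fs";

// Returns the parsed completion payload if the file exists and is fully
// written, otherwise null (file absent, or JSON still being flushed).
function tryReadCompletion(completionFile: string): unknown | null {
  if (!existsSync(completionFile)) return null;
  try {
    return JSON.parse(readFileSync(completionFile, "utf8"));
  } catch {
    return null; // partially written; caller retries on the next tick
  }
}

// Poll every 30 s; stops itself once a result appears.
function pollForCompletion(
  completionFile: string,
  onComplete: (result: unknown) => void,
  intervalMs = 30_000
): () => void {
  const timer = setInterval(() => {
    const result = tryReadCompletion(completionFile);
    if (result !== null) {
      clearInterval(timer);
      onComplete(result);
    }
  }, intervalMs);
  return () => clearInterval(timer); // cancel if the WebSocket callback wins
}
```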

…allbacks; add optional GPU support; secure CIFS creds via env + override compose
- Enable CUDA 12.4 PyTorch for GPU transcription/analysis
- Add face_recognition + dlib dependencies
- Harden file watchers: depth limit, Syncthing/temp ignores, audio support
- Fix analysis result path mismatch (/app/apps/background-jobs/)
- Add start.ps1/stop.ps1 to gitignore (contain GPU_COUNT)
- Add findAudioFiles and findMediaFiles functions
- Update folder trigger route to detect audio folders
- Audio folders (x_audio, /audio) scan for audio extensions

@IliasHad (Owner) left a comment

Thank you so much @brbrainerd for taking the time and making the first external PR, much appreciated. I left a couple of comments about the changes that you made.

@IliasHad (Owner):

Looks good to me, thank you

@IliasHad (Owner):

Thank you for the contribution. Let's focus only on video, at least for now, because the system is built for video creators and people with large archives of videos.

@IliasHad (Owner):

Same here for the audio support

@IliasHad (Owner):

The root Docker Compose file will be for people who want to use pre-built Docker images from the GitHub container registry. If you want to build it yourself or develop, you can use the Docker Compose file in the docker/ folder.

@IliasHad (Owner):

That's a good feature to have, thank you for adding it. But we should add support in the backend and frontend to handle network drives.

@IliasHad (Owner):

Let's keep the .env.example file in the project root, thank you.

onProgress?: ProgressCallback
): Promise<void> {
return new Promise((resolve, reject) => {
let resolved = false
@IliasHad (Owner):

Can you please elaborate on why we need a file-based fallback in this case, if the WebSocket sends a complete message when the transcription is done? The transcription service could be stuck, and with this file-based fallback the transcription would be marked complete (because the file size hasn't changed) when in reality the transcription job is not completed.
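
One way to mitigate this concern (a sketch, not code from this PR; the payload shape and function name are assumptions) is to never infer completion from file-size stability alone, and instead require an explicit terminal status record that the transcription service writes only at the very end of the job:

```typescript
// Hypothetical completion payload; field names are illustrative.
interface CompletionPayload {
  status: "complete" | "failed";
  job_id: string;
}

// Accept the fallback file only if it carries an explicit terminal status
// for the expected job. A stalled service never writes this record, so an
// unchanged file size alone can't be mistaken for success.
function isValidCompletion(raw: string, expectedJobId: string): boolean {
  try {
    const payload = JSON.parse(raw) as Partial<CompletionPayload>;
    return payload.status === "complete" && payload.job_id === expectedJobId;
  } catch {
    return false; // truncated or partially written file
  }
}
```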

onProgress: (progress: AnalysisProgress) => void
): Promise<{ analysis: Analysis; category: string }> {
return new Promise((resolve, reject) => {
let resolved = false
@IliasHad (Owner):

Same comment and observation as for the transcription file fallback applies to this frame-analysis file fallback. We need to use the analysis_complete message from the WebSocket server.

}

// Ensure connection is alive, reconnect if needed
public async ensureConnected(): Promise<boolean> {
@IliasHad (Owner):

Where are we using this function? If we aren't using it, can you please remove it?

const message = JSON.parse(data.toString())
const { type, payload, job_id } = message

// Handle ping/pong heartbeat from Python service
@IliasHad (Owner):

looks good to me

@IliasHad (Owner):

Thank you for adding NVIDIA CUDA support, but this will make the Docker image bigger for everyone, including users who don't have an NVIDIA GPU. Can we add a build argument to opt into CUDA, and in the GitHub release YAML file add it to the build strategy (adding a tag for the Docker image variant that has NVIDIA support)?
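
The build-arg approach could look roughly like this Dockerfile fragment. The `TORCH_VARIANT` name is illustrative; the wheel-index URLs follow PyTorch's published `download.pytorch.org/whl/<variant>` pattern:

```dockerfile
# Default to the CPU-only PyTorch wheels; pass
#   --build-arg TORCH_VARIANT=cu124
# to build the CUDA 12.4 image variant instead.
ARG TORCH_VARIANT=cpu

RUN pip install --no-cache-dir "torch>=2.5.0" "torchvision>=0.20.0" \
      --index-url "https://download.pytorch.org/whl/${TORCH_VARIANT}"
```

A release workflow could then build both variants via a matrix over `TORCH_VARIANT` and publish the CUDA build under a distinct image tag (e.g. a `-cuda` suffix).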
