
Improve background job reliability: Redis resilience + file-based fallbacks#8

Open
brbrainerd wants to merge 3 commits into IliasHad:main from brbrainerd:main

Conversation

@brbrainerd brbrainerd commented Dec 6, 2025

Summary

Improves reliability for long-running video indexing jobs and enables GPU acceleration.

Changes

Job Processing Reliability

  • Redis: Exponential backoff retry (100ms→30s), auto-reconnect on timeout/reset errors, 30s connection + 60s command timeout, TCP keepalive
  • BullMQ: 4-hour lock duration, 30-minute stalled interval, maxStalledCount=3
  • WebSocket: Ping/pong heartbeat, auto-reconnect, 1-hour ping timeout in Python service
  • File Fallbacks: Poll every 30s for completion files when WebSocket callbacks fail
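
The settings above can be sketched roughly as follows. This is an illustrative shape, not the PR's actual code: the option objects mirror what would be passed to an ioredis client and a BullMQ `Worker`, but names of local variables are mine.

```typescript
// Exponential backoff: delay doubles from 100 ms, capped at 30 s.
const retryStrategy = (times: number): number =>
  Math.min(100 * 2 ** times, 30_000);

// Options of the kind passed to an ioredis constructor.
const redisOptions = {
  retryStrategy,
  connectTimeout: 30_000, // 30 s connection timeout
  commandTimeout: 60_000, // 60 s command timeout
  keepAlive: 10_000,      // enable TCP keepalive probes
  // Auto-reconnect when the error looks like a timeout or connection reset.
  reconnectOnError: (err: Error): boolean =>
    /ETIMEDOUT|ECONNRESET/.test(err.message),
};

// Options of the kind passed to a BullMQ Worker.
const workerOptions = {
  lockDuration: 4 * 60 * 60 * 1000, // 4-hour job lock
  stalledInterval: 30 * 60 * 1000,  // check for stalled jobs every 30 min
  maxStalledCount: 3,               // give up after 3 stall recoveries
};
```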

GPU Acceleration

  • CUDA 12.4 PyTorch (torch>=2.5.0, torchvision>=0.20.0)
  • GPU_COUNT env variable (-1 for all GPUs, 0 to disable)
  • Added face_recognition + dlib dependencies
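
The `GPU_COUNT` convention (-1 for all GPUs, 0 to disable) could be resolved with a helper along these lines; the function name and clamping behavior are illustrative, not taken from the PR:

```typescript
// Resolve GPU_COUNT against the number of GPUs actually present:
//   -1 -> use all available, 0 (or unset/malformed) -> disable,
//   n > 0 -> use at most n.
function resolveGpuCount(envValue: string | undefined, available: number): number {
  const requested = Number(envValue ?? "0");
  if (!Number.isInteger(requested)) return 0; // malformed value: disable GPU
  if (requested === -1) return available;     // all GPUs
  if (requested <= 0) return 0;               // explicitly disabled
  return Math.min(requested, available);      // capped at what exists
}
```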

Watcher Hardening

  • Depth limit (top-level only) to avoid heavy recursion
  • Ignore patterns: Syncthing markers, temp/partial files, dotfiles
  • Audio extension support for audio-specific folders
  • ignorePermissionErrors, awaitWriteFinish for stability
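
The ignore patterns above might look like the following sketch (the regexes are illustrative; the PR's actual patterns may differ). In practice these would feed chokidar's `ignored` option alongside `depth: 0`, `ignorePermissionErrors: true`, and `awaitWriteFinish`:

```typescript
// Paths the watcher should skip entirely.
const ignoredPatterns: RegExp[] = [
  /(^|[/\\])\./,                      // dotfiles (covers .DS_Store, .stfolder, etc.)
  /\.syncthing\..*\.tmp$/,            // Syncthing in-transfer markers
  /\.(tmp|partial|part|crdownload)$/, // temp / partially written files
];

const isIgnored = (path: string): boolean =>
  ignoredPatterns.some((re) => re.test(path));
```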

Bug Fixes

  • Fixed analysis result path mismatch (/app/apps/background-jobs/analysis_results/)
  • Added start.ps1/stop.ps1 to gitignore (contain local GPU config)

Why?

Long videos caused stalls due to WebSocket timeouts and Redis ETIMEDOUT on Docker Desktop. File-based fallbacks ensure completion regardless of callback delivery.
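
A minimal sketch of that fallback, with illustrative names (the completion-file path and function names are assumptions, not code from this PR): the job processor polls for a completion file the Python service writes when it finishes, so the job can resolve even if the WebSocket callback is lost.

```typescript
import { existsSync, readFileSync } from "node:fs";

// Returns the parsed completion payload if the file exists and is fully
// written, otherwise null (file absent, or JSON still being flushed).
function tryReadCompletion(completionFile: string): unknown | null {
  if (!existsSync(completionFile)) return null;
  try {
    return JSON.parse(readFileSync(completionFile, "utf8"));
  } catch {
    return null; // partially written; caller retries on the next tick
  }
}

// Poll every 30 s; stops itself once a result appears.
function pollForCompletion(
  completionFile: string,
  onComplete: (result: unknown) => void,
  intervalMs = 30_000
): () => void {
  const timer = setInterval(() => {
    const result = tryReadCompletion(completionFile);
    if (result !== null) {
      clearInterval(timer);
      onComplete(result);
    }
  }, intervalMs);
  return () => clearInterval(timer); // cancel if the WebSocket callback wins
}
```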

…allbacks; add optional GPU support; secure CIFS creds via env + override compose
- Enable CUDA 12.4 PyTorch for GPU transcription/analysis
- Add face_recognition + dlib dependencies
- Harden file watchers: depth limit, Syncthing/temp ignores, audio support
- Fix analysis result path mismatch (/app/apps/background-jobs/)
- Add start.ps1/stop.ps1 to gitignore (contain GPU_COUNT)
- Add findAudioFiles and findMediaFiles functions
- Update folder trigger route to detect audio folders
- Audio folders (x_audio, /audio) scan for audio extensions

@IliasHad (Owner) left a comment

Thank you so much @brbrainerd for taking the time and making the first external PR, much appreciated. I left a couple of comments about the changes that you made.

@IliasHad (Owner):

Looks good to me, thank you

@IliasHad (Owner):

Thank you for the contribution. Let's focus only on video, at least for now, because the system is built for video creators and people with large archives of videos.

@IliasHad (Owner):

Same here for the audio support

@IliasHad (Owner):

The root Docker Compose file will be for people who want to use pre-built Docker images from the GitHub container registry. If you want to build it yourself or develop, you can use the Docker Compose file in the docker/ folder.

@IliasHad (Owner):

That's a good feature to have, thank you for adding it. But we should add support in the backend and frontend to handle network drives.

@IliasHad (Owner):

Let's keep the .env.example file in the project root, thank you.

onProgress?: ProgressCallback
): Promise<void> {
return new Promise((resolve, reject) => {
let resolved = false
@IliasHad (Owner):

Can you please elaborate on why we need a file-based fallback in this case, if the WebSocket sends a complete message when the transcription is done? The transcription service could be stuck, and with this file-based fallback the transcription would be marked complete (because the file size hasn't changed) when in reality the transcription job is not completed.
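
One way to mitigate this concern (a sketch, not code from this PR; the payload shape and function name are assumptions) is to never infer completion from file-size stability alone, and instead require an explicit terminal status record that the transcription service writes only at the very end of the job:

```typescript
// Hypothetical completion payload; field names are illustrative.
interface CompletionPayload {
  status: "complete" | "failed";
  job_id: string;
}

// Accept the fallback file only if it carries an explicit terminal status
// for the expected job. A stalled service never writes this record, so an
// unchanged file size alone can't be mistaken for success.
function isValidCompletion(raw: string, expectedJobId: string): boolean {
  try {
    const payload = JSON.parse(raw) as Partial<CompletionPayload>;
    return payload.status === "complete" && payload.job_id === expectedJobId;
  } catch {
    return false; // truncated or partially written file
  }
}
```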

onProgress: (progress: AnalysisProgress) => void
): Promise<{ analysis: Analysis; category: string }> {
return new Promise((resolve, reject) => {
let resolved = false
@IliasHad (Owner):

Same comment and observation as for the transcription file fallback applies to this frame-analysis file fallback. We need to use the analysis_complete message from the WebSocket server.

}

// Ensure connection is alive, reconnect if needed
public async ensureConnected(): Promise<boolean> {
@IliasHad (Owner):

Where are we using this function? If we aren't using it, can you please remove it?

const message = JSON.parse(data.toString())
const { type, payload, job_id } = message

// Handle ping/pong heartbeat from Python service
@IliasHad (Owner):

looks good to me

@IliasHad (Owner):

Thank you for adding NVIDIA CUDA support, but this will make the Docker image bigger for everyone, including users who don't have an NVIDIA GPU. Can we add a build argument to opt into CUDA, and in the GitHub release YAML file add it to the build strategy (adding a tag for the Docker image variant that has NVIDIA support)?
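
The build-arg approach could look roughly like this Dockerfile fragment. The `TORCH_VARIANT` name is illustrative; the wheel-index URLs follow PyTorch's published `download.pytorch.org/whl/<variant>` pattern:

```dockerfile
# Default to the CPU-only PyTorch wheels; pass
#   --build-arg TORCH_VARIANT=cu124
# to build the CUDA 12.4 image variant instead.
ARG TORCH_VARIANT=cpu

RUN pip install --no-cache-dir "torch>=2.5.0" "torchvision>=0.20.0" \
      --index-url "https://download.pytorch.org/whl/${TORCH_VARIANT}"
```

A release workflow could then build both variants via a matrix over `TORCH_VARIANT` and publish the CUDA build under a distinct image tag (e.g. a `-cuda` suffix).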
