Improve OCR quality: dark themes, resolution, extraction logic #3

@Korkyzer

Improve OCR text extraction quality (dark UIs, resolution, extraction logic)

Problem

The current OCR pipeline misses most text on dark-themed applications (WhatsApp, Slack, Discord, etc.). On a WhatsApp conversation with dozens of visible messages, agent-watch only captured ~80 characters (menu bar text: "File Edit Chat Call View Window Help").

Root causes identified

  1. NativeTextExtractor short-circuits OCR: The accessibility extractor runs first. If it returns ≥ minimumAccessibilityChars (even just menu bar items), OCR is skipped entirely. For WhatsApp, accessibility returns ~100 chars of sidebar/menu text, satisfying the minimum — so the Vision framework OCR never runs on the actual message content.

  2. Frame buffer resolution too low: FrameBufferStore downscales captures to maxDimension = 1280, which halves Retina resolution (2560 → 1280). Text becomes too small for reliable OCR, especially in dense UIs.

  3. Apple Vision framework struggles with dark themes: VNRecognizeTextRequest performs poorly on light-on-dark text. The Vision framework was designed primarily for document scanning (dark text on light backgrounds).

Changes

1. NativeTextExtractor.swift — Always run both extractors, keep the best

Before: Accessibility runs first; if it returns enough chars, OCR is skipped.
After: Both accessibility AND OCR always run. The result with more text wins.

// Before
if let accessibilityText = accessibilityExtractor.extractText(),
   accessibilityText.count >= minimumAccessibilityChars {
    return ExtractedText(text: accessibilityText, source: .accessibility, metadata: metadata)
}
// OCR only runs as fallback

// After
let accessibilityText = accessibilityExtractor.extractText()
var ocrText: String? = nil
if ocrEnabled {
    ocrText = try ocrExtractor.extractText()
}
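// Return whichever extracted more text (a missing result counts as 0 chars)
let accLen = accessibilityText?.count ?? 0
let ocrLen = ocrText?.count ?? 0
// Shorthand: wrap the winning string in ExtractedText with the matching source
if ocrLen > accLen { return ocr } else { return accessibility }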
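
The selection rule above can be sketched in isolation. This is a minimal illustration, not the project's code: `Source`, `Extracted`, and `pickBest` are hypothetical names, and the two extractors are passed in as closures so the logic can be exercised without the accessibility or Vision APIs.

```swift
// Hypothetical stand-ins for the project's types (assumptions, not the real API).
enum Source { case accessibility, ocr }

struct Extracted: Equatable {
    let text: String
    let source: Source
}

/// Runs both extractors and returns whichever produced more characters.
/// Ties go to accessibility, mirroring `if ocrLen > accLen { ocr } else { accessibility }`.
func pickBest(
    accessibility: () -> String?,
    ocr: () throws -> String?,
    ocrEnabled: Bool = true
) rethrows -> Extracted? {
    let accText = accessibility()
    var ocrText: String? = nil
    if ocrEnabled {
        ocrText = try ocr()
    }

    // A missing result counts as zero characters.
    let accLen = accText?.count ?? 0
    let ocrLen = ocrText?.count ?? 0

    if let ocrText = ocrText, ocrLen > accLen {
        return Extracted(text: ocrText, source: .ocr)
    }
    if let accText = accText {
        return Extracted(text: accText, source: .accessibility)
    }
    return nil
}
```

With this shape, a short menu-bar string from accessibility no longer hides a longer OCR result, while apps with rich accessibility text still keep it.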

2. FrameBufferStore.swift — Increase resolution to full Retina

// Before
maxDimension: Int = 1280

// After
maxDimension: Int = 2560

Disk impact: frames go from ~250KB to ~400-800KB. With the existing retention/pruning policy this remains well under control.
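
To make the resolution change concrete, here is a small sketch of the downscale arithmetic. It assumes `maxDimension` clamps the longer side while preserving aspect ratio, which is an assumption about how FrameBufferStore behaves; `scaledSize` is a hypothetical helper, and the 2560×1662 capture size is taken from the results below.

```swift
/// Scales (width, height) so the longer side is at most `maxDimension`,
/// preserving aspect ratio. Sizes already within the limit are returned unchanged.
/// Assumption: this mirrors how the frame store applies `maxDimension`.
func scaledSize(width: Int, height: Int, maxDimension: Int) -> (width: Int, height: Int) {
    let longest = max(width, height)
    guard longest > maxDimension else { return (width, height) }
    let scale = Double(maxDimension) / Double(longest)
    return (Int((Double(width) * scale).rounded()),
            Int((Double(height) * scale).rounded()))
}

let before = scaledSize(width: 2560, height: 1662, maxDimension: 1280)
let after = scaledSize(width: 2560, height: 1662, maxDimension: 2560)
print("\(before.width)x\(before.height)") // 1280x831
print("\(after.width)x\(after.height)")   // 2560x1662
```

Doubling both sides gives four times as many pixels per frame, which is consistent with the larger files noted above.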

3. OCRTextExtractor.swift — Color inversion for dark themes

Runs OCR twice: once on the original image, once on a color-inverted version (using CoreImage CIColorInvert). Keeps whichever result contains more text. Also lowered minimumTextHeight from 0.005 to 0.002 to catch smaller text.
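
The two-pass idea can be sketched without Apple frameworks. In this illustration the per-channel `255 - value` inversion stands in for CIColorInvert, and the recognizer is injected as a closure where the real extractor would call Vision; `RGBAImage`, `inverted`, and `recognizeBothPolarities` are hypothetical names, not the project's API.

```swift
/// An 8-bit RGBA pixel buffer, used here as a stand-in for a captured frame.
struct RGBAImage {
    var pixels: [UInt8]   // R, G, B, A, R, G, B, A, ...
}

/// Inverts the color channels and leaves alpha untouched.
/// A plain-Swift substitute for the CIColorInvert filter named above.
func inverted(_ image: RGBAImage) -> RGBAImage {
    var out = image
    for i in out.pixels.indices where i % 4 != 3 {
        out.pixels[i] = 255 - out.pixels[i]
    }
    return out
}

/// Runs `recognize` on the original and on the inverted image,
/// then keeps whichever result contains more text (ties keep the original).
func recognizeBothPolarities(
    _ image: RGBAImage,
    recognize: (RGBAImage) throws -> String
) rethrows -> String {
    let original = try recognize(image)
    let flipped = try recognize(inverted(image))
    return flipped.count > original.count ? flipped : original
}
```

In the real extractor, `recognize` would wrap a `VNRecognizeTextRequest` with `minimumTextHeight` set to 0.002 as described above. Note that the second pass roughly doubles OCR time per frame.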

Results

| Metric | Before | After |
|---|---|---|
| WhatsApp text captured | ~80 chars (menu bar only) | 1567 chars (all messages, contacts, timestamps, links) |
| Frame resolution | 1280×831 | 2560×1662 |
| text_source for WhatsApp | accessibility (short-circuited) | ocr (full Vision + inversion) |

Environment

  • macOS 15 (Tahoe)
  • MacBook Pro M-series (Retina display)
  • WhatsApp desktop, Slack, Discord (dark theme)
