Description
Improve OCR text extraction quality (dark UIs, resolution, extraction logic)
Problem
The current OCR pipeline misses most text on dark-themed applications (WhatsApp, Slack, Discord, etc.). On a WhatsApp conversation with dozens of visible messages, agent-watch only captured ~80 characters (menu bar text: "File Edit Chat Call View Window Help").
Root causes identified
- `NativeTextExtractor` short-circuits OCR: The accessibility extractor runs first. If it returns ≥ `minimumAccessibilityChars` characters (even just menu bar items), OCR is skipped entirely. For WhatsApp, accessibility returns ~100 chars of sidebar/menu text, satisfying the minimum, so the Vision framework OCR never runs on the actual message content.
- Frame buffer resolution too low: `FrameBufferStore` downscales captures to `maxDimension = 1280`, which halves Retina resolution (2560 → 1280). Text becomes too small for reliable OCR, especially in dense UIs.
- Apple Vision framework struggles with dark themes: `VNRecognizeTextRequest` performs poorly on light-on-dark text. The Vision framework was designed primarily for document scanning (dark text on light backgrounds).
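To make the first root cause concrete, here is a minimal model of the old fallback rule. It is a sketch, not the project's actual code: the function name, the tuple return type, and the threshold value of 50 are assumptions for illustration. Only the rule itself (skip OCR once accessibility meets the minimum) comes from the description above.

```swift
// Hypothetical model of the old short-circuit rule; names and the
// threshold value are illustrative, not taken from agent-watch.
func extractWithShortCircuit(
    accessibilityText: String?,
    minimumAccessibilityChars: Int,
    runOCR: () -> String
) -> (text: String, source: String) {
    // Accessibility wins as soon as it meets the minimum, however little
    // of the visible content it actually covers.
    if let text = accessibilityText, text.count >= minimumAccessibilityChars {
        return (text, "accessibility")
    }
    // OCR is only reached when accessibility falls below the minimum.
    return (runOCR(), "ocr")
}

// ~100 characters of menu/sidebar text versus a much longer OCR result.
var ocrCalls = 0
let result = extractWithShortCircuit(
    accessibilityText: String(repeating: "a", count: 100),
    minimumAccessibilityChars: 50  // assumed value
) {
    ocrCalls += 1
    return String(repeating: "m", count: 1567)
}
print(result.source, result.text.count, ocrCalls)  // prints: accessibility 100 0
```

In this model the OCR closure is never invoked, so the longer result is lost even though it was available.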
Changes
1. NativeTextExtractor.swift: Always run both extractors, keep the best
Before: Accessibility runs first; if it returns enough chars, OCR is skipped.
After: Accessibility always runs, and OCR runs whenever it is enabled. The result with more text wins.
```swift
// Before
if let accessibilityText = accessibilityExtractor.extractText(),
   accessibilityText.count >= minimumAccessibilityChars {
    return ExtractedText(text: accessibilityText, source: .accessibility, metadata: metadata)
}
// OCR only runs as fallback

// After
let accessibilityText = accessibilityExtractor.extractText()
var ocrText: String? = nil
if ocrEnabled {
    ocrText = try ocrExtractor.extractText()
}
// Compare character counts and return whichever extracted more text
// (return values abbreviated; ties go to accessibility)
let accLen = accessibilityText?.count ?? 0
let ocrLen = ocrText?.count ?? 0
if ocrLen > accLen { return ocr } else { return accessibility }
```

2. FrameBufferStore.swift: Increase resolution to full Retina
```swift
// Before
maxDimension: Int = 1280
// After
maxDimension: Int = 2560
```

Disk impact: each frame grows from ~250 KB to ~400-800 KB. With the existing retention/pruning policy this remains well under control.
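As a sanity check on the resolution numbers, the effect of a longest-side cap can be sketched as below. This assumes `FrameBufferStore` scales proportionally so that the longer side fits `maxDimension` and never upscales. The type and function names are hypothetical, the 2560×1662 input is an assumed capture size, and integer truncation is a simplification.

```swift
// Hypothetical helper: names are illustrative, not from FrameBufferStore.
struct PixelSize: Equatable {
    var width: Int
    var height: Int
}

/// Scales `size` down proportionally so its longer side is at most
/// `maxDimension`. Sizes already within the cap are returned unchanged.
func fittedSize(_ size: PixelSize, maxDimension: Int) -> PixelSize {
    let longest = max(size.width, size.height)
    guard longest > 0, longest > maxDimension else { return size }
    // Integer arithmetic (multiply before dividing) truncates toward zero.
    return PixelSize(
        width: size.width * maxDimension / longest,
        height: size.height * maxDimension / longest
    )
}

let capture = PixelSize(width: 2560, height: 1662)  // assumed capture size
let oldFrame = fittedSize(capture, maxDimension: 1280)
let newFrame = fittedSize(capture, maxDimension: 2560)
print("\(oldFrame.width)x\(oldFrame.height)")  // prints: 1280x831
print("\(newFrame.width)x\(newFrame.height)")  // prints: 2560x1662
```

Under these assumptions the old cap keeps only a quarter of the pixels, so every glyph is rendered at half its original height before OCR sees it.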
3. OCRTextExtractor.swift: Color inversion for dark themes
OCR now runs twice: once on the original image and once on a color-inverted copy (produced with CoreImage's `CIColorInvert`). Whichever result contains more text is kept. `minimumTextHeight` was also lowered from 0.005 to 0.002 to catch smaller text.
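The two-pass idea can be sketched independently of Vision and CoreImage. In the sketch below, `GrayImage` and the `recognize` closure are hypothetical stand-ins (for a bitmap and for a single `VNRecognizeTextRequest` pass), and the `255 - v` inversion is only a conceptual analogue of `CIColorInvert`. The selection rule (keep the longer result) is the part taken from the change description.

```swift
// Hypothetical stand-in for a bitmap: 8-bit grayscale pixel values.
struct GrayImage {
    var pixels: [UInt8]

    // Conceptual analogue of a color-invert filter: v -> 255 - v.
    func inverted() -> GrayImage {
        GrayImage(pixels: pixels.map { 255 - $0 })
    }
}

/// Runs the recognizer on the original and on the inverted image and
/// keeps whichever result has more characters (ties keep the original).
func recognizeBest(
    in image: GrayImage,
    using recognize: (GrayImage) -> String
) -> String {
    let original = recognize(image)
    let inverted = recognize(image.inverted())
    return inverted.count > original.count ? inverted : original
}

// Toy recognizer that only "reads" images with a light background.
func toyRecognizer(_ image: GrayImage) -> String {
    let total = image.pixels.reduce(0) { $0 + Int($1) }
    let mean = image.pixels.isEmpty ? 0 : total / image.pixels.count
    return mean > 127 ? "hello world" : ""
}

let darkTheme = GrayImage(pixels: [10, 20, 240, 15])
print(recognizeBest(in: darkTheme, using: toyRecognizer))  // prints: hello world
```

Because the rule compares results rather than guessing the theme up front, light-themed windows keep their original-pass text and need no special casing.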
Results
| Metric | Before | After |
|---|---|---|
| WhatsApp text captured | ~80 chars (menu bar only) | 1567 chars (all messages, contacts, timestamps, links) |
| Frame resolution | 1280×831 | 2560×1662 |
| text_source for WhatsApp | accessibility (short-circuited) | ocr (full Vision + inversion) |
Environment
- macOS 15 (Tahoe)
- MacBook Pro M-series (Retina display)
- WhatsApp desktop, Slack, Discord (dark theme)
Related
- Issue #1, "fix: CGDisplayCreateImage returns desktop wallpaper instead of screen content": CGDisplayCreateImage returns wallpaper on macOS Sequoia+ (the prerequisite fix that made screen capture work at all)