UX Enhancements: ROI box, visual preview, multi-prompting, and custom mask naming #8

Open
psewdgb wants to merge 1 commit into AyedaOk:main from psewdgb:main

psewdgb commented Mar 11, 2026

Hi again! As discussed, here is the PR containing the workflow optimizations for both segmentation scripts.

These changes aim to give the user much more visual control and precision before saving the .pfm files to Darktable, preventing "blind" generations and saving a lot of time.

Here is the detailed list of the added features:

  1. point_segmentation.py (Box + Points workflow)

Added a two-step process: cv2.selectROI now runs at launch, so the user can draw a bounding box to isolate an object (e.g., separating skin from clothes) before placing positive/negative points.

Pressing Space validates the box (or skips it if no box was drawn). The coordinates are passed to SAM3's input_boxes parameter to constrain the segmentation to that region.
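For illustration, here is a minimal sketch of the box step (not the exact code in this PR). cv2.selectROI returns an (x, y, width, height) tuple, and an empty selection comes back with zero width or height. The helper names `roi_to_box` and `ask_for_box` are hypothetical, and the `[x1, y1, x2, y2]` corner format, as well as any extra nesting that `input_boxes` may expect, is an assumption to check against the SAM3 wrapper in use.

```python
def roi_to_box(roi):
    """Convert an (x, y, w, h) ROI, as returned by cv2.selectROI, to [x1, y1, x2, y2].

    Returns None for an empty ROI, meaning "no box, use points only".
    """
    x, y, w, h = (int(v) for v in roi)
    if w <= 0 or h <= 0:
        return None
    # Assumed corner format; adapt if input_boxes expects something else.
    return [x, y, x + w, y + h]


def ask_for_box(image, window="Draw a box, then press Space"):
    """Let the user draw an optional bounding box on a BGR image."""
    import cv2  # imported here so roi_to_box stays usable without OpenCV

    # Positional arguments: showCrosshair=False, fromCenter=False.
    roi = cv2.selectROI(window, image, False, False)
    cv2.destroyWindow(window)
    return roi_to_box(roi)
```

A `None` result can simply mean that `input_boxes` is left out of the model call, so the points-only workflow behaves as before.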

  2. text_segmentation.py (Preview & Multi-targets)

Visual Preview: Added a cv2 window displaying the generated mask in red over the image. The user can visually check the result and press Enter to approve, or Esc to cancel the process entirely (preventing bad .pfm generation in Darktable).
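A rough sketch of how such a preview could work, assuming a BGR uint8 image and a single-channel mask of the same height and width. The function names, the 50% blend factor, and the key-code set are illustrative choices, not necessarily what the script does.

```python
ENTER_KEYS = {13, 10}  # carriage return or line feed, depending on platform
ESC_KEY = 27


def decision_for_key(key):
    """Map a cv2.waitKey code to 'approve', 'cancel', or None (keep waiting)."""
    key &= 0xFF  # keep only the low byte, which holds the ASCII code
    if key in ENTER_KEYS:
        return "approve"
    if key == ESC_KEY:
        return "cancel"
    return None


def preview_mask(image, mask, alpha=0.5, window="Mask preview"):
    """Show the mask tinted red over a BGR image; return True if approved."""
    import cv2
    import numpy as np

    overlay = image.copy()
    red = np.array([0, 0, 255], dtype=np.float32)  # red in BGR order
    region = mask.astype(bool)
    # Blend red into the masked pixels only; the rest of the image is untouched.
    overlay[region] = ((1 - alpha) * overlay[region] + alpha * red).astype(np.uint8)

    cv2.imshow(window, overlay)
    try:
        while True:
            decision = decision_for_key(cv2.waitKey(0))
            if decision is not None:
                return decision == "approve"
    finally:
        cv2.destroyWindow(window)
```

The caller would write the .pfm file only when `preview_mask` returns True.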

Multi-Prompting (Comma separated): The script now splits the text input by commas (e.g., skin, armor, sword). It queries SAM3 for each word individually and merges the results into a single, unified mask. This is extremely useful for creating a global subject mask (to apply background blur/Orton effects in Darktable).
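The split-and-merge logic can be sketched as follows. This is a simplified illustration: `segment_one` is a hypothetical callable standing in for the per-prompt SAM3 query, and the merge assumes boolean masks of equal shape (for example NumPy boolean arrays), for which `|` is an element-wise union.

```python
from functools import reduce
from operator import or_


def parse_prompts(text):
    """Split comma-separated input into clean prompts, dropping empty entries."""
    return [part.strip() for part in text.split(",") if part.strip()]


def merge_masks(masks):
    """Union masks with `|`; returns None if there is nothing to merge."""
    masks = list(masks)
    if not masks:
        return None
    return reduce(or_, masks)


def segment_all(text, segment_one):
    """Run segment_one (hypothetical: prompt -> boolean mask) per prompt and merge."""
    return merge_masks(segment_one(prompt) for prompt in parse_prompts(text))
```

With this shape, a prompt such as `skin, armor, sword` produces three model queries and one combined mask, and a stray trailing comma is ignored rather than sent to the model as an empty prompt.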

  3. Global QoL / File Management (Both scripts)

Custom Naming via Tkinter: Upon validation, a native Tkinter pop-up asks the user for a custom tag (e.g., face, sword).

Smart Filenames: If a tag is provided, the filename is simplified and readable (e.g., image_face_193951_mask.pfm). If left blank, it defaults to the original full datetime format.

Technical note: Added a cv2.waitKey(1) call before the Tkinter pop-up as a workaround for a known Windows issue, so that OpenCV flushes its pending window events and the UI does not freeze.
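A minimal sketch of the naming flow, under a few assumptions: the six digits in image_face_193951_mask.pfm are read as a HHMMSS time, the untagged fallback pattern `%Y%m%d_%H%M%S` is only a placeholder for the original full datetime format, and both function names are hypothetical.

```python
from datetime import datetime


def build_mask_filename(stem, tag="", now=None):
    """Build the .pfm filename: short form with a tag, full datetime without."""
    now = now or datetime.now()
    tag = (tag or "").strip().replace(" ", "_")
    if tag:
        return f"{stem}_{tag}_{now:%H%M%S}_mask.pfm"
    # Assumed fallback pattern; the script's original format may differ.
    return f"{stem}_{now:%Y%m%d_%H%M%S}_mask.pfm"


def ask_for_tag(prompt="Custom tag (leave blank for default):"):
    """Ask for an optional tag in a Tkinter dialog; returns '' if cancelled."""
    import cv2
    import tkinter as tk
    from tkinter import simpledialog

    cv2.waitKey(1)  # let OpenCV handle pending window events before Tk takes over
    root = tk.Tk()
    root.withdraw()  # hide the empty root window, show only the dialog
    try:
        return simpledialog.askstring("Mask name", prompt, parent=root) or ""
    finally:
        root.destroy()
```

Keeping the filename logic separate from the dialog makes it easy to check the naming rules without opening any window.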

I developed and tested all of this on Windows. Since I only used tkinter (part of the Python standard library) and cv2, it should be relatively cross-platform, but let me know if it requires any adjustments for Mac/Linux!

Thanks again for the amazing base code!
