UX Enhancements: ROI box, visual preview, multi-prompting, and custom mask naming #8
Open
psewdgb wants to merge 1 commit into AyedaOk:main from
Hi again! As discussed, here is the PR containing the workflow optimizations for both segmentation scripts.
These changes aim to give the user much more visual control and precision before saving the .pfm files to Darktable, preventing "blind" generations and saving a lot of time.
Here is the detailed list of the added features:
ROI box (2-step process): Introduced cv2.selectROI at launch. The user can now draw a bounding box to isolate an object (e.g., separating skin from clothes) before placing positive/negative points.
Pressing Space validates the box (or skips it if the box is empty). The coordinates are passed to SAM3 through its input_boxes parameter so that segmentation is constrained to the selected region.
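For reviewers, the box step can be sketched roughly as follows. cv2.selectROI returns an (x, y, w, h) tuple, while box prompts are usually given as corner coordinates, so a conversion is needed. The helper names roi_to_xyxy and select_box are hypothetical and not taken from the PR, and the exact nesting that input_boxes expects is an assumption that depends on the SAM3 wrapper in use.

```python
from typing import List, Optional, Sequence


def roi_to_xyxy(roi: Sequence[float]) -> Optional[List[int]]:
    """Convert an (x, y, w, h) rectangle to [x0, y0, x1, y1].

    Returns None for an empty rectangle, which is what cv2.selectROI
    reports when the user confirms without drawing a box.
    """
    x, y, w, h = (int(v) for v in roi)
    if w <= 0 or h <= 0:
        return None
    return [x, y, x + w, y + h]


def select_box(image) -> Optional[List[int]]:
    """Let the user draw a box; Space or Enter confirms the selection."""
    import cv2  # imported lazily so roi_to_xyxy works without OpenCV

    window = "Draw ROI, then press Space"
    roi = cv2.selectROI(window, image)
    cv2.destroyWindow(window)
    return roi_to_xyxy(roi)


# Assumed usage (the nesting is a guess, check the SAM3 wrapper):
# box = select_box(image)
# input_boxes = [[box]] if box is not None else None
```

Returning None for an empty box lets the caller simply omit the box prompt, which matches the "skip" behaviour described above.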
Visual Preview: Added a cv2 window displaying the generated mask in red over the image. The user can visually check the result and press Enter to approve it, or Esc to cancel the process entirely, so that no bad .pfm file is written for Darktable.
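A minimal sketch of how such a preview could work, assuming a BGR image and a binary mask of the same height and width. The function names and the 0.5 blend factor are illustrative choices, not the PR's actual code.

```python
import numpy as np


def overlay_mask_red(image_bgr: np.ndarray, mask: np.ndarray,
                     alpha: float = 0.5) -> np.ndarray:
    """Blend pure red into the masked pixels of a BGR uint8 image."""
    out = image_bgr.astype(np.float32)
    red = np.array([0.0, 0.0, 255.0], dtype=np.float32)  # BGR channel order
    selected = mask.astype(bool)
    out[selected] = (1.0 - alpha) * out[selected] + alpha * red
    return out.round().astype(np.uint8)


def preview_and_confirm(image_bgr: np.ndarray, mask: np.ndarray) -> bool:
    """Show the overlay; Enter approves, Esc cancels."""
    import cv2  # imported lazily so the blend helper works without OpenCV

    window = "Mask preview"
    cv2.imshow(window, overlay_mask_red(image_bgr, mask))
    approved = False
    while True:
        key = cv2.waitKey(0) & 0xFF
        if key == 13:  # Enter
            approved = True
            break
        if key == 27:  # Esc
            break
    cv2.destroyWindow(window)
    return approved
```

The caller would only write the .pfm file when preview_and_confirm returns True.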
Multi-Prompting (Comma separated): The script now splits the text input by commas (e.g., skin, armor, sword). It queries SAM3 for each word individually and merges the results into a single, unified mask. This is extremely useful for creating a global subject mask (to apply background blur/Orton effects in Darktable).
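The split-and-merge logic can be illustrated like this. The segment callable is a hypothetical stand-in for whatever function runs SAM3 on a single text prompt and returns a binary mask (or None when nothing is found); merging with a logical OR is an assumption about how the masks are unified.

```python
from typing import Callable, List, Optional

import numpy as np


def split_prompts(text: str) -> List[str]:
    """Split 'skin, armor, sword' into clean, non-empty prompts."""
    return [part.strip() for part in text.split(",") if part.strip()]


def merge_prompt_masks(
    text: str,
    segment: Callable[[str], Optional[np.ndarray]],
) -> Optional[np.ndarray]:
    """Run `segment` once per prompt and OR the resulting masks together.

    `segment` is a placeholder for the real SAM3 call. Prompts that
    yield no mask are skipped; None is returned if every prompt fails.
    """
    merged: Optional[np.ndarray] = None
    for prompt in split_prompts(text):
        mask = segment(prompt)
        if mask is None:
            continue
        mask = np.asarray(mask).astype(bool)
        merged = mask if merged is None else np.logical_or(merged, mask)
    return merged
```

Skipping prompts that return nothing means one unmatched word does not discard the masks found for the others.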
Custom Naming via Tkinter: Upon validation, a native Tkinter pop-up asks the user for a custom tag (e.g., face, sword).
Smart Filenames: If a tag is provided, the filename is simplified and readable (e.g., image_face_193951_mask.pfm). If left blank, it defaults to the original full datetime format.
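As an illustration of the naming rule, here is a small sketch. It assumes that the 193951 in the example is an HHMMSS timestamp and that the fallback looks like YYYYMMDD_HHMMSS; the real default format is not shown in this description, and build_mask_filename is a hypothetical name.

```python
import re
from datetime import datetime
from typing import Optional


def build_mask_filename(stem: str, tag: str = "",
                        now: Optional[datetime] = None) -> str:
    """Build '<stem>_<tag>_<HHMMSS>_mask.pfm', or a datetime-based name.

    The tag is reduced to letters, digits, hyphens and underscores so
    that it is safe to use in a filename.
    """
    now = now or datetime.now()
    safe_tag = re.sub(r"[^A-Za-z0-9-]+", "_", tag.strip()).strip("_")
    if safe_tag:
        return f"{stem}_{safe_tag}_{now:%H%M%S}_mask.pfm"
    # Assumed fallback format; the original script's format may differ.
    return f"{stem}_{now:%Y%m%d_%H%M%S}_mask.pfm"
```

Sanitizing the tag is an addition for safety here; whether the PR does the same is not stated.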
Technical note: Added a cv2.waitKey(1) call before the Tkinter pop-up as a workaround for a known UI freeze on Windows. The call gives OpenCV a chance to process its pending window events before Tkinter takes over.
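The ordering described in this note might look like the sketch below. Closing the OpenCV windows first with cv2.destroyAllWindows() is an assumption, as are the function names and dialog texts; only the cv2.waitKey(1) call before the pop-up comes from the description above.

```python
from typing import Optional


def normalize_tag(raw: Optional[str]) -> str:
    """Treat a cancelled dialog (None) or blank input as 'no tag'."""
    return (raw or "").strip()


def ask_mask_tag() -> str:
    """Ask for an optional tag in a Tkinter dialog after OpenCV is idle."""
    import cv2
    import tkinter as tk
    from tkinter import simpledialog

    cv2.destroyAllWindows()  # assumption: preview windows are closed first
    cv2.waitKey(1)           # let OpenCV handle pending window events

    root = tk.Tk()
    root.withdraw()          # hide the empty root window behind the dialog
    try:
        raw = simpledialog.askstring(
            "Mask name", "Optional tag (e.g. face):", parent=root
        )
    finally:
        root.destroy()
    return normalize_tag(raw)
```

An empty return value would then select the default datetime-based filename.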
I developed and tested all of this on Windows. Since it only relies on cv2 and tkinter (part of Python's standard library), it should be relatively cross-platform, but let me know if it requires any adjustments for Mac/Linux!
Thanks again for the amazing base code!