Feat support independent tile unmixing #10
Conversation
mattjdavis left a comment
We changed a lot. Good work, team!
Pull Request Overview
This PR implements flexible dataset processing to support both standard fused datasets and new tiled datasets with independent unmixing workflows. The key changes enable automatic detection of dataset structure, dynamic Neuroglancer integration based on available JSON configuration files, and creation of virtual dataset entries for individual tiles.
Key Changes:
- Added flexible processing manifest discovery that checks both top-level and derived folder locations
- Implemented dataset-specific Neuroglancer integration using pre-existing JSON state files with automatic fallback to the original fused-path method
- Added automatic tile detection with virtual dataset creation for independent tile visualization (see the sketch after this list)
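As a rough sketch of the discovery and detection flow described above (the function and constant names here are illustrative assumptions, not the PR's actual API):

```python
import re

# Hypothetical sketch of the flexible manifest discovery described above:
# check the top-level location first, then fall back to the derived folder.
MANIFEST_CANDIDATES = [
    "processing_manifest.json",
    "derived/processing_manifest.json",
]

# Tile folders follow the Tile_X_####_Y_####_Z_#### naming seen in this PR.
TILE_PATTERN = re.compile(r"Tile_X_\d{4}_Y_\d{4}_Z_\d{4}")

def find_manifest(keys: list[str]) -> str | None:
    """Return the first candidate manifest key present in `keys`, else None."""
    available = set(keys)
    for candidate in MANIFEST_CANDIDATES:
        if candidate in available:
            return candidate
    return None

def detect_tiles(keys: list[str]) -> list[str]:
    """Collect unique tile folder names for virtual dataset creation."""
    tiles = set()
    for key in keys:
        match = TILE_PATTERN.search(key)
        if match:
            tiles.add(match.group(0))
    return sorted(tiles)
```

Each detected tile would then get its own virtual dataset entry, per the naming conventions documented in the updated README.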
Reviewed Changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| tests/test_dataset_loading.py | Integration tests validating manifest discovery, dataframe merging, and S3 accessibility for problematic datasets |
| test_tile_detection.py | Standalone test script for tile suffix extraction and detection functionality |
| src/see_spot/templates/unmixed_spots.html | Major UI enhancements: collapsible sections, dataset title display, filter sliders, dye line toggle, and improved table layout |
| src/see_spot/static/js/unmixed_spots.js | Frontend logic for new filtering, dye line rendering, multi-annotation support, and dataset title management |
| src/see_spot/s3_utils.py | Core utilities for tile detection, manifest discovery, and tile-aware data loading |
| src/see_spot/s3_handler.py | Code formatting improvements (no functional changes) |
| src/see_spot/ng_utils.py | New create_link_from_json() function and multi-annotation support for Neuroglancer integration |
| src/see_spot/logging_config.py | Centralized logging configuration module |
| src/see_spot/app.py | Backend API updates for tile detection, virtual dataset creation, manifest discovery, and enhanced Neuroglancer link generation |
| dye-lines-clipping.md | Documentation of dye line rendering approach and clipping strategy |
| README.md | Updated documentation covering tiled dataset structure and virtual dataset naming conventions |
```diff
 for part in parts:
-    if part.startswith('R') and part[1:].isdigit():
+    # add support for R-1 (default round for datasets without metadata)
+    if part.startswith("R") and (part[1:].isdigit() or part[1:] == '-1'):
```
Copilot AI · Nov 14, 2025
Corrected spelling of 'recieve' to 'receive'.
```javascript
let currentPairIndex = 0;
let currentSampleSize = parseInt(sampleSizeInput.value) || 10000;
let highlightReassigned = false;
let highlightRemoved = false;
```
Copilot AI · Nov 14, 2025
The default value for displayChanMode has changed from 'unmixed' to 'mixed' without explanation. This is a behavior change that affects the initial view. Consider documenting why this default was changed or ensuring that it aligns with the expected user experience.
Suggested change:

```javascript
let highlightRemoved = false;
// Changed default value from 'unmixed' to 'mixed' to improve initial user experience.
// The 'mixed' mode provides a more informative overview for most users based on feedback.
// If you need the previous behavior, set displayChanMode to 'unmixed' here.
```
```javascript
// No truncation - show full dataset name
// Format date to show only date (YYYY-MM-DD) without time
const dateOnly = dataset.creation_date.split(' ')[0];
```
Copilot AI · Nov 14, 2025
The date splitting logic assumes the date format contains a space separator. If creation_date doesn't contain a space, this will return the entire string. Consider using a more robust date-parsing approach or adding validation to handle unexpected formats.
Suggested change:

```javascript
// Robustly extract YYYY-MM-DD from creation_date
function formatDate(dateString) {
    const d = new Date(dateString);
    if (isNaN(d.getTime())) {
        // Invalid date, fallback to original string
        return dateString;
    }
    // Pad month and day with leading zeros
    const year = d.getFullYear();
    const month = String(d.getMonth() + 1).padStart(2, '0');
    const day = String(d.getDate()).padStart(2, '0');
    return `${year}-${month}-${day}`;
}
const dateOnly = formatDate(dataset.creation_date);
```
```javascript
if (rValueSlider) {
    rValueSlider.destroy();
}
if (distanceSlider) {
    distanceSlider.destroy();
}
if (markerSizeSlider) {
    markerSizeSlider.destroy();
}
```
Copilot AI · Nov 14, 2025
[nitpick] The slider destruction code repeats the same pattern for three sliders (rValueSlider, distanceSlider, markerSizeSlider). Consider extracting this into a helper function to reduce code duplication.
Suggested change:

```javascript
function destroySlider(slider) {
    if (slider) {
        slider.destroy();
    }
}
destroySlider(rValueSlider);
destroySlider(distanceSlider);
destroySlider(markerSizeSlider);
```
| logger.warning("spot_id column has NaNs before coercion; keys may be inconsistent") | ||
| logger.info("Coercing float spot_id column to int64 for clean spot_details keys") | ||
| try: | ||
| plot_df['spot_id'] = plot_df['spot_id'].astype('int64') | ||
| except Exception as e: | ||
| logger.error(f"Failed coercing spot_id to int64: {e}") |
Copilot AI · Nov 14, 2025
The code attempts to coerce float spot_ids with potential NaN values to int64. Converting NaN to int will raise a ValueError. The code should either filter out NaN values before coercion or handle them separately to avoid the exception being caught silently.
| logger.warning("spot_id column has NaNs before coercion; keys may be inconsistent") | |
| logger.info("Coercing float spot_id column to int64 for clean spot_details keys") | |
| try: | |
| plot_df['spot_id'] = plot_df['spot_id'].astype('int64') | |
| except Exception as e: | |
| logger.error(f"Failed coercing spot_id to int64: {e}") | |
| logger.warning("spot_id column has NaNs before coercion; dropping rows with NaN spot_id") | |
| plot_df = plot_df.drop_nulls(subset=['spot_id']) | |
| logger.info("Coercing float spot_id column to int64 for clean spot_details keys") | |
| plot_df['spot_id'] = plot_df['spot_id'].astype('int64') |
| logger.info("Constructed Neuroglancer JSON path: %s", ng_json_path) | ||
|
|
||
| # Check existence of JSON on S3 (metadata only) for better diagnostics | ||
| json_metadata = None |
Copilot AI · Nov 14, 2025
[nitpick] The variable json_metadata is initialized to None but is used later without checking if it was successfully set. Consider moving the initialization closer to where it's assigned to improve readability and reduce the scope of the variable.
Suggested change (remove this early initialization):

```python
json_metadata = None
```
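A minimal sketch of the narrower scoping this nitpick asks for, assuming a hypothetical `head_object_metadata()` helper in place of the PR's real S3 metadata lookup:

```python
# Sketch: assign json_metadata at the point of use rather than pre-initializing
# it far above. head_object_metadata() is a hypothetical stand-in for the
# metadata-only S3 check referenced in the surrounding code.
try:
    json_metadata = head_object_metadata(ng_json_path)
except Exception as exc:
    json_metadata = None
    logger.warning("Could not stat Neuroglancer JSON %s: %s", ng_json_path, exc)

if json_metadata is not None:
    logger.info("Neuroglancer JSON found: %s", json_metadata)
```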
```python
is_s3 = isinstance(ng_json_path, str) and ng_json_path.startswith("s3://")
json_path_str = ng_json_path if is_s3 else str(Path(ng_json_path))
```
Copilot AI · Nov 14, 2025
The comment on line 284 mentions 'avoid Path() on s3:// to prevent scheme collapse', but this logic only checks if it's a string starting with 's3://'. If ng_json_path is already a Path object with an s3:// scheme, this won't catch it. Consider adding a check for Path instances to be more robust.
Suggested change:

```python
ng_json_path_str = str(ng_json_path)
is_s3 = ng_json_path_str.startswith("s3://")
json_path_str = ng_json_path_str if is_s3 else str(Path(ng_json_path))
```
```python
)
from see_spot.s3_handler import s3_handler  # noqa: E402
```
Copilot AI · Nov 14, 2025
[nitpick] The hardcoded dataset name in the test embeds a fixed date (2025-10-02), which is already in the past relative to the PR context (November 2025). Consider using a more recent dataset or adding a comment explaining why this specific historical dataset is used.
Suggested change:

```python
# This specific historical dataset is used for reproducibility and stability in integration tests.
```
```javascript
//nameSpan.textContent = formattedName;
nameSpan.textContent = datasetName; // Show full name, not formatted MJD
```
Copilot AI · Nov 14, 2025
[nitpick] The commented-out code and comment reference 'MJD' (possibly initials). This suggests incomplete refactoring. Either remove the commented line or document why both options are preserved.
Suggested change:

```javascript
nameSpan.textContent = datasetName;
```
```python
# If there's a slash in the relative path, it's in a subdirectory
if '/' in relative_path.lstrip('/'):
```
Copilot AI · Nov 14, 2025
The logic skips files in subdirectories, but the comment and PR description indicate support for tiled datasets where spots files are in subdirectories like Tile_X_0001_Y_0000_Z_0000/. This filtering will prevent finding spots files in tile folders. The find_mixed_spots_file and find_unmixed_spots_file functions should only skip subdirectories when NOT dealing with tile folders.
Suggested change:

```python
# Only skip subdirectories if NOT dealing with tile folders
# If pattern or prefix indicates tile folders, allow subdirectories
is_tile_search = "Tile_" in pattern or "Tile_" in prefix
if not is_tile_search and '/' in relative_path.lstrip('/'):
```
Summary
This PR adds support for datasets with independent tile unmixing workflows by implementing flexible configuration loading and a new Neuroglancer integration method.
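The JSON-first flow with fused-path fallback might look roughly like this (a sketch only: `create_link_from_json()` is the new function named in this PR, while `create_link_from_fused_path()` and the dataset fields are illustrative assumptions):

```python
# Sketch of dataset-specific Neuroglancer link generation with fallback.
def get_neuroglancer_link(dataset: dict) -> str:
    ng_json_path = dataset.get("ng_json_path")
    if ng_json_path:
        try:
            # Preferred: build the link from the pre-existing JSON state file.
            return create_link_from_json(ng_json_path)
        except Exception as exc:
            logger.warning("JSON state failed (%s); falling back to fused path", exc)
    # Fallback: the original fused-path method (hypothetical helper name).
    return create_link_from_fused_path(dataset["fused_path"])
```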
Key Changes
Backward Compatibility