Feat support independent tile unmixing #10

cbrrrry · 2025-11-06T22:40:34Z

Summary
This PR adds support for datasets with independent tile unmixing workflows by implementing flexible configuration loading and a new Neuroglancer integration method.

Key Changes

Flexible Processing Manifest Discovery

Added find_processing_manifest() helper function that checks for processing_manifest.json in both top-level ({dataset}/) and derived folder ({dataset}/derived/) locations
Updated all manifest loading logic in /api/real_spots_data, /api/datasets/download, and caching to use the new discovery method
Provides clearer error messages that list every path checked when the manifest is not found
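The discovery logic described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `exists` callable stands in for whatever S3 or filesystem check the project uses, the signature is hypothetical, and the order in which the two locations are tried is an assumption.

```python
from typing import Callable, List

MANIFEST_NAME = "processing_manifest.json"


def find_processing_manifest(dataset: str, exists: Callable[[str], bool]) -> str:
    """Return the first manifest location that exists.

    `exists` is a hypothetical stand-in for an S3 or filesystem lookup.
    The derived-first search order is an assumption made for this sketch.
    """
    candidates: List[str] = [
        f"{dataset}/derived/{MANIFEST_NAME}",
        f"{dataset}/{MANIFEST_NAME}",
    ]
    for key in candidates:
        if exists(key):
            return key
    # Report every path that was tried so the failure is easy to diagnose.
    raise FileNotFoundError(
        "processing manifest not found; checked: " + ", ".join(candidates)
    )
```

Passing the existence check in as a callable keeps the helper testable without touching S3.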

Dataset-Specific Neuroglancer Integration

Added create_link_from_json() function in ng_utils.py that generates Neuroglancer links from pre-existing JSON state files
Supports both local file paths and S3 URIs for JSON state files
Updates only the position and annotation layers, preserving all other viewer settings
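As a rough illustration of the "update position and annotations, keep everything else" idea, the sketch below edits a copy of a viewer state dictionary. It is not the body of `create_link_from_json()`: the helper names, the `spots` layer name, and the `base_url` parameter are hypothetical, and the state layout (a top-level `position` plus a `layers` list with `annotation` entries) is assumed to follow the usual Neuroglancer JSON state.

```python
import copy
import json
from urllib.parse import quote


def update_state(state: dict, position: list, layer_name: str = "spots") -> dict:
    """Return a copy of `state` with a new position and a single point annotation."""
    new_state = copy.deepcopy(state)  # leave the caller's state untouched
    new_state["position"] = list(position)
    point = {"type": "point", "id": "selected", "point": list(position)}
    for layer in new_state.get("layers", []):
        if layer.get("type") == "annotation" and layer.get("name") == layer_name:
            layer["annotations"] = [point]
            break
    else:
        # No matching annotation layer yet: append one (hypothetical default).
        new_state.setdefault("layers", []).append(
            {"type": "annotation", "name": layer_name, "annotations": [point]}
        )
    return new_state


def state_to_link(state: dict, base_url: str) -> str:
    """Encode the state in the URL fragment; `base_url` is supplied by the caller."""
    return f"{base_url}/#!{quote(json.dumps(state))}"
```

Working on a deep copy means keys the function does not know about, such as layout or shader settings, pass through unchanged.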

Automatic Method Selection for Neuroglancer Links

Modified /api/create-neuroglancer-link endpoint to automatically detect dataset type
When "merged" appears in pkl filenames (e.g., mixed_spots_R2_merged.pkl), uses the new JSON-based method with phase_correlation_stitching_neuroglancer.json
Falls back to traditional fused-path method for standard datasets
Caches unmixed spots filename for efficient detection
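The filename-based selection described in this section amounts to a substring check. A minimal sketch, assuming hypothetical method labels (`json_state` and `fused_path`) that do not appear in the PR:

```python
def choose_link_method(pkl_filename: str) -> str:
    """Pick the Neuroglancer link method from the spots pkl filename.

    The returned labels are illustrative names, not identifiers from the PR.
    """
    if "merged" in pkl_filename:
        return "json_state"  # use the pre-existing JSON state file
    return "fused_path"  # traditional fused-path method


print(choose_link_method("mixed_spots_R2_merged.pkl"))
```

A plain substring test is simple but would also match any unrelated filename that happens to contain "merged", which may be worth keeping in mind when naming files.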

Backward Compatibility

All changes are fully backward compatible with existing datasets
Standard HCR datasets continue to work with the original workflow
New features activate automatically based on dataset structure

mattjdavis

We changed a lot. Good work, team.

Copilot

Pull Request Overview

This PR implements flexible dataset processing to support both standard fused datasets and new tiled datasets with independent unmixing workflows. The key changes enable automatic detection of dataset structure, dynamic Neuroglancer integration based on available JSON configuration files, and creation of virtual dataset entries for individual tiles.

Key Changes:

Added flexible processing manifest discovery that checks both top-level and derived folder locations
Implemented dataset-specific Neuroglancer integration using pre-existing JSON state files with automatic fallback to the original fused-path method
Added automatic tile detection with virtual dataset creation for independent tile visualization

Reviewed Changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 12 comments.

File	Description
tests/test_dataset_loading.py	Integration tests validating manifest discovery, dataframe merging, and S3 accessibility for problematic datasets
test_tile_detection.py	Standalone test script for tile suffix extraction and detection functionality
src/see_spot/templates/unmixed_spots.html	Major UI enhancements: collapsible sections, dataset title display, filter sliders, dye line toggle, and improved table layout
src/see_spot/static/js/unmixed_spots.js	Frontend logic for new filtering, dye line rendering, multi-annotation support, and dataset title management
src/see_spot/s3_utils.py	Core utilities for tile detection, manifest discovery, and tile-aware data loading
src/see_spot/s3_handler.py	Code formatting improvements (no functional changes)
src/see_spot/ng_utils.py	New `create_link_from_json()` function and multi-annotation support for Neuroglancer integration
src/see_spot/logging_config.py	Centralized logging configuration module
src/see_spot/app.py	Backend API updates for tile detection, virtual dataset creation, manifest discovery, and enhanced Neuroglancer link generation
dye-lines-clipping.md	Documentation of dye line rendering approach and clipping strategy
README.md	Updated documentation covering tiled dataset structure and virtual dataset naming conventions

Copilot · 2025-11-14T21:29:57Z

src/see_spot/s3_utils.py

    for part in parts:
-        if part.startswith('R') and part[1:].isdigit():
+        # add support for R-1 (default round for datasets without metadata)
+        if part.startswith("R") and (part[1:].isdigit() or part[1:] == '-1'):


Corrected spelling of 'recieve' to 'receive'.
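The round-token check in the diff above can be exercised on its own. The sketch below wraps the same condition in a hypothetical helper, `extract_round`, which is not a function from the PR; how the filename is split into `parts` is left to the caller.

```python
from typing import Optional, Sequence


def extract_round(parts: Sequence[str]) -> Optional[int]:
    """Return the round number from tokens such as 'R2' or 'R-1', else None."""
    for part in parts:
        # Same condition as the diff: digits after 'R', or the special '-1' round.
        if part.startswith("R") and (part[1:].isdigit() or part[1:] == "-1"):
            return int(part[1:])
    return None


print(extract_round(["mixed", "spots", "R2", "merged"]))
print(extract_round(["unmixed", "spots", "R-1"]))
```

Tokens that merely begin with `R`, such as `Results`, fall through because the remainder is neither all digits nor exactly `-1`.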

Copilot · 2025-11-14T21:29:57Z

src/see_spot/static/js/unmixed_spots.js

    let currentPairIndex = 0;
    let currentSampleSize = parseInt(sampleSizeInput.value) || 10000;
    let highlightReassigned = false;
    let highlightRemoved = false;


The default value for displayChanMode has changed from 'unmixed' to 'mixed' without explanation. This is a behavior change that affects the initial view. Consider documenting why this default was changed or ensure it aligns with expected user experience.

Suggested change

-    let highlightRemoved = false;
+    let highlightRemoved = false;
+    // Changed default value from 'unmixed' to 'mixed' to improve initial user experience.
+    // The 'mixed' mode provides a more informative overview for most users based on feedback.
+    // If you need the previous behavior, set displayChanMode to 'unmixed' here.

Copilot · 2025-11-14T21:29:57Z

src/see_spot/static/js/unmixed_spots.js

-
+            // No truncation - show full dataset name
+            // Format date to show only date (YYYY-MM-DD) without time
+            const dateOnly = dataset.creation_date.split(' ')[0];


The date splitting logic assumes the date format contains a space separator. If creation_date doesn't contain a space, this will return the entire string. Consider using a more robust date parsing approach or add validation to handle unexpected formats.

Suggested change

-            const dateOnly = dataset.creation_date.split(' ')[0];
+            // Robustly extract YYYY-MM-DD from creation_date
+            function formatDate(dateString) {
+                const d = new Date(dateString);
+                if (isNaN(d.getTime())) {
+                    // Invalid date, fallback to original string
+                    return dateString;
+                }
+                // Pad month and day with leading zeros
+                const year = d.getFullYear();
+                const month = String(d.getMonth() + 1).padStart(2, '0');
+                const day = String(d.getDate()).padStart(2, '0');
+                return `${year}-${month}-${day}`;
+            }
+            const dateOnly = formatDate(dataset.creation_date);

Copilot · 2025-11-14T21:29:58Z

src/see_spot/static/js/unmixed_spots.js

+        if (rValueSlider) {
+            rValueSlider.destroy();
+        }
+        if (distanceSlider) {
+            distanceSlider.destroy();
+        }
+        if (markerSizeSlider) {
+            markerSizeSlider.destroy();
+        }


[nitpick] The slider destruction code repeats the same pattern for three sliders (rValueSlider, distanceSlider, markerSizeSlider). Consider extracting this into a helper function to reduce code duplication.

Suggested change

-        if (rValueSlider) {
-            rValueSlider.destroy();
-        }
-        if (distanceSlider) {
-            distanceSlider.destroy();
-        }
-        if (markerSizeSlider) {
-            markerSizeSlider.destroy();
-        }
+        function destroySlider(slider) {
+            if (slider) {
+                slider.destroy();
+            }
+        }
+        destroySlider(rValueSlider);
+        destroySlider(distanceSlider);
+        destroySlider(markerSizeSlider);

Copilot · 2025-11-14T21:29:58Z

src/see_spot/app.py

+            logger.warning("spot_id column has NaNs before coercion; keys may be inconsistent")
+        logger.info("Coercing float spot_id column to int64 for clean spot_details keys")
+        try:
+            plot_df['spot_id'] = plot_df['spot_id'].astype('int64')
+        except Exception as e:
+            logger.error(f"Failed coercing spot_id to int64: {e}")


The code attempts to coerce float spot_ids with potential NaN values to int64. Converting NaN to int will raise a ValueError. The code should either filter out NaN values before coercion or handle them separately to avoid the exception being caught silently.

Suggested change

-            logger.warning("spot_id column has NaNs before coercion; keys may be inconsistent")
-        logger.info("Coercing float spot_id column to int64 for clean spot_details keys")
-        try:
-            plot_df['spot_id'] = plot_df['spot_id'].astype('int64')
-        except Exception as e:
-            logger.error(f"Failed coercing spot_id to int64: {e}")
+            logger.warning("spot_id column has NaNs before coercion; dropping rows with NaN spot_id")
+            plot_df = plot_df.dropna(subset=['spot_id'])
+        logger.info("Coercing float spot_id column to int64 for clean spot_details keys")
+        plot_df['spot_id'] = plot_df['spot_id'].astype('int64')
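The failure mode behind this comment can be reproduced without a DataFrame: converting a NaN float to an integer raises `ValueError` in plain Python as well. The sketch below uses a hypothetical helper, `coerce_ids`, to show the "drop NaNs first, then cast" order on a list of floats; for a pandas DataFrame the analogous step would be `dropna(subset=[...])` before `astype('int64')`.

```python
import math
from typing import List, Tuple


def coerce_ids(values: List[float]) -> Tuple[List[int], int]:
    """Drop NaN entries, cast the rest to int, and report how many were dropped."""
    kept = [v for v in values if not math.isnan(v)]
    return [int(v) for v in kept], len(values) - len(kept)


try:
    int(float("nan"))
except ValueError as err:
    print(f"direct cast fails: {err}")

ids, dropped = coerce_ids([1.0, float("nan"), 3.0])
print(ids, dropped)
```

Returning the dropped count makes it easy to log how many rows were discarded rather than losing them silently.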

Copilot · 2025-11-14T21:29:59Z

src/see_spot/app.py

+    logger.info("Constructed Neuroglancer JSON path: %s", ng_json_path)
+
+    # Check existence of JSON on S3 (metadata only) for better diagnostics
+    json_metadata = None


[nitpick] The variable json_metadata is initialized to None but is used later without checking if it was successfully set. Consider moving the initialization closer to where it's assigned to improve readability and reduce the scope of the variable.

Suggested change

-    json_metadata = None

Copilot · 2025-11-14T21:29:59Z

src/see_spot/ng_utils.py

+    is_s3 = isinstance(ng_json_path, str) and ng_json_path.startswith("s3://")
+    json_path_str = ng_json_path if is_s3 else str(Path(ng_json_path))


The comment on line 284 mentions 'avoid Path() on s3:// to prevent scheme collapse', but this logic only checks if it's a string starting with 's3://'. If ng_json_path is already a Path object with an s3:// scheme, this won't catch it. Consider adding a check for Path instances to be more robust.

Suggested change

-    is_s3 = isinstance(ng_json_path, str) and ng_json_path.startswith("s3://")
-    json_path_str = ng_json_path if is_s3 else str(Path(ng_json_path))
+    ng_json_path_str = str(ng_json_path)
+    is_s3 = ng_json_path_str.startswith("s3://")
+    json_path_str = ng_json_path_str if is_s3 else str(Path(ng_json_path))
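The "scheme collapse" mentioned in this comment is easy to demonstrate: `pathlib` normalizes the double slash in `s3://`, so a URI that has already passed through a path object no longer starts with `s3://`. The sketch below shows that behavior and one possible way to tolerate it. The helper `to_json_path_str` is hypothetical, and `PurePosixPath` is used instead of `Path` only so the output does not depend on the operating system.

```python
from pathlib import PurePosixPath
from typing import Union


def to_json_path_str(path: Union[str, PurePosixPath]) -> str:
    """Return a string path, keeping (or restoring) the s3:// scheme."""
    text = str(path)
    if text.startswith("s3://"):
        return text
    if text.startswith("s3:/"):
        # The URI went through a path object and lost one slash; restore it.
        return "s3://" + text[len("s3:/"):]
    # Local path: normalizing through pathlib is safe here.
    return str(PurePosixPath(text))


collapsed = str(PurePosixPath("s3://bucket/state.json"))
print(collapsed)
print(to_json_path_str(PurePosixPath("s3://bucket/state.json")))
```

Under this behavior, calling `str()` on a path object and then testing for the `s3://` prefix would not recognize a collapsed URI, which is why the sketch also checks for the `s3:/` form.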

Copilot · 2025-11-14T21:29:59Z

tests/test_dataset_loading.py

+)
+from see_spot.s3_handler import s3_handler  # noqa: E402
+
+


[nitpick] The hardcoded dataset name in the test includes the date 2025-10-02. According to the PR context, the current date is November 2025, so this test references a dataset from the past. Consider using a more recent dataset or adding a comment explaining why this specific historical dataset is used.

Suggested change

+# This specific historical dataset is used for reproducibility and stability in integration tests.

Copilot · 2025-11-14T21:30:00Z

src/see_spot/static/js/unmixed_spots.js

+        //nameSpan.textContent = formattedName;
+        nameSpan.textContent = datasetName; // Show full name, not formatted MJD


[nitpick] The commented-out code and comment reference 'MJD' (possibly initials). This suggests incomplete refactoring. Either remove the commented line or document why both options are preserved.

Suggested change

-        //nameSpan.textContent = formattedName;
-        nameSpan.textContent = datasetName; // Show full name, not formatted MJD
+        nameSpan.textContent = datasetName;

Copilot · 2025-11-14T21:30:00Z

src/see_spot/s3_utils.py

+            # If there's a slash in the relative path, it's in a subdirectory
+            if '/' in relative_path.lstrip('/'):


The logic skips files in subdirectories, but the comment and PR description indicate support for tiled datasets where spots files are in subdirectories like Tile_X_0001_Y_0000_Z_0000/. This filtering will prevent finding spots files in tile folders. The find_mixed_spots_file and find_unmixed_spots_file functions should only skip subdirectories when NOT dealing with tile folders.

Suggested change

-            # If there's a slash in the relative path, it's in a subdirectory
-            if '/' in relative_path.lstrip('/'):
+            # Only skip subdirectories if NOT dealing with tile folders
+            # If pattern or prefix indicates tile folders, allow subdirectories
+            is_tile_search = "Tile_" in pattern or "Tile_" in prefix
+            if not is_tile_search and '/' in relative_path.lstrip('/'):
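The suggested condition can be checked in isolation. The sketch below wraps it in a hypothetical predicate, `should_skip`, which is not a function from the PR; the example paths and the `mixed_spots` pattern are illustrative values only.

```python
def should_skip(relative_path: str, pattern: str, prefix: str) -> bool:
    """Return True when a file in a subdirectory should be ignored."""
    in_subdirectory = "/" in relative_path.lstrip("/")
    # Tile searches are allowed to descend into Tile_* folders.
    is_tile_search = "Tile_" in pattern or "Tile_" in prefix
    return in_subdirectory and not is_tile_search


print(should_skip("/mixed_spots_R2.pkl", "mixed_spots", "dataset/"))
print(should_skip("sub/mixed_spots_R2.pkl", "mixed_spots", "dataset/"))
print(should_skip(
    "sub/mixed_spots_R2.pkl", "mixed_spots", "dataset/Tile_X_0001_Y_0000_Z_0000/"
))
```

Note that once a tile search is detected, this condition admits every subdirectory under the prefix, not only folders whose names start with `Tile_`.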

Carson Berry added 2 commits November 6, 2025 10:09

feat: load processing_manifest.json from main folder if necessary

a38cb2d

added pointer to new neuroglancer file (stitched)

1f52f60

cbrrrry requested a review from mattjdavis November 6, 2025 22:40

Carson Berry and others added 17 commits November 7, 2025 11:09

refactor: linting etc, + fixing zag's mess

903c6bd

refactor: cleanup

8c6aab5

feat: add support for -1 round in base_pattern

ab94721

bugfix: change mixed pattern

390d67f

feat: only search image_spot_spectral_unmixing top level

adfa208

fix: logic in create_neuroglancer_link()

44ef2d6

add some logging to ng function in app.py

30f8a59

try except for ng mode creation instead of using filename

272182e

feat: fixes for single channel

54b0ed7

feat: DataTables

9ebe516

feat: range sliders for metrics

e94a3ba

feat: add pin to clicked spots for NG mode

3b8b025

feat: support single tile loading

a9e7f47

feat: plot dye lines

670f377

fix: duplicate tile dataset names

e25a7f0

feat: multi annotate neuroglancer from table

feab3fa

fix: table display limits

d37e0e2

mattjdavis approved these changes Nov 14, 2025

mattjdavis requested a review from Copilot November 14, 2025 21:27

Copilot AI reviewed Nov 14, 2025

fix: install script

9185ae2

mattjdavis merged commit 632d48c into main Nov 14, 2025
0 of 3 checks passed

3 participants