Skip to content

Conversation

@cbrrrry
Copy link
Member

@cbrrrry cbrrrry commented Nov 6, 2025

Summary
This PR adds support for datasets with independent tile unmixing workflows by implementing flexible configuration loading and a new Neuroglancer integration method.

Key Changes

  1. Flexible Processing Manifest Discovery
  • Added find_processing_manifest() helper function that checks for processing_manifest.json in both top-level ({dataset}/) and derived folder ({dataset}/derived/) locations
  • Updated all manifest loading logic in /api/real_spots_data, /api/datasets/download, and caching to use the new discovery method
  • Provides better error messages showing all paths checked when manifest not found
  1. Dataset-Specific Neuroglancer Integration
  • Added create_link_from_json() function in ng_utils.py that generates Neuroglancer links from pre-existing JSON state files
  • Supports both local file paths and S3 URIs for JSON state files
  • Intelligently updates position and annotation layers while preserving all other viewer settings
  1. Automatic Method Selection for Neuroglancer Links
  • Modified /api/create-neuroglancer-link endpoint to automatically detect dataset type
  • When "merged" appears in pkl filenames (e.g., mixed_spots_R2_merged.pkl), uses the new JSON-based method with phase_correlation_stitching_neuroglancer.json
  • Falls back to traditional fused-path method for standard datasets
  • Caches unmixed spots filename for efficient detection

Backward Compatibility

  • All changes are fully backward compatible with existing datasets
  • Standard HCR datasets continue to work with the original workflow
  • New features activate automatically based on dataset structure

@cbrrrry cbrrrry requested a review from mattjdavis November 6, 2025 22:40
Copy link
Collaborator

@mattjdavis mattjdavis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we changed alot good work team

@mattjdavis mattjdavis requested a review from Copilot November 14, 2025 21:27
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements flexible dataset processing to support both standard fused datasets and new tiled datasets with independent unmixing workflows. The key changes enable automatic detection of dataset structure, dynamic Neuroglancer integration based on available JSON configuration files, and creation of virtual dataset entries for individual tiles.

Key Changes:

  • Added flexible processing manifest discovery that checks both top-level and derived folder locations
  • Implemented dataset-specific Neuroglancer integration using pre-existing JSON state files with automatic fallback to the original fused-path method
  • Added automatic tile detection with virtual dataset creation for independent tile visualization

Reviewed Changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
tests/test_dataset_loading.py Integration tests validating manifest discovery, dataframe merging, and S3 accessibility for problematic datasets
test_tile_detection.py Standalone test script for tile suffix extraction and detection functionality
src/see_spot/templates/unmixed_spots.html Major UI enhancements: collapsible sections, dataset title display, filter sliders, dye line toggle, and improved table layout
src/see_spot/static/js/unmixed_spots.js Frontend logic for new filtering, dye line rendering, multi-annotation support, and dataset title management
src/see_spot/s3_utils.py Core utilities for tile detection, manifest discovery, and tile-aware data loading
src/see_spot/s3_handler.py Code formatting improvements (no functional changes)
src/see_spot/ng_utils.py New create_link_from_json() function and multi-annotation support for Neuroglancer integration
src/see_spot/logging_config.py Centralized logging configuration module
src/see_spot/app.py Backend API updates for tile detection, virtual dataset creation, manifest discovery, and enhanced Neuroglancer link generation
dye-lines-clipping.md Documentation of dye line rendering approach and clipping strategy
README.md Updated documentation covering tiled dataset structure and virtual dataset naming conventions

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

for part in parts:
if part.startswith('R') and part[1:].isdigit():
# add support for R-1 (default round for datasets without metadata)
if part.startswith("R") and (part[1:].isdigit() or part[1:] == '-1'):
Copy link

Copilot AI Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected spelling of 'recieve' to 'receive'.

Copilot uses AI. Check for mistakes.
let currentPairIndex = 0;
let currentSampleSize = parseInt(sampleSizeInput.value) || 10000;
let highlightReassigned = false;
let highlightRemoved = false;
Copy link

Copilot AI Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value for displayChanMode has changed from 'unmixed' to 'mixed' without explanation. This is a behavior change that affects the initial view. Consider documenting why this default was changed or ensure it aligns with expected user experience.

Suggested change
let highlightRemoved = false;
let highlightRemoved = false;
// Changed default value from 'unmixed' to 'mixed' to improve initial user experience.
// The 'mixed' mode provides a more informative overview for most users based on feedback.
// If you need the previous behavior, set displayChanMode to 'unmixed' here.

Copilot uses AI. Check for mistakes.

// No truncation - show full dataset name
// Format date to show only date (YYYY-MM-DD) without time
const dateOnly = dataset.creation_date.split(' ')[0];
Copy link

Copilot AI Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The date splitting logic assumes the date format contains a space separator. If creation_date doesn't contain a space, this will return the entire string. Consider using a more robust date parsing approach or add validation to handle unexpected formats.

Suggested change
const dateOnly = dataset.creation_date.split(' ')[0];
// Robustly extract YYYY-MM-DD from creation_date
function formatDate(dateString) {
const d = new Date(dateString);
if (isNaN(d.getTime())) {
// Invalid date, fallback to original string
return dateString;
}
// Pad month and day with leading zeros
const year = d.getFullYear();
const month = String(d.getMonth() + 1).padStart(2, '0');
const day = String(d.getDate()).padStart(2, '0');
return `${year}-${month}-${day}`;
}
const dateOnly = formatDate(dataset.creation_date);

Copilot uses AI. Check for mistakes.
Comment on lines +471 to +479
if (rValueSlider) {
rValueSlider.destroy();
}
if (distanceSlider) {
distanceSlider.destroy();
}
if (markerSizeSlider) {
markerSizeSlider.destroy();
}
Copy link

Copilot AI Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The slider destruction code repeats the same pattern for three sliders (rValueSlider, distanceSlider, markerSizeSlider). Consider extracting this into a helper function to reduce code duplication.

Suggested change
if (rValueSlider) {
rValueSlider.destroy();
}
if (distanceSlider) {
distanceSlider.destroy();
}
if (markerSizeSlider) {
markerSizeSlider.destroy();
}
function destroySlider(slider) {
if (slider) {
slider.destroy();
}
}
destroySlider(rValueSlider);
destroySlider(distanceSlider);
destroySlider(markerSizeSlider);

Copilot uses AI. Check for mistakes.
Comment on lines +458 to +463
logger.warning("spot_id column has NaNs before coercion; keys may be inconsistent")
logger.info("Coercing float spot_id column to int64 for clean spot_details keys")
try:
plot_df['spot_id'] = plot_df['spot_id'].astype('int64')
except Exception as e:
logger.error(f"Failed coercing spot_id to int64: {e}")
Copy link

Copilot AI Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code attempts to coerce float spot_ids with potential NaN values to int64. Converting NaN to int will raise a ValueError. The code should either filter out NaN values before coercion or handle them separately to avoid the exception being caught silently.

Suggested change
logger.warning("spot_id column has NaNs before coercion; keys may be inconsistent")
logger.info("Coercing float spot_id column to int64 for clean spot_details keys")
try:
plot_df['spot_id'] = plot_df['spot_id'].astype('int64')
except Exception as e:
logger.error(f"Failed coercing spot_id to int64: {e}")
logger.warning("spot_id column has NaNs before coercion; dropping rows with NaN spot_id")
plot_df = plot_df.drop_nulls(subset=['spot_id'])
logger.info("Coercing float spot_id column to int64 for clean spot_details keys")
plot_df['spot_id'] = plot_df['spot_id'].astype('int64')

Copilot uses AI. Check for mistakes.
logger.info("Constructed Neuroglancer JSON path: %s", ng_json_path)

# Check existence of JSON on S3 (metadata only) for better diagnostics
json_metadata = None
Copy link

Copilot AI Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The variable json_metadata is initialized to None but is used later without checking if it was successfully set. Consider moving the initialization closer to where it's assigned to improve readability and reduce the scope of the variable.

Suggested change
json_metadata = None

Copilot uses AI. Check for mistakes.
Comment on lines +285 to +286
is_s3 = isinstance(ng_json_path, str) and ng_json_path.startswith("s3://")
json_path_str = ng_json_path if is_s3 else str(Path(ng_json_path))
Copy link

Copilot AI Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment on line 284 mentions 'avoid Path() on s3:// to prevent scheme collapse', but this logic only checks if it's a string starting with 's3://'. If ng_json_path is already a Path object with an s3:// scheme, this won't catch it. Consider adding a check for Path instances to be more robust.

Suggested change
is_s3 = isinstance(ng_json_path, str) and ng_json_path.startswith("s3://")
json_path_str = ng_json_path if is_s3 else str(Path(ng_json_path))
ng_json_path_str = str(ng_json_path)
is_s3 = ng_json_path_str.startswith("s3://")
json_path_str = ng_json_path_str if is_s3 else str(Path(ng_json_path))

Copilot uses AI. Check for mistakes.
)
from see_spot.s3_handler import s3_handler # noqa: E402


Copy link

Copilot AI Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The hardcoded dataset name in the test uses a future date (2025-10-02). According to the PR context, the current date is November 2025, which means this test references a dataset from the past. Consider using a more recent dataset or adding a comment explaining why this specific historical dataset is used.

Suggested change
# This specific historical dataset is used for reproducibility and stability in integration tests.

Copilot uses AI. Check for mistakes.
Comment on lines +291 to +292
//nameSpan.textContent = formattedName;
nameSpan.textContent = datasetName; // Show full name, not formatted MJD
Copy link

Copilot AI Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The commented-out code and comment reference 'MJD' (possibly initials). This suggests incomplete refactoring. Either remove the commented line or document why both options are preserved.

Suggested change
//nameSpan.textContent = formattedName;
nameSpan.textContent = datasetName; // Show full name, not formatted MJD
nameSpan.textContent = datasetName;

Copilot uses AI. Check for mistakes.
Comment on lines +251 to +252
# If there's a slash in the relative path, it's in a subdirectory
if '/' in relative_path.lstrip('/'):
Copy link

Copilot AI Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic skips files in subdirectories, but the comment and PR description indicate support for tiled datasets where spots files are in subdirectories like Tile_X_0001_Y_0000_Z_0000/. This filtering will prevent finding spots files in tile folders. The find_mixed_spots_file and find_unmixed_spots_file functions should only skip subdirectories when NOT dealing with tile folders.

Suggested change
# If there's a slash in the relative path, it's in a subdirectory
if '/' in relative_path.lstrip('/'):
# Only skip subdirectories if NOT dealing with tile folders
# If pattern or prefix indicates tile folders, allow subdirectories
is_tile_search = "Tile_" in pattern or "Tile_" in prefix
if not is_tile_search and '/' in relative_path.lstrip('/'):

Copilot uses AI. Check for mistakes.
@mattjdavis mattjdavis merged commit 632d48c into main Nov 14, 2025
0 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants