
Rework of original background subtraction #24

Merged
kbestak merged 14 commits into SchapiroLabor:dev from VictorDidier:main
Oct 10, 2025

Conversation

@VictorDidier
Contributor

Added the rework of the tool in a folder called backsub.
New features:

  1. Tool divided into multiple scripts (CLI, main, OME metadata handling).
  2. Optimized RAM usage by reading one channel at a time.
  3. Added support for the input image not being a pyramidal TIFF.
  4. The palom library is no longer used for writing the pyramid; instead, pyramid_gaussian from skimage is used.

!!! No changes were made to the original script background_subtraction.py; all the features above live in the added backsub folder !!!
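Feature 4 replaces palom with skimage's pyramid generator. A minimal sketch of what that looks like, processing one channel at a time as in feature 2 (the array shape is illustrative, not the tool's actual code):

```python
import numpy as np
from skimage.transform import pyramid_gaussian

# One channel at a time; the shape here is illustrative.
channel = np.random.rand(256, 256).astype(np.float32)

# Build two downscaled levels below the full-resolution base.
levels = list(
    pyramid_gaussian(channel, max_layer=2, downscale=2, preserve_range=True)
)
# levels[0] is full resolution; each subsequent level halves the side length.
```

Each level would then be written as one resolution of the pyramidal output TIFF.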


outdir.mkdir(parents=True, exist_ok=True)
#out_file_path= outdir / f'{file_name}.tif'
out_file_path=outdir / file_name
Collaborator

In main, the outpath is given as an input to this function, which treats it as the outdir; it should just be treated as the outpath here.
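A minimal sketch of the fix being asked for, with the function and variable names hypothetical: treat the argument as the full output file path and derive the directory from it, instead of joining a file name onto it.

```python
from pathlib import Path

def write_output(out_file_path: Path, data: bytes) -> None:
    # out_file_path is the full path to the output file, not a directory.
    out_file_path.parent.mkdir(parents=True, exist_ok=True)
    out_file_path.write_bytes(data)  # stand-in for the actual TIFF writing
```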


#Write updated markers.csv
markers_updated = markers_updated.drop(columns=['keep','ind','processed','factor','bg_idx'])
markers_updated .to_csv(args.markerout / "markers_bs.csv", index=False)
Collaborator

Suggested change
markers_updated .to_csv(args.markerout / "markers_bs.csv", index=False)
markers_updated.to_csv(args.markerout, index=False)
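The suggested change writes directly to the path in args.markerout. A runnable sketch of that pattern follows; the DataFrame contents are made up for illustration, and a temp file stands in for args.markerout.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy markers table with the bookkeeping columns the tool adds internally.
markers_updated = pd.DataFrame({
    "marker_name": ["DAPI", "CD3"],
    "keep": [True, True],
    "ind": [0, 1],
    "processed": [True, False],
    "factor": [1.0, 0.5],
    "bg_idx": [0, 0],
})

# Drop the helper columns before writing the updated markers file.
markers_updated = markers_updated.drop(
    columns=["keep", "ind", "processed", "factor", "bg_idx"]
)

# In the tool this path comes from args.markerout; a temp file stands in here.
markerout = Path(tempfile.mkdtemp()) / "markers_bs.csv"
markers_updated.to_csv(markerout, index=False)
```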

Contributor Author

I made the correction suggested above. The latest commit (added saveRAM argument) includes the following features:

Commit features:

  1. Markerout corrected to be a path to the output file (CLI and backsub.py, lines 77 and 252 respectively)
  2. Added the save-RAM option
  3. Added dask-image to new_environment.yml, required for saving RAM

Issues solved:
4. (#23) Read channel name from the markers CSV and add it to the OME metadata
5. (#18) Warns on a repeated backsub column name (line 56 of backsub.py)
6. (#22) Apply lossless compression to the output TIFF to reduce output file size
7. (#15) Numpy version is updated, no longer limited to <1.22

@VictorDidier
Contributor Author

Hello Mr. Bestak, I think this is the final outcome of the rework. I optimized the RAM usage so that it falls in the range of values you had in version 0.4.1. I have set up two RAM profiles: one with moderate RAM usage, which is the default, and a second one with performance similar to v0.4.1, which is enabled with the --save_ram argument in the CLI.

I have also updated the README.md accordingly and listed the features of this new version. I think the only missing part is updating the Dockerfile to include the libraries you require for Nextflow. Could you please do that?
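The idea behind both RAM profiles is to touch one channel at a time instead of loading the full stack. A hedged, numpy-only sketch of that loop (the function, background model, and scaling factors are illustrative, not the tool's actual code):

```python
import numpy as np

def subtract_per_channel(stack, background, factors):
    """Subtract a scaled background from each channel, one 2-D slice at a time.

    Peak memory stays near a single channel's size rather than the whole stack.
    """
    out = np.empty_like(stack)
    for c in range(stack.shape[0]):
        channel = stack[c].astype(np.int32)  # only this slice is materialized
        scaled_bg = background.astype(np.int32) * factors[c]
        # Clip at zero so unsigned dtypes do not wrap around.
        out[c] = np.clip(channel - scaled_bg, 0, None).astype(stack.dtype)
    return out
```

With --save_ram, per the commit notes above, the tool additionally leans on dask-image for lazy reads rather than materializing the slices from an in-memory stack.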

@kbestak
Collaborator

kbestak commented Oct 9, 2025

Amazing work @VictorDidier !

Dockerfile:

FROM mambaorg/micromamba:1.5.10-noble

# Copy conda environment file
COPY --chown=$MAMBA_USER:$MAMBA_USER ./backsub/new_env.yml /tmp/conda.yml

# Install environment
RUN micromamba install -y -n base -f /tmp/conda.yml \
    && micromamba install -y -n base conda-forge::procps-ng \
    && micromamba env export --name base --explicit > environment.lock \
    && echo ">> CONDA_LOCK_START" \
    && cat environment.lock \
    && echo "<< CONDA_LOCK_END" \
    && micromamba clean -a -y

# Switch to root to copy everything
USER root

# Ensure micromamba binaries are in PATH
ENV PATH="$MAMBA_ROOT_PREFIX/bin:$PATH"

# Copy the rest of the current directory into /app inside the container
WORKDIR /app
COPY . .

In new_env.yaml, add the extra = to the zarr pin, as here:

  - pip:
    - "zarr==3.1.1"

@kbestak
Collaborator

kbestak commented Oct 9, 2025

I think compression should be provided as an argument, with "LZW" as default, what do you think?

@VictorDidier
Contributor Author

> I think compression should be provided as an argument, with "LZW" as default, what do you think?

Given the way we are writing the output images, i.e. channel by channel, the "LZW" algorithm is the recommended lossless compression. The other lossless option is JPEG 2000, but that works better when writing a whole stack at once or RGB (see this link: https://forum.image.sc/t/creating-a-multi-channel-pyramid-ome-tiff-with-tiffwriter-in-python/76424/6). Since we are not saving the whole stack at once, "LZW" compression is the best option.

If compression becomes an argument, I think we could define it as a boolean flag, i.e. -nc (--no_compression), and document that "LZW" is applied by default unless -nc is given. What do you think?
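A minimal sketch of the proposed flag; the -nc/--no_compression names follow the suggestion above, and the wiring into the writer is illustrative:

```python
import argparse

parser = argparse.ArgumentParser(description="backsub output options (sketch)")
# Compression is on by default; -nc turns it off, per the proposal above.
parser.add_argument(
    "-nc", "--no_compression", action="store_true",
    help='disable the default lossless "LZW" compression of the output TIFF',
)

args = parser.parse_args(["-nc"])
compression = None if args.no_compression else "LZW"
# `compression` would then be forwarded to the TIFF writer.
```

The resulting value maps naturally onto tifffile, whose write calls take a compression argument for which LZW is, to my knowledge, a supported lossless option.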

@VictorDidier VictorDidier changed the base branch from main to dev October 10, 2025 09:33
@kbestak kbestak merged commit a222890 into SchapiroLabor:dev Oct 10, 2025
1 check passed


Development

Successfully merging this pull request may close these issues.

read channel name from markers.csv and add it to ome.tiff
file size are double as registration
Numpy version has to be below 1.24

2 participants