Skip to content

Improve segmenter developer experience#644

Merged
sonnyp merged 5 commits intomainfrom
segmenter-dev
Jun 16, 2025
Merged

Improve segmenter developer experience#644
sonnyp merged 5 commits intomainfrom
segmenter-dev

Conversation

@sonnyp
Copy link
Collaborator

@sonnyp sonnyp commented Jun 12, 2025

Todo

  • Test the segmenter docker image (HOW?)
  • reconsider deploying the segmenter as a container

@sonnyp sonnyp requested a review from ethanjli June 12, 2025 13:03
@sonnyp
Copy link
Collaborator Author

sonnyp commented Jun 12, 2025

@ethanjli could you have a look at this? in particular the changes / comments on segmenter/apt-packages

Python dev tooling highlights potentially serious issues in the segmenter code, I don't feel comfortable yet fixing it myself. Who could have a look? See https://github.com/PlanktoScope/PlanktoScope/actions/runs/15611242646/job/43972432697?pr=644

@ethanjli
Copy link
Collaborator

serious issues in the segmenter code, I don't feel comfortable yet fixing it myself. Who could have a look? See https://github.com/PlanktoScope/PlanktoScope/actions/runs/15611242646/job/43972432697?pr=644

The one flagged error which is (very!) concerning to me is about ineffectual assignment to local variable flat. I wonder if the person who wrote this meant to assign to self.__flat instead of flat? If so, then the segmenter has been throwing away some rather CPU-expensive computation results instead of using those results as part of its processing pipeline. But I don't really know how the person who wrote the segmenter had intended for it to behave...

  • But the SegmenterProcess._pipe method's logic related to self.__flat is so convoluted that I expect that changing flat = to self.__flat = will cause the segmenter to behave totally differently, so that different versions of the segmenter would produce very different results given the same input datasets. Then it could be invalid to use EcoTaxa to make scientific comparisons on output results from datasets processed by old vs. new segmenter. I personally don't have enough domain knowledge to make a judgment call about which behavior would be better. Maybe one way forward could be to process a given input dataset twice (once before making the change, and once after making the change) and then ask @fabienlombard and @chevreuill3000 to judge the results.
  • Also, Thibaut has wanted to discard the entire segmenter codebase in favor of a total rewrite (see the 2024-03-14 software meeting).
  • Not coincidentally, the _pipe method is also the thing which Fix segmenter "flat calculation" trigger criteria #575 attempts to fix (according to Fix segmenter "flat calculation" trigger criteria PlanktoScope/device-backend#73, there may be some significant bugs in the current implementation of the _pipe method?).

For all other flagged unused local variables (quartiles, average, fps, e in exception handlers), I believe it should be fine to comment out (quartiles, average) or remove (fps, e) their respective lines of code: either the lines of code which use those variables (quartiles, average) also appear to be commented out, or they simply aren't referenced anywhere else (fps, e in exception handlers). Similarly, I trust the recommendation to remove unused import threading. My disclaimer is that my mental model is pretty shallow for those parts of the codebase.

@fabienlombard
Copy link
Collaborator

hello the initial prototype was done as a matlab code with way to proceed way more simple (in code) and way more efficient (into not doing mistake). for sake of communication, simplicity and communication I would recomand to go back to this kind of design to simplify life
segmentPlanktoscopenov2022simple.m.zip

@fabienlombard
Copy link
Collaborator

the only complicated thing in there is the background calculation where I use 4th dimension matrixes (images in 2d; 3 layers of colors + 5 images to do the median); all my trials and errors are in comments

@sonnyp sonnyp marked this pull request as ready for review June 16, 2025 07:45
@sonnyp sonnyp merged commit ec4b6d1 into main Jun 16, 2025
5 checks passed
@sonnyp sonnyp deleted the segmenter-dev branch June 16, 2025 09:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants