Skip to content

Conversation

@fstagni
Copy link
Contributor

@fstagni fstagni commented Oct 29, 2025

This is a rewrite of few functions in subprocess that need very careful evaluation.

For reasons not understood (at least not by me) the logic of killing a pid and its sub-processes is not completing (not always, at least). I blame the very old logic used there, but of course I might be wrong. In any case this is a rewriting using psutil.

I had to change one test in Test_JobWrapper. I have not fully understood why, but I do not see how the previous code could work. cc @aldbr

BEGINRELEASENOTES

CHANGE: Subprocess: use psutil for killing processes

ENDRELEASENOTES

@fstagni fstagni force-pushed the 90_use_psutil branch 2 times, most recently from ec23535 to 72f644d Compare October 30, 2025 16:59
for p in children:
g, a = self.__terminateProcess(p)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually here you could directly call psutil.wait_procs as it was done before, no?
This is shown in the psutil documentation (https://psutil.readthedocs.io/en/latest/index.html#psutil.wait_procs) and this was done like this before (I think the else block was correct but it was just not used by the JobWrapper.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well actually, forget about my other comment: any reason for not applying this comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 2 are not equivalent?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I should have provided further details.
Here you sequentially terminate and wait for each process to terminate. If there are n processes, and if they do not terminate, you are going to wait for n * 60s from what I understand.

Whereas you could reduce the waiting time from o(n) to o(1) by doing:

for p in children:
    p.terminate()
psutil.wait_procs(children, timeout=60)

This is what it's currently done and what is recommended in the documentation.
(The problem with the current code is the first bloc after the if pgid != os.getpgrp():, because we terminate the processes but we don't kill them I think, the else bloc looks fine to me).

@fstagni fstagni merged commit eb57818 into DIRACGrid:integration Nov 19, 2025
21 checks passed
@DIRACGridBot DIRACGridBot added the sweep:ignore Prevent sweeping from being ran for this PR label Nov 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sweep:ignore Prevent sweeping from being ran for this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants