Labels: bug (Something isn't working)
Description
Describe the bug
I found a livepeer/ai-runner:live-app-streamdiffusion-sdxl container that had been running for several hours, repeatedly printing the following logs:
Traceback (most recent call last):
File "/app/app/live/process/process_guardian.py", line 271, in _monitor_loop
last_error = self.process.get_last_error()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/app/live/process/process.py", line 449, in get_last_error
last_error = self.error_queue.get_nowait()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/miniconda3/envs/comfystream/lib/python3.11/multiprocessing/queues.py", line 135, in get_nowait
return self.get(False)
^^^^^^^^^^^^^^^
File "/workspace/miniconda3/envs/comfystream/lib/python3.11/multiprocessing/queues.py", line 100, in get
raise ValueError(f"Queue {self!r} is closed")
ValueError: Queue <multiprocessing.queues.Queue object at 0x75bb4b35ca10> is closed
Stack (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/app/app/live/infer.py", line 227, in <module>
asyncio.run(
File "/workspace/miniconda3/envs/comfystream/lib/python3.11/asyncio/runners.py", line 190, in run
return runner.run(main)
File "/workspace/miniconda3/envs/comfystream/lib/python3.11/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
File "/workspace/miniconda3/envs/comfystream/lib/python3.11/asyncio/base_events.py", line 641, in run_until_complete
self.run_forever()
File "/workspace/miniconda3/envs/comfystream/lib/python3.11/asyncio/base_events.py", line 608, in run_forever
self._run_once()
File "/workspace/miniconda3/envs/comfystream/lib/python3.11/asyncio/base_events.py", line 1936, in _run_once
handle._run()
File "/workspace/miniconda3/envs/comfystream/lib/python3.11/asyncio/events.py", line 84, in _run
self._context.run(self._callback, *self._args)
File "/app/app/live/process/process_guardian.py", line 340, in _monitor_loop
logging.exception("Error in monitor loop", stack_info=True)
timestamp=2025-11-02 04:23:35 level=ERROR location=process_guardian.py:340:_monitor_loop gateway_request_id=43f4f6e9 manifest_id=ef783006 stream_id=aiJobTesterStream-1762013066117295814 message=Error in monitor loop
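The root failure appears to be `get_nowait()` being called on a `multiprocessing.Queue` that has already been closed, which Python 3.11 rejects with a `ValueError` rather than `queue.Empty`. A minimal standalone sketch reproducing that error (independent of the ai-runner code):

```python
# Minimal reproduction of the ValueError in the traceback above:
# calling get_nowait() on a multiprocessing.Queue after close() raises
# ValueError("Queue ... is closed") instead of queue.Empty.
import multiprocessing

q = multiprocessing.Queue()
q.close()
try:
    q.get_nowait()
except ValueError as e:
    print(e)  # e.g. "Queue <multiprocessing.queues.Queue object at 0x...> is closed"
```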
Reproduction steps
Running go-livepeer v0.8.8 with livepeer/ai-runner:live-app-streamdiffusion-sdxl on a dual-4090 system.
Docker inspect:
"Image": "sha256:f51676ce8332dbad414b9e3daa66a5f5a797e6d54682601d8d910c151a5e1748",
"Image": "livepeer/ai-runner:live-app-streamdiffusion-sdxl",
Commit: a201f99
Expected behaviour
_monitor_loop should not fail, and the /health endpoint should have reported the error state so the container could be restarted.
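One possible direction for a fix, sketched as a hypothetical wrapper (this is not the actual ai-runner code; `get_last_error` and `error_queue` are assumed names based on the traceback): treat a closed queue the same as an empty one so the monitor loop survives and can still surface the error state.

```python
# Hypothetical defensive version of get_last_error (assumption, not the real
# process.py code): swallow the ValueError raised by a closed
# multiprocessing.Queue so the monitor loop keeps running.
import queue as queue_mod


def get_last_error(error_queue):
    """Return the most recent error from the queue, or None if the queue
    is empty or has already been closed."""
    try:
        return error_queue.get_nowait()
    except queue_mod.Empty:
        # No new error reported.
        return None
    except ValueError:
        # Queue was closed (e.g. during process shutdown); treat as no error
        # rather than crashing the monitor loop.
        return None
```

With this guard, _monitor_loop would not raise, and the guardian could instead report the shutdown condition through /health.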
Severity
None
Screenshots / Live demo link
No response
OS
None
Running on
None
AI-worker version
No response
Additional context
No response