
streamdiffusion sdxl - Error in monitor loop (stuck worker) #835

@eliteprox

Description

Describe the bug

I found a livepeer/ai-runner:live-app-streamdiffusion-sdxl container that had been running for several hours while printing the following logs:

Traceback (most recent call last):
  File "/app/app/live/process/process_guardian.py", line 271, in _monitor_loop
    last_error = self.process.get_last_error()
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/app/live/process/process.py", line 449, in get_last_error
    last_error = self.error_queue.get_nowait()
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/miniconda3/envs/comfystream/lib/python3.11/multiprocessing/queues.py", line 135, in get_nowait
    return self.get(False)
           ^^^^^^^^^^^^^^^
  File "/workspace/miniconda3/envs/comfystream/lib/python3.11/multiprocessing/queues.py", line 100, in get
    raise ValueError(f"Queue {self!r} is closed")
ValueError: Queue <multiprocessing.queues.Queue object at 0x75bb4b35ca10> is closed
Stack (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/app/app/live/infer.py", line 227, in <module>
    asyncio.run(
  File "/workspace/miniconda3/envs/comfystream/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
  File "/workspace/miniconda3/envs/comfystream/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
  File "/workspace/miniconda3/envs/comfystream/lib/python3.11/asyncio/base_events.py", line 641, in run_until_complete
    self.run_forever()
  File "/workspace/miniconda3/envs/comfystream/lib/python3.11/asyncio/base_events.py", line 608, in run_forever
    self._run_once()
  File "/workspace/miniconda3/envs/comfystream/lib/python3.11/asyncio/base_events.py", line 1936, in _run_once
    handle._run()
  File "/workspace/miniconda3/envs/comfystream/lib/python3.11/asyncio/events.py", line 84, in _run
    self._context.run(self._callback, *self._args)
  File "/app/app/live/process/process_guardian.py", line 340, in _monitor_loop
    logging.exception("Error in monitor loop", stack_info=True)
timestamp=2025-11-02 04:23:35 level=ERROR location=process_guardian.py:340:_monitor_loop gateway_request_id=43f4f6e9 manifest_id=ef783006 stream_id=aiJobTesterStream-1762013066117295814 message=Error in monitor loop
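The immediate failure is get_last_error() calling get_nowait() on a multiprocessing queue that has already been closed. A minimal sketch of one possible guard, assuming the queue is a standard multiprocessing.Queue as shown in the traceback (the try/except handling is an assumption about a fix, not the current process.py implementation):

```python
import queue


def get_last_error(error_queue):
    """Return the most recent worker error, or None if there is none.

    error_queue is assumed to be a multiprocessing.Queue, as in the
    traceback above. A closed queue raises ValueError from get(), which
    is exactly the crash seen in _monitor_loop.
    """
    try:
        return error_queue.get_nowait()
    except queue.Empty:
        return None  # no pending error
    except ValueError:
        # Queue was closed (e.g. after the worker process shut down);
        # report "no error" instead of propagating into the monitor loop.
        return None
```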

Reproduction steps

Running go-livepeer v0.8.8 with livepeer/ai-runner:live-app-streamdiffusion-sdxl on a dual-4090 system.

Docker inspect:
"Image": "sha256:f51676ce8332dbad414b9e3daa66a5f5a797e6d54682601d8d910c151a5e1748",
"Image": "livepeer/ai-runner:live-app-streamdiffusion-sdxl",

Commit: a201f99

Expected behaviour

_monitor_loop should not fail, and the /health endpoint should have reported the error state so that the container could be restarted.
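
A hypothetical sketch of how the guardian could record a monitor-loop failure and surface it through a health check (names such as GuardianSketch, last_monitor_error, and health() are illustrative, not the actual process_guardian.py API):

```python
import asyncio
import logging


class GuardianSketch:
    """Illustrative stand-in for the process guardian's monitoring logic."""

    def __init__(self, process):
        self.process = process
        self.last_monitor_error: Exception | None = None

    async def _monitor_loop(self):
        while True:
            try:
                last_error = self.process.get_last_error()
                if last_error is not None:
                    self.last_monitor_error = last_error
            except Exception as exc:
                # Record the failure instead of only logging it, so the
                # health check below can report an error state and the
                # orchestrator can restart the container.
                logging.exception("Error in monitor loop", stack_info=True)
                self.last_monitor_error = exc
            await asyncio.sleep(1)

    def health(self) -> dict:
        # What a /health handler could return once the loop has failed.
        if self.last_monitor_error is not None:
            return {"status": "ERROR", "detail": str(self.last_monitor_error)}
        return {"status": "OK"}
```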

Severity

None

Screenshots / Live demo link

No response

OS

None

Running on

None

AI-worker version

No response

Additional context

No response
