Losing redis connection #127

@renan-souza

Occasionally (fortunately only rarely) during the LLM experiment, we get the error below. We need to debug it before deciding how to handle it. One option is simply to reconnect and retry the failed request until it succeeds. As things stand, when this error occurs we are likely losing data.

[flowcept][ERROR][frontier06306.frontier.olcf.ornl.gov][pid=61095][thread=140733193385728][function=_start][Connection closed by server.]
Traceback (most recent call last):
File "/lustre/orion/stf219/scratch/souzar/flowcept/flowcept/flowceptor/consumers/document_inserter.py", line 199, in _start
for message in pubsub.listen():
File "/lustre/orion/stf219/scratch/souzar/miniconda/envs/llm3/lib/python3.8/site-packages/redis/client.py", line 1653, in listen
response = self.handle_message(self.parse_response(block=True))
File "/lustre/orion/stf219/scratch/souzar/miniconda/envs/llm3/lib/python3.8/site-packages/redis/client.py", line 1531, in parse_response
response = self._execute(conn, try_read)
File "/lustre/orion/stf219/scratch/souzar/miniconda/envs/llm3/lib/python3.8/site-packages/redis/client.py", line 1507, in _execute
return conn.retry.call_with_retry(
File "/lustre/orion/stf219/scratch/souzar/miniconda/envs/llm3/lib/python3.8/site-packages/redis/retry.py", line 49, in call_with_retry
fail(error)
File "/lustre/orion/stf219/scratch/souzar/miniconda/envs/llm3/lib/python3.8/site-packages/redis/client.py", line 1509, in <lambda>
lambda error: self._disconnect_raise_connect(conn, error),
File "/lustre/orion/stf219/scratch/souzar/miniconda/envs/llm3/lib/python3.8/site-packages/redis/client.py", line 1496, in _disconnect_raise_connect
raise error
File "/lustre/orion/stf219/scratch/souzar/miniconda/envs/llm3/lib/python3.8/site-packages/redis/retry.py", line 46, in call_with_retry
return do()
File "/lustre/orion/stf219/scratch/souzar/miniconda/envs/llm3/lib/python3.8/site-packages/redis/client.py", line 1508, in <lambda>
lambda: command(*args, **kwargs),
File "/lustre/orion/stf219/scratch/souzar/miniconda/envs/llm3/lib/python3.8/site-packages/redis/client.py", line 1529, in try_read
return conn.read_response()
File "/lustre/orion/stf219/scratch/souzar/miniconda/envs/llm3/lib/python3.8/site-packages/redis/connection.py", line 848, in read_response
response = self._parser.read_response(disable_decoding=disable_decoding)
File "/lustre/orion/stf219/scratch/souzar/miniconda/envs/llm3/lib/python3.8/site-packages/redis/connection.py", line 335, in read_response
result = self._read_response(disable_decoding=disable_decoding)
File "/lustre/orion/stf219/scratch/souzar/miniconda/envs/llm3/lib/python3.8/site-packages/redis/connection.py", line 383, in _read_response
response = [
File "/lustre/orion/stf219/scratch/souzar/miniconda/envs/llm3/lib/python3.8/site-packages/redis/connection.py", line 384, in <listcomp>
self._read_response(disable_decoding=disable_decoding)
File "/lustre/orion/stf219/scratch/souzar/miniconda/envs/llm3/lib/python3.8/site-packages/redis/connection.py", line 377, in _read_response
response = self._buffer.read(length)
File "/lustre/orion/stf219/scratch/souzar/miniconda/envs/llm3/lib/python3.8/site-packages/redis/connection.py", line 230, in read
self._read_from_socket(length - self.length)
File "/lustre/orion/stf219/scratch/souzar/miniconda/envs/llm3/lib/python3.8/site-packages/redis/connection.py", line 195, in _read_from_socket
raise ConnectionError(SERVER_CLOSED_CONNECTION_ERROR)
redis.exceptions.ConnectionError: Connection closed by server.
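
The retry idea from the description could look roughly like the sketch below. It is not FlowCept's actual code: `listen_with_reconnect`, `subscribe`, and `handle` are hypothetical names, and the backoff values are arbitrary. The function is written against a generic `subscribe` callable so it can be exercised without a Redis server; the trailing comment shows one way it might be wired to redis-py's `pubsub.listen()`. Note that Redis pub/sub does not buffer messages for disconnected subscribers, so reconnecting alone would only stop the consumer from dying; anything published during the outage would still be lost.

```python
import time
from typing import Callable, Iterable, Tuple, Type


def listen_with_reconnect(
    subscribe: Callable[[], Iterable[dict]],
    handle: Callable[[dict], None],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Consume messages, re-subscribing with exponential backoff on failure.

    subscribe() must open a fresh subscription each time it is called and
    return an iterable of messages. The error is re-raised after
    max_attempts consecutive failures.
    """
    failures = 0
    while True:
        try:
            for message in subscribe():
                failures = 0  # a successful read resets the backoff
                handle(message)
            return  # the message stream ended normally
        except retry_on:
            failures += 1
            if failures >= max_attempts:
                raise
            # Wait 0.5 s, 1 s, 2 s, ... capped at max_delay, then reconnect.
            sleep(min(max_delay, base_delay * 2 ** (failures - 1)))


# Hypothetical wiring with redis-py (host and channel name are placeholders):
#
#   import redis
#
#   def subscribe():
#       pubsub = redis.Redis(host="localhost").pubsub()
#       pubsub.subscribe("my-channel")
#       return pubsub.listen()
#
#   listen_with_reconnect(
#       subscribe, print, retry_on=(redis.exceptions.ConnectionError,)
#   )
```

If losing messages during the outage is not acceptable, a buffered transport would be needed in addition to the reconnect loop; which one fits here is an open question for the debugging mentioned above.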
