Contributor

@yoney yoney commented Oct 28, 2025

Similar to #140555, the main goal was to review the lzma module for free-threading. The methods already use a lock, which makes them thread-safe in a free-threaded build. I replaced PyThread_acquire_lock with PyMutex. PyMutex releases the GIL when the thread is parked. This change removes some macros and allocation handling code.
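As a rough illustration of the thread-safety claim above (not the test added in this PR), the sketch below shares one `LZMACompressor` between several threads and checks that the stream still round-trips. `INPUT`, `NUM_THREADS`, and `worker_chunks` are illustrative names. The inputs are kept small on the assumption that the workers' `compress()` calls then return empty chunks, so the order in which they land in the list does not matter.

```python
import lzma
import threading

# Illustrative payload and thread count; neither comes from the PR.
INPUT = b"free-threading lzma check " * 8
NUM_THREADS = 4

lzc = lzma.LZMACompressor()

# Prime the stream on the main thread so the container header is
# emitted before any worker runs.
parts = [lzc.compress(INPUT)]
worker_chunks = []


def worker():
    # Each call is serialized by the compressor's internal lock.
    worker_chunks.append(lzc.compress(INPUT))


threads = [threading.Thread(target=worker) for _ in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

parts.extend(worker_chunks)
parts.append(lzc.flush())

# Every worker fed the same payload, so the order in which the inputs
# were interleaved does not change the expected plaintext.
restored = lzma.decompress(b"".join(parts))
```

If the internal state were corrupted by concurrent calls, `lzma.decompress` would be expected to raise `LZMAError` or return different data. A clean run is evidence of correct locking, not proof.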

cc: @mpage @colesbury @emmatyping

@yoney yoney marked this pull request as ready for review October 28, 2025 16:39

def worker():
    # it should return empty bytes as it buffers data internally
    data = lzc.compress(INPUT)
    self.assertEqual(data, b"")

Contributor

The assertion self.assertEqual(data, b"") is flaky. In free-threaded mode, compress() may return data chunks non-deterministically due to race conditions in internal buffering.

Contributor Author

@ashm-dev Thanks for your comment. I’m trying to verify that the mutex protects the internal state and buffering, so there shouldn’t be a race condition. Could you please explain which race condition you mean? That would help me understand your point better.

Member

@ashm-dev Are you using ChatGPT or another LLM to review for you? If so, please don't -- it's not helpful. If not, please try to be clearer in your responses.

def worker():
    data = lzd.decompress(compressed, chunk_size)
    self.assertEqual(len(data), chunk_size)
    output.append(data)

Contributor

output.append(data) without synchronization causes race conditions in free-threaded mode, potentially losing data or corrupting the list.

Contributor Author

> output.append(data) without synchronization causes race conditions in free-threaded mode, potentially losing data or corrupting the list.

@ashm-dev list is thread-safe in the free-threaded build.

Contributor

In the free-threaded build, list operations use internal locks to avoid crashes, but thread safety isn’t guaranteed for concurrent mutations; see the Python free-threading HOWTO.
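One way to probe the disagreement about `list.append` empirically is a small stress check like the one below. It is only a sketch: `NUM_THREADS` and `APPENDS_PER_THREAD` are arbitrary illustrative values, and a passing run shows that no append was lost on the interpreter it ran on, not that the language guarantees this behavior.

```python
import threading

# Illustrative sizes, not taken from the PR.
NUM_THREADS = 8
APPENDS_PER_THREAD = 10_000

output = []


def worker(thread_id):
    # Concurrent, unsynchronized appends to one shared list.
    for i in range(APPENDS_PER_THREAD):
        output.append((thread_id, i))


threads = [
    threading.Thread(target=worker, args=(n,)) for n in range(NUM_THREADS)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

expected = NUM_THREADS * APPENDS_PER_THREAD
# Every (thread_id, i) pair is unique, so a lost or duplicated append
# would show up as a mismatch in one of these counts.
total = len(output)
unique = len(set(output))
```

On CPython, `total` and `unique` are both expected to equal `expected`; the order in which different threads' items interleave is not deterministic.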


def worker():
    data = lzd.decompress(compressed, chunk_size)
    self.assertEqual(len(data), chunk_size)

Contributor

@ashm-dev ashm-dev Oct 28, 2025

self.assertEqual(len(data), chunk_size) is wrong. decompress() may return fewer than max_length bytes.

Contributor Author

@ashm-dev I agree that decompress() can return fewer than max_length bytes if there isn’t enough input. In this test, I’m providing input that should produce at least max_length bytes. Is there anything else I might be missing? If I give enough valid input, is there any reason why lzma wouldn’t return max_length bytes?

There are other tests making similar assumptions.

# Feed first half the input
len_ = len(COMPRESSED_XZ) // 2
out.append(lzd.decompress(COMPRESSED_XZ[:len_],
                          max_length=max_length))
self.assertFalse(lzd.needs_input)
self.assertEqual(len(out[-1]), max_length)
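
The same assumption can be checked in a single thread with a minimal sketch like the following. The `plaintext` payload and the `max_length` value are illustrative, chosen so the compressed input can produce far more than `max_length` bytes.

```python
import lzma

# Illustrative data: 256 KiB of plaintext, far more than max_length.
plaintext = bytes(range(256)) * 1024
compressed = lzma.compress(plaintext)
max_length = 100

lzd = lzma.LZMADecompressor()

# All compressed input is supplied at once, so the output limit is the
# only thing that stops decompression here.
chunk = lzd.decompress(compressed, max_length=max_length)
needs_input_after_chunk = lzd.needs_input
eof_after_chunk = lzd.eof

# The decompressor buffers the unconsumed input; passing b"" drains the
# rest without a limit.
rest = lzd.decompress(b"")
```

With enough input available, `chunk` is expected to be exactly `max_length` bytes long and `needs_input` to be `False`, which matches the assumption the existing tests rely on.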
