-
-
Couldn't load subscription status.
- Fork 33.3k
gh-116738: Use PyMutex for lzma module #140711
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| def worker(): | ||
| # it should return empty bytes as it buffers data internally | ||
| data = lzc.compress(INPUT) | ||
| self.assertEqual(data, b"") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assertion self.assertEqual(data, b"") is flaky. In free-threaded mode, compress() may return data chunks non-deterministically due to race conditions in internal buffering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ashm-dev Thanks for your comment. I’m trying to verify/test the mutex is protecting the internal state and buffering, so there shouldn’t be a race condition. Could you please explain which race condition you mean? That would help me understand your point better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ashm-dev Are you using ChatGPT or another LLM to review for you? If so, please don't -- it's not helpful. If not, please try to be clearer in your responses.
| def worker(): | ||
| data = lzd.decompress(compressed, chunk_size) | ||
| self.assertEqual(len(data), chunk_size) | ||
| output.append(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
output.append(data) without synchronization causes race conditions in free-threaded mode, potentially losing data or corrupting the list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
output.append(data)without synchronization causes race conditions in free-threaded mode, potentially losing data or corrupting the list.
@ashm-dev list is thread safe in free-threaded build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the free-threaded build, list operations use internal locks to avoid crashes, but thread safety isn’t guaranteed for concurrent mutations — see Python free-threading HOWTO.
|
|
||
| def worker(): | ||
| data = lzd.decompress(compressed, chunk_size) | ||
| self.assertEqual(len(data), chunk_size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.assertEqual(len(data), chunk_size) is wrong. decompress() may return less than max_length bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ashm-dev I agree that decompress() can return less than max_length if there isn’t enough input. In this test, I’m providing input that should produce at least max_length bytes. Is there anything else I might be missing? If I give enough valid input, is there any reason why lzma wouldn’t return max_length?
There are other tests making similar assumptions.
Lines 164 to 169 in ce4b0ed
| # Feed first half the input | |
| len_ = len(COMPRESSED_XZ) // 2 | |
| out.append(lzd.decompress(COMPRESSED_XZ[:len_], | |
| max_length=max_length)) | |
| self.assertFalse(lzd.needs_input) | |
| self.assertEqual(len(out[-1]), max_length) |
Similar to #140555, the main goal was to review the
lzmamodule for free-threading. The methods already use a lock, which makes them thread-safe in a free-threaded build. I replacedPyThread_acquire_lockwithPyMutex.PyMutexreleases the GIL when the thread is parked. This change removes some macros and allocation handling code.cc: @mpage @colesbury @emmatyping