
LZMA MultiThreading XZ compression support #114953

@mkomet


Feature or enhancement

Proposal:

import lzma
from io import BytesIO

data = b'8' * (2 << 30)  # 2 GiB of sample input
# Current API: single-threaded encoding in liblzma,
# using `lzma_easy_encoder` / `lzma_stream_encoder`
lzma.compress(data)
# Proposed: compress with up to 4 background threads, using `lzma_stream_encoder_mt`
lzma.compress(data, threads=4)
# Proposed: size the thread pool from the CPU count (`lzma_cputhreads`)
lzma.compress(data, threads=0)
# Proposed: support in the `LZMAFile` class
with lzma.LZMAFile(BytesIO(), "w", threads=4) as f:
    f.write(data)
# Raises ValueError for negative `threads`
lzma.compress(data, threads=-1)
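
The `threads` semantics proposed above can be sketched in pure Python. This is only an illustration under stated assumptions: `resolve_threads` is a hypothetical helper, not part of `lzma`, and `os.cpu_count()` stands in for liblzma's `lzma_cputhreads`.

```python
import os


def resolve_threads(threads=1):
    """Hypothetical helper mirroring the proposed `threads` semantics."""
    if isinstance(threads, bool) or not isinstance(threads, int):
        raise TypeError("threads must be an integer")
    if threads < 0:
        raise ValueError("threads must be non-negative")
    if threads == 0:
        # Assumption: os.cpu_count() approximates lzma_cputhreads();
        # fall back to 1 when the count cannot be determined.
        return os.cpu_count() or 1
    return threads
```

With this reading, `threads=1` keeps today's single-threaded behaviour, so existing callers would be unaffected.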

Notes:

  • This will extend the current API in lzma.py by adding a threads keyword argument that defaults to 1 (threads=1), so existing calls stay single-threaded.
  • threads is passed to the underlying liblzma calls as a hard cap on the number of background threads, not as an exact count. Some presets (the more "aggressive" compression presets) might still use fewer threads.
  • Using more background threads will usually worsen the compression ratio but shorten compression time.
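
The ratio-versus-time trade-off in the last note can be approximated with the current API. A minimal sketch, assuming `lzma.compress` releases the GIL while compressing: split the input into chunks, compress each chunk as an independent XZ stream in a thread pool, and concatenate the results. `compress_chunked` and `chunk_size` are names made up for this example, and this is not how `lzma_stream_encoder_mt` is implemented; it only shows the same effect from the Python side.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor


def compress_chunked(data, threads=4, chunk_size=1 << 20):
    """Compress `data` as a series of independent XZ streams."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    if not chunks:
        return lzma.compress(b"")
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() preserves input order, so the streams stay in sequence.
        return b"".join(pool.map(lzma.compress, chunks))


payload = bytes(range(256)) * 8192  # 2 MiB of sample data
packed = compress_chunked(payload, threads=2)
# lzma.decompress() accepts concatenated streams and joins their contents.
restored = lzma.decompress(packed)
```

Because each chunk is compressed without seeing the others, no chunk can reference data in another, which is one reason the ratio tends to be worse than a single-threaded run over the whole input.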

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

https://discuss.python.org/t/multi-threaded-lzma/26708/3
