@eschnett

This is a continuation of braingram#1. This PR discusses two improvements to the current interface to blosc:

  1. The compressor does not pass the data type size to blosc. Knowing the element size lets the shuffle filter reorder bytes across elements, exposing more regularity that the compression algorithm can exploit.
  2. The decompressor should be able to write into a preallocated buffer instead of allocating its own output buffer. This saves memory bandwidth and should improve decompression speed slightly. (A sketch of both points follows below.)
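
To make both points concrete, here is a minimal sketch written directly against the c-blosc C API (not this repository's wrapper); the compression level and the `roundtrip` helper are placeholders:

  #include <blosc.h>

  #include <cstddef>
  #include <vector>

  // Round-trip a float64 array through blosc, illustrating both points.
  void roundtrip(const std::vector<double> &rho) {
    blosc_init(); // in real code, call once at program startup

    const size_t typesize = sizeof(double); // 8: lets shuffle see whole elements
    const size_t nbytes = rho.size() * sizeof(double);

    // (1) Pass the element size so the shuffle filter can reorder bytes
    //     across elements before the compression stage runs.
    std::vector<char> compressed(nbytes + BLOSC_MAX_OVERHEAD);
    int csize = blosc_compress(/*clevel=*/5, BLOSC_SHUFFLE, typesize, nbytes,
                               rho.data(), compressed.data(), compressed.size());

    // (2) Decompress into a buffer the caller already owns, instead of
    //     having the decompressor allocate (and later copy) its own.
    std::vector<double> out(rho.size());
    int dsize = blosc_decompress(compressed.data(), out.data(), nbytes);

    (void)csize;
    (void)dsize;
    blosc_destroy();
  }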

I experimented with creating a large 3d float64 array (1000 x 1000 x 250 elements) and compressing it with the shuffle filter, using a type size of either 8 (matching the float64 elements) or 1. The array was filled with:

  // ni, nj, nk are the array extents (1000, 1000, 250), getidx maps (i, j, k)
  // to a linear index, and rho is the flat float64 array being filled.
  for (int64_t i = 0; i < ni; ++i)
    for (int64_t j = 0; j < nj; ++j)
      for (int64_t k = 0; k < nk; ++k) {
        int64_t idx = getidx(i, j, k);
        rho.at(idx) = 1.0 / (1.1 * i + 1.2 * j + 1.3 * k + 1);
      }

The resulting file sizes are:

  -rw-r--r--   1 eschnett staff 1993847361 Nov 16 11:31 large-new-shuffle-typesize-1.asdf
  -rw-r--r--   1 eschnett staff  395927299 Nov 16 11:29 large-new-shuffle-typesize-8.asdf

In this case the compressed file is about 5 times larger when the wrong type size is used (1,993,847,361 bytes with type size 1 vs. 395,927,299 bytes with type size 8).
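
The gap comes from what the shuffle filter does: with the correct type size it groups the k-th byte of every element together, so the slowly varying high-order bytes of neighbouring float64 values form long, compressible runs, whereas with a type size of 1 the shuffle degenerates to a plain copy. A minimal (non-SIMD) sketch of the byte shuffle:

  #include <cstddef>

  // Group the k-th byte of each of the n elements together; dst must hold
  // n * typesize bytes. With typesize == 1 this is just a copy.
  void byte_shuffle(const unsigned char *src, unsigned char *dst,
                    size_t n, size_t typesize) {
    for (size_t i = 0; i < n; ++i)
      for (size_t k = 0; k < typesize; ++k)
        dst[k * n + i] = src[i * typesize + k];
  }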
