Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
23 changes: 18 additions & 5 deletions parquet/src/compression.rs
Original file line number Diff line number Diff line change
Expand Up @@ -201,7 +201,7 @@ mod snappy_codec {
use snap::raw::{Decoder, Encoder, decompress_len, max_compress_len};

use crate::compression::Codec;
use crate::errors::Result;
use crate::errors::{ParquetError, Result};

/// Codec for Snappy compression format.
pub struct SnappyCodec {
Expand Down Expand Up @@ -231,10 +231,23 @@ mod snappy_codec {
None => decompress_len(input_buf)?,
};
let offset = output_buf.len();
output_buf.resize(offset + len, 0);
self.decoder
.decompress(input_buf, &mut output_buf[offset..])
.map_err(|e| e.into())
output_buf.reserve(len);
// SAFETY: we pass the spare capacity to snappy which will write exactly
// `len` bytes on success. The `set_len` below is only reached when
// decompression succeeds. `MaybeUninit<u8>` has the same layout as `u8`.
let spare = output_buf.spare_capacity_mut();
let spare_bytes = unsafe {
std::slice::from_raw_parts_mut(spare.as_mut_ptr().cast::<u8>(), spare.len())
};
let n = self
.decoder
.decompress(input_buf, &mut spare_bytes[..len])
.map_err(|e| -> ParquetError { e.into() })?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this returns on error before setting len, will the buffer be left in an inconsistent state?

I think the use of the mut slice ensures that the call to decompress won't overwrite the newly allocated bytes.

However, this also basically passes in uninitialized bytes to decompress -- how do we know that the decompress doesn't read them? Maybe we should add a SAFETY warning to the decompress API that says it can't rely on initialized bytes 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Effectively we rely on this:

https://docs.rs/snap/latest/snap/raw/struct.Decoder.html#errors

  • output has length less than decompress_len(input).

To not use unsafe we would need to have this feature:
BurntSushi/rust-snappy#62

// SAFETY: snappy wrote exactly `n` bytes into the spare capacity
unsafe {
output_buf.set_len(offset + n);
}
Ok(n)
}

fn compress(&mut self, input_buf: &[u8], output_buf: &mut Vec<u8>) -> Result<()> {
Expand Down
Loading