Skip to content

Expand SQLite3 data validation when reading from WAL #16

@PimSanders

Description

@PimSanders

The SQLite WAL file can store multiple versions of the same frame, when reading only valid frames should be returned. The docs define a valid frame as follows:

A frame is considered valid if and only if the following conditions are true:

  1. The salt-1 and salt-2 values in the frame-header match salt values in the wal-header
  2. The checksum values in the final 8 bytes of the frame-header exactly match the checksum computed consecutively on the first 24 bytes of the WAL header and the first 8 bytes and the content of all frames up to and including the current frame.

I believe the sentence for the second check is incomplete,

  1. The checksum values in the final 8 bytes of the frame-header exactly match the checksum computed consecutively on the first 24 bytes of the WAL header and the first 8 bytes and the content of all frames up to and including the current frame.

should be:

  1. The checksum values in the final 8 bytes of the frame-header exactly match the checksum computed consecutively on the first 24 bytes of the WAL header and the first 8 bytes of the frame header and the content of all frames up to and including the current frame.

This is based on the WAL Header Format and the WAL frame Header Format (docs):

WAL Header Format:

Offset Size Description
0 4 Magic number. 0x377f0682 or 0x377f0683
4 4 File format version. Currently 3007000.
8 4 Database page size. Example: 1024
12 4 Checkpoint sequence number
16 4 Salt-1: random integer incremented with each checkpoint
20 4 Salt-2: a different random number for each checkpoint
24 4 Checksum-1: First part of a checksum on the first 24 bytes of header
28 4 Checksum-2: Second part of the checksum on the first 24 bytes of header

WAL Frame Header Format:

Offset Size Description
0 4 Page number
4 4 For commit records, the size of the database file in pages after the commit. For all other records, zero.
8 4 Salt-1 copied from the WAL header
12 4 Salt-2 copied from the WAL header
16 4 Checksum-1: Cumulative checksum up through and including this page
20 4 Checksum-2: Second half of the cumulative checksum.

Meanwhile, the SQLite3 source has a slightly different definition of a valid frame SQLite3 source:

/*
** Check to see if the frame with header in aFrame[] and content
** in aData[] is valid.  If it is a valid frame, fill *piPage and
** *pnTruncate and return true.  Return if the frame is not valid.
*/
static int walDecodeFrame(
  Wal *pWal,                      /* The write-ahead log */
  u32 *piPage,                    /* OUT: Database page number for frame */
  u32 *pnTruncate,                /* OUT: New db size (or 0 if not commit) */
  u8 *aData,                      /* Pointer to page data (for checksum) */
  u8 *aFrame                      /* Frame data */
){
  int nativeCksum;                /* True for native byte-order checksums */
  u32 *aCksum = pWal->hdr.aFrameCksum;
  u32 pgno;                       /* Page number of the frame */
  assert( WAL_FRAME_HDRSIZE==24 );

  /* A frame is only valid if the salt values in the frame-header
  ** match the salt values in the wal-header.
  */
  if( memcmp(&pWal->hdr.aSalt, &aFrame[8], 8)!=0 ){
    return 0;
  }

  /* A frame is only valid if the page number is greater than zero.
  */
  pgno = sqlite3Get4byte(&aFrame[0]);
  if( pgno==0 ){
    return 0;
  }

  /* A frame is only valid if a checksum of the WAL header,
  ** all prior frames, the first 16 bytes of this frame-header,
  ** and the frame-data matches the checksum in the last 8
  ** bytes of this frame-header.
  */
  nativeCksum = (pWal->hdr.bigEndCksum==SQLITE_BIGENDIAN);
  walChecksumBytes(nativeCksum, aFrame, 8, aCksum, aCksum);
  walChecksumBytes(nativeCksum, aData, pWal->szPage, aCksum, aCksum);
  if( aCksum[0]!=sqlite3Get4byte(&aFrame[16])
   || aCksum[1]!=sqlite3Get4byte(&aFrame[20])
  ){
    /* Checksum failed. */
    return 0;
  }

  /* If we reach this point, the frame is valid.  Return the page number
  ** and the new database size.
  */
  *piPage = pgno;
  *pnTruncate = sqlite3Get4byte(&aFrame[4]);
  return 1;
}

Either way, computing a checksum of basically every frame, for every frame that needs to be read, seems to be rather expensive.

The valid property currently only implements the first check (source):

@property
def valid(self) -> bool:
    salt1_match = self.header.salt1 == self.wal.header.salt1
    salt2_match = self.header.salt2 == self.wal.header.salt2

    return salt1_match and salt2_match

In my limited testing the current implementation seems to be good enough, but for completeness sake I would like to at least have the option of full validation, what are your thoughts on this?.

An initial implementation of this can be found in: main...PimSanders:dissect.database:improvement/expand-wal-validation.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or requestsqlite3Related to SQLite3

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions