-
Notifications
You must be signed in to change notification settings - Fork 4
Description
The SQLite WAL file can store multiple versions of the same frame, when reading only valid frames should be returned. The docs define a valid frame as follows:
A frame is considered valid if and only if the following conditions are true:
- The salt-1 and salt-2 values in the frame-header match salt values in the wal-header
- The checksum values in the final 8 bytes of the frame-header exactly match the checksum computed consecutively on the first 24 bytes of the WAL header and the first 8 bytes and the content of all frames up to and including the current frame.
I believe the sentence for the second check is incomplete,
- The checksum values in the final 8 bytes of the frame-header exactly match the checksum computed consecutively on the first 24 bytes of the WAL header and the first 8 bytes and the content of all frames up to and including the current frame.
should be:
- The checksum values in the final 8 bytes of the frame-header exactly match the checksum computed consecutively on the first 24 bytes of the WAL header and the first 8 bytes of the frame header and the content of all frames up to and including the current frame.
This is based on the WAL Header Format and the WAL frame Header Format (docs):
WAL Header Format:
Offset Size Description 0 4 Magic number. 0x377f0682 or 0x377f0683 4 4 File format version. Currently 3007000. 8 4 Database page size. Example: 1024 12 4 Checkpoint sequence number 16 4 Salt-1: random integer incremented with each checkpoint 20 4 Salt-2: a different random number for each checkpoint 24 4 Checksum-1: First part of a checksum on the first 24 bytes of header 28 4 Checksum-2: Second part of the checksum on the first 24 bytes of header
WAL Frame Header Format:
Offset Size Description 0 4 Page number 4 4 For commit records, the size of the database file in pages after the commit. For all other records, zero. 8 4 Salt-1 copied from the WAL header 12 4 Salt-2 copied from the WAL header 16 4 Checksum-1: Cumulative checksum up through and including this page 20 4 Checksum-2: Second half of the cumulative checksum.
Meanwhile, the SQLite3 source has a slightly different definition of a valid frame SQLite3 source:
/*
** Check to see if the frame with header in aFrame[] and content
** in aData[] is valid. If it is a valid frame, fill *piPage and
** *pnTruncate and return true. Return if the frame is not valid.
*/
static int walDecodeFrame(
Wal *pWal, /* The write-ahead log */
u32 *piPage, /* OUT: Database page number for frame */
u32 *pnTruncate, /* OUT: New db size (or 0 if not commit) */
u8 *aData, /* Pointer to page data (for checksum) */
u8 *aFrame /* Frame data */
){
int nativeCksum; /* True for native byte-order checksums */
u32 *aCksum = pWal->hdr.aFrameCksum;
u32 pgno; /* Page number of the frame */
assert( WAL_FRAME_HDRSIZE==24 );
/* A frame is only valid if the salt values in the frame-header
** match the salt values in the wal-header.
*/
if( memcmp(&pWal->hdr.aSalt, &aFrame[8], 8)!=0 ){
return 0;
}
/* A frame is only valid if the page number is greater than zero.
*/
pgno = sqlite3Get4byte(&aFrame[0]);
if( pgno==0 ){
return 0;
}
/* A frame is only valid if a checksum of the WAL header,
** all prior frames, the first 16 bytes of this frame-header,
** and the frame-data matches the checksum in the last 8
** bytes of this frame-header.
*/
nativeCksum = (pWal->hdr.bigEndCksum==SQLITE_BIGENDIAN);
walChecksumBytes(nativeCksum, aFrame, 8, aCksum, aCksum);
walChecksumBytes(nativeCksum, aData, pWal->szPage, aCksum, aCksum);
if( aCksum[0]!=sqlite3Get4byte(&aFrame[16])
|| aCksum[1]!=sqlite3Get4byte(&aFrame[20])
){
/* Checksum failed. */
return 0;
}
/* If we reach this point, the frame is valid. Return the page number
** and the new database size.
*/
*piPage = pgno;
*pnTruncate = sqlite3Get4byte(&aFrame[4]);
return 1;
}Either way, computing a checksum of basically every frame, for every frame that needs to be read, seems to be rather expensive.
The valid property currently only implements the first check (source):
@property
def valid(self) -> bool:
salt1_match = self.header.salt1 == self.wal.header.salt1
salt2_match = self.header.salt2 == self.wal.header.salt2
return salt1_match and salt2_matchIn my limited testing the current implementation seems to be good enough, but for completeness sake I would like to at least have the option of full validation, what are your thoughts on this?.
An initial implementation of this can be found in: main...PimSanders:dissect.database:improvement/expand-wal-validation.