This tool implements a decompressor for AT3P (mode N), AT4P, AT5P, and AT6P-compressed files. These file formats are apparently used in Chunsoft games for the DS, including 999: Nine Hours, Nine Persons, Nine Doors.
The only thing AT6P shares with its predecessors is a kind of similar header format:
| Offset (hex) | Size | Purpose |
|---|---|---|
| 0 | 4 | Magic (41 54 36 50) |
| 4 | 1 | Not used (seems to always be a multiple of 8...?) |
| 5 | 2 | Size of the compressed file, stored little-endian |
| 7 | 9 | Not used (often 00s, but not always??) |
| 10 | 3 | Size of the decompressed file, stored little-endian |
| 13 | 1 | Not used (often 00) |
| 14 | 1 | First byte of data |
| 15 | 1 | Not used (often 00, but not always??) |
| 16 | ... | Compressed data |
The compressed data is a stream of bits, intended to be read in sequence from least significant to most significant in each byte. AT6P has two strategies for using these bits to describe the value of the next byte:
- Encode the difference between this byte and the next byte using a relatively small number of bits
- Recognize that the next byte is the same as the last byte that had a different value
A backwards variant of exponential-Golomb coding is used to store commands. (Exponential-Golomb coding is similar to Elias gamma coding, if that name is more familiar to you.) Numbers closest to 0 use very few bits, while numbers further away from 0 use quadratically more bits as they increase in magnitude. The decoding process works like this:
- Start with a
countof 0. - Read a bit.
- If it is a 0, increase
countby 1, and then go back and read another bit. - If it is a 1, continue on to the next step
- If it is a 0, increase
- Excluding the 1 that you just read, read
countmore bits, and store them in a number variable calleddata. The first bit should become the least significant bit ofdata; the second bit, the next bit greater; etc. in little endian bit order. - Add (2^
count) - 1 todata. (x^y means "x raised to the power of y.") datais the decoded version of the bits.
For simplicity for me, the following table will list bits in their "numerical" order -- least significant bit on the right, most significant on the left.
| Bit pattern | Exponential-Golomb coding value | Command |
|---|---|---|
| 1 | 0 | Repeat the current byte (special meaning; normally would decode to +0) |
| 010 | 1 | Use previous byte's value (special meaning; normally would decode to -0) |
| 110 | 2 | Add to this byte: +1 |
| 00100 | 3 | Add to this byte: -1 |
| 01100 | 4 | Add to this byte: +2 |
| 10100 | 5 | Add to this byte: -2 |
| 11100 | 6 | Add to this byte: +3 |
| 0001000 | 7 | Add to this byte: -3 |
| ... | ... | ... |
| 111111110000000 | 254 | Add to this byte: +127 |
| 00000000100000000 | 255 | Add to this byte: -127 |
| 00000001100000000 | 256 | Add to this byte: +128 |
999 only supports bit patterns with 0-8 trailing zeroes (i.e., it is a rule that count <= 8);
any more zeroes and the game will crash.
As the table shows, the value you get from decoding the bits can be interpreted as a sign-magnitude
value. The least significant bit is the sign, and the rest of the bits make up the value. In
practice, this means you can use the following procedure to transform the second column (data)
into the third column:
- Store the value "(
datamod 2) * -2 + 1" into a number variable calledsign. (mod is the mathematical "modulo" operation, which will return either 0 or 1 in this case.) - Store the value "
data/ 2" into a number variable calledmagnitude. (/refers to integer division -- the result of division after rounding down to the nearest integer, or truncating towards 0. We're not dealing with negative decoded numbers so it's all the same.) - If
magnitudeis 0 andsignis -1, the command is a "use the value of the previous byte" command. - Otherwise, the command is to add the value "
sign * magnitude" to the current byte.
Regarding the values of the "previous" and "current" byte, they're kept track of as follows:
- When using the command with sign = 1 and magnitude = 0, the previous byte and current byte are not updated.
- When using all other commands, including the "use previous byte's value" command, the "current byte" for this step becomes the previous byte for the next step, and the byte outputted by this step becomes the "current byte" for the next step.
These procedures are not necessarily the most efficient ways to read the data, but they should yield the same results as the in-game decompression code in 999.
There are some other interesting quirks to note about 999's specific implementation of the decompression algorithm:
- The game does not check that the minimal representation of a delta is actually used, for most
standard commands. Only a delta of +0 ("repeat the current byte") or -0 ("use previous byte's
value") must be represented minimally; otherwise, they'll both be treated as literally outputting
a byte with the current byte's value plus (or minus) 0, making the current and previous bytes'
values the same.
- This fact seems pretty useless for compression. It takes up more space to encode a +0/-0 delta that doesn't preserve the previous byte value, and it means that you'll have to encode the previous byte value less efficiently later when it shows up. In return, you get a second way to say "repeat the current byte" that also takes up more bits than necessary!
- If you use this fact to generate compressed files that cannot be correctly decompressed by naïvely-implemented AT6P decompression tools, as a rudimentary copy protection mechanism for your mod, I'll be really disappointed.
- If you use this fact to try and hack compressed files and minimize the diff of the compressed file... well. I can't imagine that situation ever happening. Just decompress the file first and then diff it and recompress. You're not trying to hack on an ancient console with no filesystem, unlike the NSO devs editing compressed files directly in Ocarina of Time.
- Passing a compression stream to the in-game decompression function whose length in bytes is odd
will cause undefined behavior to occur when reading the last few bits in the file, because the
game reads data from the compression stream 16 bits at a time. This undefined behavior is almost
certainly benign in all cases, but it's easy to prevent by just padding the file with an extra
00byte manually if its length is odd. There should be no real increase in file size or memory usage after everything's been inserted back into the DS ROM and copied onto the heap.
Compression should be relatively simple, you just need to reverse the decompression process. There's
very little opportunity for creativity in this process. Wikipedia's page on Exponential-Golomb
coding reveals a fairly simple method for encoding a non-negative number d into 999's format:
- Let
n=d+ 1 - Let
shift= floor(logbase(2, n)) - Let
power= 2^shift-- that is, the greatest power of 2 less than or equal ton. - Output
powerbit by bit, beginning from the least significant bit, for a total of (shift+ 1) bits. This will output the appropriate number of 0s and a 1. - Output
nmodpower-- that is,nwithout its most significant bit -- bit by bit, beginning from the least significant bit, for a total ofshiftbits.
The full algorithm to generate the compression stream might look like this, if you have a "write 1 bit" primitive outputting bits from least significant to most significant:
- Set
currentandpreviousto the first byte of the uncompressed file. - For each index
iand bytebafter the first byte, in order from beginning to end of the file:- Let
deltabeb-current. - Let
delta_signedbe(delta + 128) mod 256 - 128. - Let
sign_bitbe 1 ifdelta_signed< 0; otherwise, let it be 0. - Let
magnitudebe |delta_signed|. - If
magnitude= 0, setwordto 1. - Otherwise, if
b=previous, setword' to 2. - Otherwise, set
wordtomagnitude× 2 +sign_bit+ 1. - If
word≠ 1, setprevioustob. - Let
shift= floor(logbase(2,word)). - Write the bits for
poweras described in the bullet point list above:- Repeat
shift- 1 times:- Write the bit
0.
- Write the bit
- Write the bit
1.
- Repeat
- Write the bits for
wordmodpoweras described in the bullet point list above:- Repeat while
word≠ 1:- Write the bit
wordmod 2. - Set
wordtoworddiv 2.
- Write the bit
- Repeat while
- Let
- Repeat while the length (in bits) of the output is not a multiple of 16 (or 8, if benign undefined
behavior is acceptable):
- Write the bit
0.
- Write the bit
This compression should give the optimal result, though it could probably stand to be better optimized with a "write N bits" primitive. The existing compression code I have uses a "write 1 bit" primitive, though.
Matching the original compressed files is made more difficult by the inclusion of purposeless bytes in the header whose values are not properly zeroed out. I'm not aware of how they could be generated or calculated, if there is a reasonable way to calculate them at all. Deeper statistical analysis (or an interview with the developers who wrote Chunsoft's compression tooling) is probably necessary to say anything concrete about their values, since nothing uses those values.
999 doesn't only feature AT6P-compressed files; it also has several AT5P-compressed files, like the 3D models for items. The AT5P format is very similar to the older AT3P and AT4P formats, which appeared in older and more popular Chunsoft games like Pokémon Mystery Dungeon: Explorers of Sky. With popularity comes pre-existing research and a greater likelihood of tools that can work with those formats. And there are indeed several repositories with so-called "PX" decompression code (due to the 5th byte in the file nearly always being "X" for these types of files, for some reason):
- pxutil / ppmdu_2 by PsyCommando (CC0 Dedication)
- SkyTemple by various authors (GPLv3 License)
For completeness, that family of compression formats will also be described here, as it is implemented in 999.
All of these formats are LZSS-based formats where certain back-reference lengths, which depend on the file, are set aside to be given a special meaning different from "copy (this number of) bytes." These tradeoffs seem fairly well suited to compressing 4bpp graphics.
The header's format changes depending on the exact state of the first 5 bytes of the header.
AT3P files are distinguished by their file magic of AT3P (41 54 33 50). This is the simplest
format and it does not include the size of the decompressed file anywhere in its header.
| Offset (hex) | Size | Purpose |
|---|---|---|
| 0 | 4 | Magic (41 54 33 50) |
| 4 | 1 | Mode (see mode "N" below) |
| 5 | 2 | Size of the compressed file, stored little-endian |
| 7 | 9 | List of the 9 LZ length nybbles that have special meaning in this file |
| 10 | ... | The compressed data stream |
All the other file headers build off of this one.
AT4P files add in a 16-bit size that the file should have once it's been decompressed.
| Offset (hex) | Size | Purpose |
|---|---|---|
| 0 | 4 | Magic (41 54 34 50) |
| 4 | 1 | Mode (see mode "N" below) |
| 5 | 2 | Size of the compressed file, stored little-endian |
| 7 | 9 | List of the 9 LZ length nybbles that have special meaning in this file |
| 10 | 2 | Size of the decompressed data, stored little-endian |
| 12 | ... | The compressed data stream |
The size after decompression is used in two ways:
- If the game calls the decompression function with a non-zero buffer size, the game verifies that this buffer size matches the decompressed data size in the file exactly. If it does not match, the function reports that it decompressed 0 bytes of data and exit early.
- The game checks before processing every new compression token whether it has already reached the size of the destination buffer, if the game specified a non-zero buffer size. If it has reached or exceeded the destination buffer size, the function reports that it decompresed 0 bytes of data and exits early. (Note that the decompression function does not do full bounds checking. As long as the buffer size is at least big enough to hold the first byte of the last token, the game will fully decompress the data, reporting the actual number of bytes that it wrote to the buffer.
AT5P files expand the sizes of the compressed and decompressed file to 24-bit numbers by adding two additional bytes to the header, compared to AT4P.
| Offset (hex) | Size | Purpose |
|---|---|---|
| 0 | 4 | Magic (41 54 35 50) |
| 4 | 1 | Mode (see mode "N" below) |
| 5 | 2 | Least significant 16 bits of the compressed file's size, stored little-endian |
| 7 | 9 | List of the 9 LZ length nybbles that have special meaning in this file |
| 10 | 3 | 24-bit size of the decompressed data, stored little-endian |
| 13 | 1 | Most significant 8 bits of the compressed file's size |
| 14 | ... | The compressed data stream |
Everything else is exactly the same as with AT4P.
When the first byte after the 4-byte file magic is N (0x4E), it means the file has not been
compressed at all. In this case, the reported "size of the compressed file" is instead treated as a
size of the uncompressed data, and that many bytes beginning at offset 7 in the header are copied
into the destination buffer (without bounds checking).
While this mode "N" is checked for all three of the file headers, it would be difficult to use properly with AT4P files (since the file contents overlap with the "size after decompression" field checked earlier in the process) and even more difficult to use properly with AT5P files (since the file contents also overlap with the most significant byte of the "size of the compressed file" -- that is, the size of the uncompressed data at offset 7). So it only really makes sense to use this "N" mode with AT3P unless you happen to be able to guarantee that your source data file has the decompressed and compressed size's relevant bytes at the right positions.
Generally, if the mode of any of these compression formats is not intended to be N, it will be
explicitly filled in with X instead. Any non-N value will work the same as X, however.
As mentioned earlier, the compressed data takes a format very similar to your average implementation of LZSS. The compression stream consists of interspersed "literal byte" tokens and "back-reference" tokens. As in most implementations of LZSS, these are distinguished by adding a "flag byte" at the beginning of every group of 8 tokens to minimize overhead. It uses a 12-bit sliding window size (allowing access to up to 4096 previously-outputted bytes, with cycles allowed) and a 4-bit length value for these back-references.
The innovative feature of the compression is the list of 9 bytes in the header. Each byte is intended to have a unique integer value on the closed interval [0, 15]. Dictionary back-references in the compressed stream dedicate 4 bits to the length, meaning each byte can be a valid length for such a back reference (with a bias of -3 added to represent the smallest length as 0). These 9 lengths are treated as special commands with a 4-bit (not 12-bit) argument that will output various two-byte patterns with 3 or more of the same nybble.
Full decoding code after interpreting the header can be implemented as follows:
- Let
mask= 0. - Repeat until all bytes in the compressed file have been read:
- Try to read the next compression token, if there's space in the destination buffer for more
bytes.
- If
mask= 0, read a byte from the compressed file. Letflagsbe this byte. Setmaskequal to 128. - Let
bitbe "flagsdivmask." Setflags's value to "flagsmodmask."
- If
- If
bit= 1, the next token is a "literal byte" token. Read one byte from the compressed file and append it to the decompression buffer. - Otherwise,
bit= 0. The next token is a "back-reference" token. Read one byte from the compressed file. Letlengthbe "(the byte read) div 16" and letxbe "(the byte read) mod 16."- Let
ibe the first index wherelengthcan be found in the list of special nybbles, or -1 iflengthis not present. - If
i= -1, this token indicates a back-reference. Read an additional byte from the compressed file. Letybe this byte.- Append
length + 3bytes, beginning from4096 - (x * 256 + y)bytes ago in the decompression buffer, to the end of the buffer. Copy the bytes 1 at a time, so as to allow newly appended bytes to be re-appended to the end of the buffer again iflengthandx * 256 + yare sufficiently large.
- Append
- Otherwise,
iindicates a special command to output 2 bytes composed of two nybbles. Lethbe(x + 1) mod 16and letlbe(x - 1) mod 16.- If
i= 0, append the bytes "x* 16 +x" and "x* 16 +x" to the buffer. - If
i= 1, append the bytes "x* 16 +h" and "h* 16 +h" to the buffer, in that order. - If
i= 2, append the bytes "x* 16 +l" and "x* 16 +x" to the buffer, in that order. - If
i= 3, append the bytes "x* 16 +x" and "l* 16 +x" to the buffer, in that order. - If
i= 4, append the bytes "x* 16 +x" and "x* 16 +l" to the buffer, in that order. - If
i= 5, append the bytes "x* 16 +l" and "l* 16 +l" to the buffer, in that order. - If
i= 6, append the bytes "x* 16 +h" and "x* 16 +x" to the buffer, in that order. - If
i= 7, append the bytes "x* 16 +x" and "h* 16 +x" to the buffer, in that order. - Otherwise,
imust be 8. Append the bytes "x* 16 +x" and "x* 16 +h" to the buffer, in that order.
- If
- Let
- Set
mask's value tomask div 2.
- Try to read the next compression token, if there's space in the destination buffer for more
bytes.
Some things to note about the list of special nybbles:
- Even if duplicate values are found in the list, only the first one counts. (I'm not aware of any files created by Chunsoft that have duplicate values in the list of nybbles.)
- If the value of an entry in this list of "nybbles" is outside of the range [0, 15], it effectively disables that command, giving you freedom to use that length for dictionary references. (I'm not aware of any files created by Chunsoft that use out-of-range values in AT5P files.)
- There is no guarantee that the list of nybbles is in any sorted order. (Every file in 999 that I've checked has had the nybbles not sorted in numerical order.)
- Use of these can reduce two literal-byte tokens down to one one-byte token, at the cost of turning certain dictionary back-references into multiple back references in a row.
One approach for compression is documented on the Project Pokémon website. I didn't really internalize everything on that page, but I did read it at least once. My ideas may have been influenced a bit by it.
I haven't written a compressor for this format yet, so take everything after this point as a brain dump and treat it with appropriate skepticism/scrutiny.
The way I would approach this is to try compressing the data normally down to a sequence of LZSS tokens and then do a "peephole optimization" pass where qualifying sequences of two literal bytes are replaced with commands, adjusting the lengths of back references to valid unclaimed lengths at the end. But even within this vague initial framework, there are many more potential avenues for creativity in implementing the compression:
- Is the LZSS compression algorithm used greedy, or can it look ahead for better back-references?
- Are special commands inserted based on which appears first in the compressed file, or based on which occurs most often in the file?
- Relatedly, when multiple commands could apply to overlapping areas, which one wins?
- How do you ensure that back reference lengths can be easily represented once a command makes one
length unusable?
- It is most difficult to compensate for the loss of the smallest lengths (nybbles 0, 1, and 2
-- that is, lengths 3, 4, and 5). All the others can be emulated using multiple dictionary
back-references in a row. Assuming there is a peephole optimization pass that can't see inside
of back-references and pull literal bytes out of them, that means
00,01, and02must never show up in the nybble list, leaving four other nybbles to be kept out of the list. - It is possible to choose the back-reference lengths such that each back reference expands to
at most two back-reference tokens. You don't even need to use all seven lengths:
- Assume 3 is a valid length
- Assume 4 is a valid length
- Assume 5 is a valid length
- 6 = 3 + 3
- 7 = 3 + 4
- 8 = 3 + 5
- 9 = 4 + 5
- 10 = 5 + 5
- Assume 11 is a valid length
- Assume 12 is a valid length
- Assume 13 is a valid length
- 14 = 11 + 3
- 15 = 12 + 3
- 16 = 13 + 3
- 17 = 13 + 4
- 18 = 13 + 5
- This could reduce the problem to choosing one back reference length that occurs most often
and benefits the most from being turned into a single back-reference length, and hardcoding
all the other disallowed nybble values (
00,01,02,08,09, and0A). But in practice, it's probably better compression-wise to let the most common back-references be represented by 3 tokens to reduce other very common tokens to 1 token, meaning you're back to 4 lengths to choose. - Is it possible to get optimal compression using a peephole optimization pass with an approach like this?
- It is most difficult to compensate for the loss of the smallest lengths (nybbles 0, 1, and 2
-- that is, lengths 3, 4, and 5). All the others can be emulated using multiple dictionary
back-references in a row. Assuming there is a peephole optimization pass that can't see inside
of back-references and pull literal bytes out of them, that means
- Is it realistic to brute-force the compression by trying all possible parameters? (Ignoring the
order they're assigned to the various commands)
- If we have 16 options for every nybble and up to 9 nybbles, the number of combinations is 16 + 120 + 560 + 1820 + 4368 + 8008 + 11440 + 12870 + 11440 = 50642.
- If we have 16 options for every nybble and exactly 9 nybbles, the number of combinations is only 11440.
- If we have 13 options for every nybble and up to 9 nybbles, that gives 715 + 1287 + 1716 + 1716 + 1287 + 715 + 286 + 78 + 13 = 7813 combinations.
- If we have 13 options for every nybble and exactly 9 nybbles, that gives only 715 combinations.
- So while the full state space may be a bit intensive to brute-force search, some initial assumptions about which back-reference lengths will be needed can substantially reduce the number of considered possibilities.
Feel free to use the descriptions in this README to implement decompression and compression. I tried to keep them relatively detached from the actual implementation in-game where possible. Consider it to be available to you with a CC0 1.0 Universal public domain dedication.
The C code currently provided was derived from trying to provide a faithful re-creation of what the assembly code was doing, translating idiomatically. It is probably not a "matching decompilation" that can generate the same assembly code when the same compiler is used. No part of the output was copied from any automated decompilation tool (frankly, that would have made it harder). On the other hand, I've made a best effort to give variables good names and document non-obvious bits of control flow. I hope that the relatively small amount of code provided (two functions), and my creative input to choosing variable and function names and comments, makes the code in this repository qualify as "fair use" of Chunsoft's assembly code under US law, and copyrightable as well.
Thus, to the extent permissible by law, any software in this repository is licensed under the MIT license.