1 | 1 | --- |
2 | 2 | layout: post |
3 | | -title: "Tiny allocationless JSON parser in C" |
| 3 | +title: "Allocationless C89 JSON parser" |
4 | 4 | date: 2025-03-20 12:00:00 +0100 |
5 | | -redirect_from: /2025/03/20/tiny-allocationless-json-parser-in-c.html |
| 5 | +redirect_from: |
| 6 | + - /2025/03/20/tiny-allocationless-json-parser-in-c.html |
| 7 | + - https://mynameistrez.github.io/blog/tiny-allocationless-json-parser-in-c |
6 | 8 | --- |
7 | 9 |
8 | | -I wrote the library [Tiny allocationless JSON parser in C](https://github.com/MyNameIsTrez/tiny-allocationless-json-parser-in-c), which parses a subset of [JSON](https://en.wikipedia.org/wiki/JSON) in 533 lines of C. Only arrays, objects and strings are handled. |
9 | | - |
10 | | -I wrote this JSON parser for my tiny programming language called [grug](https://mynameistrez.github.io/2024/02/29/creating-the-perfect-modding-language.html). |
11 | | - |
12 | | -I was inspired by null program's [Minimalist C Libraries](https://nullprogram.com/blog/2018/06/10/) blog post, describing how C libraries never really need to allocate any memory themselves. The trick is to expect the user to pass `void *buffer` and `size_t buffer_capacity`: |
13 | | - |
14 | | -```c |
15 | | -int main() { |
16 | | - char buffer[420]; |
17 | | - |
18 | | - // If json_init() fails, just increase the starting size |
19 | | - assert(!json_init(buffer, sizeof(buffer))); |
20 | | - |
21 | | - struct json_node node; |
22 | | - |
23 | | - enum json_status status = json("foo.json", &node, buffer, sizeof(buffer)); |
24 | | - if (status) { |
25 | | - // Handle error here |
26 | | - exit(EXIT_FAILURE); |
27 | | - } |
28 | | - |
29 | | - // You can now recursively walk the JSON data in the node variable here |
30 | | -} |
31 | | -``` |
32 | | - |
33 | | -Instead of using a fixed size buffer, you can use `realloc()` to keep retrying the call with a bigger buffer: |
34 | | - |
35 | | -```c |
36 | | -int main() { |
37 | | - size_t size = 420; |
38 | | - void *buffer = malloc(size); |
39 | | - |
40 | | - // If json_init() fails, just increase the starting size |
41 | | - assert(!json_init(buffer, size)); |
42 | | - |
43 | | - struct json_node node; |
44 | | - |
45 | | - enum json_status status; |
46 | | - do { |
47 | | - status = json("foo.json", &node, buffer, size); |
48 | | - if (status == JSON_OUT_OF_MEMORY) { |
49 | | - size *= 2; |
50 | | - buffer = realloc(buffer, size); |
51 | | - } |
52 | | - } while (status == JSON_OUT_OF_MEMORY); |
53 | | - |
54 | | - if (status) { |
55 | | - // Handle error here |
56 | | - exit(EXIT_FAILURE); |
57 | | - } |
58 | | - |
59 | | - // You can now recursively walk the JSON data in the node variable here |
60 | | -} |
61 | | -``` |
62 | | - |
63 | | -## How it works |
64 | | - |
65 | | -The `json_init()` function puts an internal struct at the start of the buffer [here](https://github.com/MyNameIsTrez/tiny-allocationless-json-parser-in-c/blob/7d5bb76d11aa32da22c39a186ed2f721959abf64/json.c#L539-L543). `json()` uses the remaining buffer bytes to allocate the arrays it needs for parsing [here](https://github.com/MyNameIsTrez/tiny-allocationless-json-parser-in-c/blob/7d5bb76d11aa32da22c39a186ed2f721959abf64/json.c#L465). |
66 | | - |
67 | | -If one of the internal arrays is too small, it'll double the array's capacity [here](https://github.com/MyNameIsTrez/tiny-allocationless-json-parser-in-c/blob/c02215b1239f9a9c2f832f817ea5e6bab7eb6a19/json.c#L99-L123). |
68 | | - |
69 | | -The parser uses an [array-based hash table](https://mynameistrez.github.io/2024/06/19/array-based-hash-table-in-c.html) to detect duplicate object keys, and `longjmp()` to [keep the clutter of error handling at bay](https://mynameistrez.github.io/2024/03/21/setjmp-plus-longjmp-equals-goto-but-awesome.html). |
70 | | - |
71 | | -The [JSON spec](https://www.json.org/json-en.html) specifies that the other value types are `number`, `true`, `false` and `null`, but they can all be stored as strings. You could easily support these however by adding just a few dozen lines to `json.c`, and a handful of tests, so feel free to. This JSON parser also does not allow the `\` character to escape the `"` character in strings. |
72 | | - |
73 | | -## Simpler version: restart on reallocation |
74 | | - |
75 | | -If you don't mind the first JSON file taking a bit longer to be parsed, you can use the branch called [restart-on-reallocation](https://github.com/MyNameIsTrez/tiny-allocationless-json-parser-in-c/tree/restart-on-reallocation). It is 473 lines of code. |
76 | | - |
77 | | -If one of the internal arrays is too small, it'll automatically restart the parsing, where the array's capacity is doubled [here](https://github.com/MyNameIsTrez/tiny-allocationless-json-parser-in-c/blob/1e5dd1ae77e3f247f28026cc10abedd876aa43f0/json.c#L375-L376). So the first parsed JSON file will take a few iterations to be parsed successfully, while the JSON files after that will usually just take a single iteration. |
78 | | - |
79 | | -## Even simpler version: structless |
80 | | - |
81 | | -If you don't need to have several JSON files open at the same, so if you don't mind the code being stateful, you can use the branch called [structless](https://github.com/MyNameIsTrez/tiny-allocationless-json-parser-in-c/tree/structless): |
82 | | - |
83 | | -```c |
84 | | -int main() { |
85 | | - char buffer[420]; |
86 | | - |
87 | | - json_init(); |
88 | | - |
89 | | - struct json_node node; |
90 | | - |
91 | | - enum json_status status = json("foo.json", &node, buffer, sizeof(buffer)); |
92 | | - if (status) { |
93 | | - // Handle error here |
94 | | - exit(EXIT_FAILURE); |
95 | | - } |
96 | | - |
97 | | - // You can now recursively walk the JSON data in the node variable here |
98 | | -} |
99 | | -``` |
100 | | - |
101 | | -Its `json_init()` can't fail, and it is 461 lines of code. |
102 | | - |
103 | | -## Simplest version: static arrays |
104 | | - |
105 | | -Originally `json.c` was 397 lines of code, which you can still view in the branch called [static-arrays](https://github.com/MyNameIsTrez/tiny-allocationless-json-parser-in-c/tree/static-arrays): |
106 | | - |
107 | | -```c |
108 | | -int main() { |
109 | | - struct json_node node; |
110 | | - if (json("foo.json", &node)) { |
111 | | - // Handle error here |
112 | | - exit(EXIT_FAILURE); |
113 | | - } |
114 | | - |
115 | | - // You can now recursively walk the JSON data in the node variable here |
116 | | -} |
117 | | -``` |
118 | | - |
119 | | -It used static arrays with hardcoded sizes, which I described the advantages of in my blog post titled [Static arrays are the best vectors](https://mynameistrez.github.io/2024/04/09/static-arrays-are-the-best-vectors.html). |
120 | | - |
121 | | -There were two problems with it: |
122 | | -1. It didn't give the user control over how the memory was allocated. So you'd have to manually edit `#define` statements in `json.c`, if you wanted to increase say the maximum number of tokens that a JSON file is allowed to contain. |
123 | | -2. Whenever `json_parse()` was called, its static arrays would be reset. This meant that calling the function a second time would overwrite the previous call's JSON result. This was fine if you didn't need to open more than one JSON file at a time, though. But even if you did, you could just manually copy around the arrays containing the JSON data. |
124 | | - |
125 | | -At the moment, [grug](https://mynameistrez.github.io/2024/02/29/creating-the-perfect-modding-language.html) uses this static arrays approach. |
126 | | - |
127 | | -## Final thoughts |
128 | | - |
129 | | -Most of the work went into adding tons of tests to ensure it has as close to 100% coverage as possible; `tests.c` is almost as large as `json.c`! The tests also act as documentation. I've fuzzed the program with libFuzzer and documented it in [the repository](https://github.com/MyNameIsTrez/tiny-json-parser-in-c)'s README. Enjoy! 🙂 |
| 10 | +I wrote the library [Allocationless C89 JSON parser](https://github.com/MyNameIsTrez/allocationless-c89-json-parser). See its readme. Enjoy! 🙂 |