Denormalise change notes#3743

Draft
richardTowers wants to merge 3 commits into main from denormalise-change-notes

Conversation

@richardTowers

This is an illustration of what it might look like to "denormalise" the change_notes table to improve query performance.

The issue we have at the moment is that to get all of the change notes for a document, you have to fetch all the editions for the document, and then for each edition fetch its change note. Some documents have in excess of 10,000 editions, so this means doing 10,000+ index scans on change_notes. Alternatively, Postgres will sometimes decide to build a hash of every single change note (for all documents) and every edition on the document, and then join them, but that's also very slow.

At the cost of a bit of duplicate data storage, we can avoid looking at the editions table at all, which is massively faster.
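To make the difference concrete, here's a sketch of the before and after shapes of the query (column names like `note` are assumptions for illustration, not necessarily the real schema):

    -- Before: join through editions to find a document's change notes
    SELECT cn.note, e.user_facing_version
    FROM change_notes cn
    JOIN editions e ON e.id = cn.edition_id
    WHERE e.document_id = $1;

    -- After: read change_notes directly via the denormalised columns
    SELECT cn.note, cn.user_facing_version
    FROM change_notes cn
    WHERE cn.document_id = $1;

The second form can be satisfied by a single index scan on change_notes, without touching editions at all.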

The drawbacks of this approach are:

  • We're storing a little more data (a small amount though, basically irrelevant)
  • We're storing "which edition belongs to which document" in two places (the editions table and the change_notes table), which feels like "bad" database design. In practice, editions never move from one document to another, and user_facing_version never changes once an edition is created, so I don't think there's any realistic data hazard in doing it this way.
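In raw SQL, the schema change amounts to something like the following (a sketch only — the actual PR presumably does this via a migration, and the index name and column types here are assumptions):

    ALTER TABLE change_notes
      ADD COLUMN document_id bigint,
      ADD COLUMN user_facing_version integer;

    -- An index on the new column so lookups by document avoid a
    -- sequential scan of change_notes
    CREATE INDEX index_change_notes_on_document_id
      ON change_notes (document_id);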

This gets the graphql performance benchmarks pretty close to passing on my clunky old macbook, which is exciting. The remaining slow queries are link expansion ones, which I think we've optimised about as far as we can.

Without this the change history query is fairly regularly much too slow for the benchmarks (on my machine), which I think is foreshadowing that it would cause some problems in production (even though the database in prod is much more capable than my macbook).

Richard Towers added 3 commits December 18, 2025 16:13
I populated these columns locally just by doing:

    UPDATE change_notes cn
    SET user_facing_version = e.user_facing_version
    FROM editions e
    WHERE cn.edition_id = e.id;

and

    UPDATE change_notes cn
    SET document_id = e.document_id
    FROM editions e
    WHERE cn.edition_id = e.id;

... each of which took about 1 minute. We might want to do them in batches in production, just in case we lock the table and cause an interruption.
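A batched version of the backfill might look something like this (a sketch, assuming change_notes has an `id` primary key and that a NULL document_id marks rows not yet backfilled; the batch size is arbitrary):

    -- Run repeatedly until it reports 0 rows updated.
    -- Each statement only locks ~10,000 rows at a time, so
    -- concurrent writers are blocked only briefly.
    UPDATE change_notes cn
    SET document_id = e.document_id,
        user_facing_version = e.user_facing_version
    FROM editions e
    WHERE cn.edition_id = e.id
      AND cn.id IN (
        SELECT id FROM change_notes
        WHERE document_id IS NULL
        LIMIT 10000
      );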
Now that we've denormalised the document_id and user_facing_version
columns from editions into the change_notes table, we can do this query
without having to do an expensive join.
richardTowers force-pushed the denormalise-change-notes branch from 3276f34 to f026668 on December 18, 2025 16:15