Denormalise change notes#3743

Draft
richardTowers wants to merge 3 commits into main from denormalise-change-notes

Conversation

@richardTowers

This is an illustration of what it might look like to "denormalise" the change_notes table to improve query performance.

The issue we have at the moment is that to get all of the change notes for a document, you have to fetch all the editions for the document, and then for each edition fetch its change note. Some documents have in excess of 10,000 editions, so this means doing 10,000+ index scans on change_notes. Alternatively, Postgres will sometimes decide to build a hash of every single change note (for all documents) and every edition on the document, and then join them, but that's also very slow.

At the cost of a bit of duplicate data storage, we can avoid looking at the editions table at all, which is massively faster.
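To make the difference concrete, here's a sketch of the before and after shapes of the query (column names like `note` are assumptions for illustration, not necessarily the real schema):

    -- Before: join through editions to find a document's change notes
    SELECT cn.note, e.user_facing_version
    FROM change_notes cn
    JOIN editions e ON e.id = cn.edition_id
    WHERE e.document_id = $1;

    -- After: read change_notes directly via the denormalised columns
    SELECT cn.note, cn.user_facing_version
    FROM change_notes cn
    WHERE cn.document_id = $1;

The second form can be satisfied by a single index scan on change_notes, without touching editions at all.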

The drawbacks of this approach are:

  • We're storing a little more data (a small amount though, basically irrelevant)
  • We're storing "which edition belongs to which document" in two places (the editions table and the change_notes table), which feels like "bad" database design. In practice, editions never move from one document to another, and user_facing_version never changes once an edition is created, so I don't think there's any realistic data hazard in doing it this way.
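In raw SQL, the schema change amounts to something like the following (a sketch only — the actual PR presumably does this via a migration, and the index name and column types here are assumptions):

    ALTER TABLE change_notes
      ADD COLUMN document_id bigint,
      ADD COLUMN user_facing_version integer;

    -- An index on the new column so lookups by document avoid a
    -- sequential scan of change_notes
    CREATE INDEX index_change_notes_on_document_id
      ON change_notes (document_id);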

This gets the graphql performance benchmarks pretty close to passing on my clunky old macbook, which is exciting. The remaining slow queries are link expansion ones, which I think we've optimised about as far as we can.

Without this the change history query is fairly regularly much too slow for the benchmarks (on my machine), which I think is foreshadowing that it would cause some problems in production (even though the database in prod is much more capable than my macbook).

Richard Towers added 3 commits December 18, 2025 16:13
I populated these columns locally just by doing:

    UPDATE change_notes cn
    SET user_facing_version = e.user_facing_version
    FROM editions e
    WHERE cn.edition_id = e.id;

and

    UPDATE change_notes cn
    SET document_id = e.document_id
    FROM editions e
    WHERE cn.edition_id = e.id;

... each of which took about 1 minute. We might want to do them in batches in production, just in case we lock the table and cause an interruption.
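A batched version of the backfill might look something like this (a sketch, assuming change_notes has an `id` primary key and that a NULL document_id marks rows not yet backfilled; the batch size is arbitrary):

    -- Run repeatedly until it reports 0 rows updated.
    -- Each statement only locks ~10,000 rows at a time, so
    -- concurrent writers are blocked only briefly.
    UPDATE change_notes cn
    SET document_id = e.document_id,
        user_facing_version = e.user_facing_version
    FROM editions e
    WHERE cn.edition_id = e.id
      AND cn.id IN (
        SELECT id FROM change_notes
        WHERE document_id IS NULL
        LIMIT 10000
      );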
Now that we've denormalised the document_id and user_facing_version
columns from editions into the change_notes table, we can do this query
without having to do an expensive join.
richardTowers force-pushed the denormalise-change-notes branch from 3276f34 to f026668 on December 18, 2025 16:15