Add transaction_details table#279
Conversation
All save and save_many transactions will fail if we attempt to write to the new table before it exists. This commit prevents writing to `transaction_defails` by default. This can, and likely should, be overridden in the `infogami.yml` configuration file.
There was a problem hiding this comment.
I believe that this file is only executed when the install action runs.
| LANGUAGE SQL; | ||
|
|
||
| create table transaction_details ( | ||
| id serial primary key, |
There was a problem hiding this comment.
| id serial primary key, | |
| id bigserial primary key, |
Walkthrough:
|
| Condition | Action |
|---|---|
Key in inserts only |
create — property didn't exist before |
Key in deletes only |
delete — property was removed |
| Key in both | update — property existed but its value changed |
It then bulk-inserts those rows into transaction_details.
Useful Query
SELECT
tx.comment,
tx.created AS tx_time,
t.key AS thing_edited,
tt.key AS thing_type,
p.name AS property_name,
td.property_action AS action,
au.key AS editor,
td.is_bot
FROM transaction_details td
JOIN transaction tx ON tx.id = td.transaction_id
JOIN thing t ON t.id = td.thing_id
JOIN thing tt ON tt.id = t.type
JOIN property p ON p.id = td.key_id
JOIN thing au ON au.id = td.author_id
ORDER BY td.id DESC
LIMIT 50;Note: rows with key_id = NULL (possible during type-change saves) will be silently excluded by this inner join. Use a LEFT JOIN on property if you want to see them.
What We Verified (28/29 Tests Pass)
| Scenario | Result |
|---|---|
| Edit a single string field (title, name, birth_date…) | Exactly one row, correct property name |
| Edit multiple fields in one save | One row per changed property, nothing extra |
| Add a value to a list (subjects, publishers…) | update |
| Remove all values from a list field | delete |
| Add a field that didn't previously exist | create |
| Remove a field entirely | delete |
| No-op save (same data re-saved) | Zero rows |
Empty string "" on a field |
Treated as removal → delete |
description saved as {"type": "/type/text", …} |
Zero rows — not indexed at all |
description saved as a plain string |
create row, value lands in datum_str |
works.authors change on a work |
Row for authors.author (the ref property) |
| Multi-doc save (work + author in one call) | Rows scoped correctly to each thing |
create → update → delete lifecycle |
All three actions classified correctly |
Known Bug (not yet fixed in PR)
Anonymous edits (author=None) crash with KeyError: None inside _add_transaction_details:
author_id = self.thing_ids[author_key] # KeyError when author_key is NoneFix: author_id = author_key and self.thing_ids.get(author_key)
DDL: author_id is already nullable, so no schema change needed.
How to Run This Locally
1. Worktree setup
cd ~/Projects/openlibrary
git fetch origin master
git worktree add ~/Projects/openlibrary-tx-details -b 11955/feature/transaction-details origin/master
cd ~/Projects/openlibrary-tx-details
git submodule update --init vendor/infogami
git -C vendor/infogami fetch origin txn_details
git -C vendor/infogami checkout txn_details2. Enable the feature flag
In conf/infobase.yml, add:
use_transaction_details_table: true3. Start Docker with local infogami
The compose.infogami-local.yaml overlay bind-mounts ./vendor/infogami over the named ol-vendor volume so the PR branch is live inside all containers:
OL_MOUNT_DIR="$(pwd)" docker compose \
-f compose.yaml \
-f compose.override.yaml \
-f compose.infogami-local.yaml \
up -d4. Create the table
docker compose exec db psql -U openlibrary -c "
CREATE TABLE transaction_details (
id serial primary key,
transaction_id integer references transaction(id),
thing_id integer references thing(id),
key_id integer references property(id),
property_action text,
author_id integer references thing(id),
is_bot boolean,
created timestamp without time zone default (current_timestamp at time zone 'utc')
);"5. Run the test suite
docker compose exec infobase python3 /openlibrary/scripts/test_transaction_details_comprehensive.pyThis runs 29 tests covering works, authors, and editions across create/update/delete scenarios, non-indexed fields, multi-doc saves, and edge cases.
Things Worth Raising in Review
- Anonymous edit bug —
author=NoneraisesKeyError. Needs a guard in_add_transaction_details. /type/textfields are invisible —description,notes, and other rich-text fields don't appear intransaction_details. This is consistent with how infogami indexes work, but worth an explicit callout since participation scoring may want to count these edits.- No indexes on
transaction_details— at scale, queries filtering bytransaction_id,thing_id, orauthor_idwill need indexes. Worth adding at least(transaction_id)and(author_id, created)before this goes to production. key_idcan be NULL on type-change saves (when infogami deletes all properties withname=None). Those rows are silently dropped from inner-join queries.
| deletes, inserts = self._update_index(records) | ||
|
|
||
| if config.use_transaction_details_table: | ||
| self._add_transaction_details(deletes, inserts, tx_id, author, bot) |
There was a problem hiding this comment.
Reverting these lines should fix things if things go horribly wrong.
Adds DDL, persistence logic, and feature flag for a new
transaction_detailstable.When the
use_transaction_details_tableconfiguration is set toTrue, detailed information about eachsaveandsave_manytransaction are written to the new table.The persistence logic leverages the
IndexUtil, which diffs aThingwith it's previous version in order to write to thepropertyanddatum_*(and other index tables likeedition_*,work_*, etc.) tables.IndexUtil.update_indexwas modified to return the collection ofdeletesandinsertsused to update the index tables.This data, along with the
transaction_id, author key, and author's bot status is passed toSaveImpl._add_transaction_detailsfor persistence in the new table.