
The audit log is the debugging surface

An append-only event log is more useful than a mutable database for any system where 'what was true at decision time' matters more than 'what is true now.' How the pattern works, why it's underused, and where it pays off.

By Igor Riera

Most application databases mutate state in place: a user’s email gets updated, an order’s status moves from pending to paid, a configuration value gets overwritten with a new one. The current state of the world is the source of truth, and logs are a side effect: useful for debugging when something goes wrong, but otherwise ignored.

This works fine until something does go wrong and you need to know what the state actually was at the moment a decision was made — not what it is now after three subsequent updates overwrote it. At that moment the side-effect log either has the answer or it doesn’t, and you discover which one by reading it. Most teams discover their logs don’t have the answer.

The alternative is to flip the relationship. The append-only event log becomes the source of truth. The current state is a projection of it. Nothing in the system updates in place; every state change is a new row.

This pattern is called event sourcing in some contexts, audit logging in others, and “the way we’ve always done it” in finance and regulated industries. It’s wildly underused outside those domains. Most application code that would benefit from it doesn’t have it. This post is a defense of the pattern and a guide to the cases where it pays off.

What the pattern actually looks like

Concretely, in the systems I build it looks like this:

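CREATE TABLE audit_log (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,  -- monotonically increasing row ID
    timestamp     TEXT    NOT NULL,  -- when the event was recorded
    event_type    TEXT    NOT NULL,  -- stable identifier, e.g. order_placed
    entity_type   TEXT    NOT NULL,
    entity_id     TEXT    NOT NULL,
    payload       TEXT    NOT NULL,  -- JSON
    causation_id  INTEGER REFERENCES audit_log(id)  -- the event that caused this one; NULL for a root event
);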

Every state-changing event in the application writes one row. event_type is a stable identifier (order_placed, risk_check_passed, trade_announced, etc.). payload is a JSON document with everything the consumer needs to reconstruct the event — the inputs, the decision, the outputs. causation_id links each event to the event that caused it, so you can trace a chain from “user clicked a button” through every downstream consequence.
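A minimal sketch of that write path in Python, assuming the SQLite schema above. The helper name append_event and the sample payload fields are illustrative, not taken from a real codebase, and storing timestamps as ISO 8601 UTC strings is an assumption about the format.

```python
import json
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE audit_log (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp     TEXT    NOT NULL,
    event_type    TEXT    NOT NULL,
    entity_type   TEXT    NOT NULL,
    entity_id     TEXT    NOT NULL,
    payload       TEXT    NOT NULL,
    causation_id  INTEGER REFERENCES audit_log(id)
);
"""


def append_event(conn, event_type, entity_type, entity_id, payload, causation_id=None):
    """Append one event row and return its id (hypothetical helper)."""
    ts = datetime.now(timezone.utc).isoformat()  # assumed format: ISO 8601, UTC
    cur = conn.execute(
        "INSERT INTO audit_log "
        "(timestamp, event_type, entity_type, entity_id, payload, causation_id) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (ts, event_type, entity_type, entity_id, json.dumps(payload), causation_id),
    )
    conn.commit()
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# The risk check is a root event here; the order placement names it as its cause.
check_id = append_event(conn, "risk_check_passed", "order", "o-1",
                        {"limit": 1000, "requested": 250})
order_id = append_event(conn, "order_placed", "order", "o-1",
                        {"qty": 250}, causation_id=check_id)
```

Returning the new row's id is the design choice that matters: the caller needs it to pass as causation_id to whatever happens next.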

Read paths are derived from the log. The current state of an order, for example, is computed by replaying every order_* event for that order ID and reducing them down to a single state object. In practice, reads are served from a cache, usually a separate “projection” table that is updated from the log on each event, so the live read path stays fast even when the log gets large.

Writes only go to the log. No application code issues UPDATE statements against the projection tables directly; only the projector writes to them, and only in response to logged events. The projection is always derived; it is never authoritative.

Why this works

The audit log is the debugging surface. Current state is just a projection of it. Three properties make this pattern useful in ways that are hard to recover later if you didn’t start with it:

Time travel. You can compute “what was the state of X at timestamp T” by replaying the log up to that timestamp and stopping. This is invaluable when investigating an incident: you don’t need to remember what the system looked like an hour ago; you can reconstruct it. Most application databases can’t do this without restoring from backup.
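As a rough illustration, here is what “replay up to T and stop” can look like against the schema above. The reducer apply_event, the helper state_at, the event names other than order_placed, and the sample timestamps are all made up for the example. It also assumes timestamps are stored as fixed-width ISO 8601 UTC strings, so that comparing them as text matches chronological order.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE audit_log (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp     TEXT    NOT NULL,
    event_type    TEXT    NOT NULL,
    entity_type   TEXT    NOT NULL,
    entity_id     TEXT    NOT NULL,
    payload       TEXT    NOT NULL,
    causation_id  INTEGER REFERENCES audit_log(id)
)
""")

# Made-up sample history for one order.
events = [
    ("2024-01-01T10:00:00Z", "order_placed",    "o-1", {"qty": 5}),
    ("2024-01-01T10:05:00Z", "order_filled",    "o-1", {"filled_qty": 5}),
    ("2024-01-01T10:09:00Z", "order_cancelled", "o-1", {"reason": "duplicate"}),
]
conn.executemany(
    "INSERT INTO audit_log (timestamp, event_type, entity_type, entity_id, payload) "
    "VALUES (?, ?, 'order', ?, ?)",
    [(ts, et, eid, json.dumps(p)) for ts, et, eid, p in events],
)


def apply_event(state, event_type, payload):
    """Fold one event into the order state (hypothetical reducer)."""
    state = dict(state)
    if event_type == "order_placed":
        state.update(status="placed", qty=payload["qty"])
    elif event_type == "order_filled":
        state.update(status="filled")
    elif event_type == "order_cancelled":
        state.update(status="cancelled")
    return state


def state_at(conn, entity_id, as_of):
    """Replay the entity's events up to and including `as_of`, then stop."""
    rows = conn.execute(
        "SELECT event_type, payload FROM audit_log "
        "WHERE entity_id = ? AND timestamp <= ? ORDER BY id",
        (entity_id, as_of),
    )
    state = {}
    for event_type, payload in rows:
        state = apply_event(state, event_type, json.loads(payload))
    return state


before_fill = state_at(conn, "o-1", "2024-01-01T10:03:00Z")
latest = state_at(conn, "o-1", "2024-01-01T10:10:00Z")
```

Here before_fill describes an order that is merely placed, even though the latest state of the same order is cancelled. The current state is the same replay with no cutoff.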

Causation. Each event names the event that caused it. A trade fill names the order placement that produced it. The order placement names the risk-check approval that produced it. The risk-check approval names the signal generation that produced it. The signal generation names the candle-data ingestion that produced it. A full chain from sensor to side-effect is one recursive query over causation_id.
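One way to write that query in SQLite is a recursive common table expression that starts at an event and walks causation_id back to the root. This is a sketch under the schema above; the helper name causes_of and most of the event names in the sample chain are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE audit_log (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp     TEXT    NOT NULL,
    event_type    TEXT    NOT NULL,
    entity_type   TEXT    NOT NULL,
    entity_id     TEXT    NOT NULL,
    payload       TEXT    NOT NULL,
    causation_id  INTEGER REFERENCES audit_log(id)
)
""")

# Illustrative pipeline: each event names the previous one as its cause.
pipeline = ["candles_ingested", "signal_generated", "risk_check_passed",
            "order_placed", "trade_filled"]
last_id = None
for i, event_type in enumerate(pipeline):
    cur = conn.execute(
        "INSERT INTO audit_log "
        "(timestamp, event_type, entity_type, entity_id, payload, causation_id) "
        "VALUES (?, ?, 'pipeline', 'run-1', '{}', ?)",
        (f"2024-01-01T10:00:0{i}Z", event_type, last_id),
    )
    last_id = cur.lastrowid

CHAIN_SQL = """
WITH RECURSIVE chain(id, event_type, causation_id, depth) AS (
    SELECT id, event_type, causation_id, 0
    FROM audit_log WHERE id = ?
    UNION ALL
    SELECT a.id, a.event_type, a.causation_id, chain.depth + 1
    FROM audit_log AS a
    JOIN chain ON a.id = chain.causation_id
)
SELECT event_type FROM chain ORDER BY depth DESC
"""


def causes_of(conn, event_id):
    """Return event types from the root cause down to `event_id`."""
    return [row[0] for row in conn.execute(CHAIN_SQL, (event_id,))]


trace = causes_of(conn, last_id)
```

Starting from the fill, trace lists the chain in causal order, root first. Walking in the other direction (everything an event went on to cause) is the same query with the join condition reversed.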

Immutability. Nothing in the log changes after it’s written. If a row is wrong, you don’t update it — you write a new row that supersedes it, with a clear event type that says so. This sounds pedantic until you need to answer “did this row used to have a different value?” — and the answer is “no, by construction.”
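“By construction” can be enforced rather than merely promised. The sketch below uses SQLite triggers to reject UPDATE and DELETE on the log, then records a fix as a new superseding row. The trigger names, the fill_received and fill_corrected event types, and the supersedes payload field are assumptions made for the example, not a prescribed convention.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE audit_log (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp     TEXT    NOT NULL,
    event_type    TEXT    NOT NULL,
    entity_type   TEXT    NOT NULL,
    entity_id     TEXT    NOT NULL,
    payload       TEXT    NOT NULL,
    causation_id  INTEGER REFERENCES audit_log(id)
);

-- Reject any attempt to change or remove a row that has been written.
CREATE TRIGGER audit_log_no_update BEFORE UPDATE ON audit_log
BEGIN
    SELECT RAISE(ABORT, 'audit_log is append-only');
END;

CREATE TRIGGER audit_log_no_delete BEFORE DELETE ON audit_log
BEGIN
    SELECT RAISE(ABORT, 'audit_log is append-only');
END;
""")


def append(conn, ts, event_type, entity_id, payload):
    """Append one event row and return its id (hypothetical helper)."""
    cur = conn.execute(
        "INSERT INTO audit_log (timestamp, event_type, entity_type, entity_id, payload) "
        "VALUES (?, ?, 'order', ?, ?)",
        (ts, event_type, entity_id, json.dumps(payload)),
    )
    conn.commit()
    return cur.lastrowid


fill_id = append(conn, "2024-01-01T10:00:00Z", "fill_received", "o-1", {"price": 100})

# An in-place fix is refused by the trigger...
try:
    conn.execute("UPDATE audit_log SET payload = ? WHERE id = ?",
                 ('{"price": 101}', fill_id))
    update_rejected = False
except sqlite3.DatabaseError:
    update_rejected = True
    conn.rollback()

# ...so the fix is recorded as a new row that says what it supersedes.
correction_id = append(conn, "2024-01-01T10:05:00Z", "fill_corrected", "o-1",
                       {"supersedes": fill_id, "price": 101})
```

The original row still says 100, and the log now also records that it was wrong, when that was noticed, and what replaced it.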

Where the pattern pays off

The pattern is overkill for plenty of systems. A blog comment table doesn’t need it. A user profile table doesn’t need it. Most CRUD applications, most of the time, are fine with mutable state.

The pattern pays off in systems where any of the following are true:

Decisions need to be auditable after the fact. Trading systems. Healthcare workflows. Insurance claims. Anywhere a regulator, a CPA, an investigator, or a future-you might ask “why did this decision get made on this date?” — and the answer needs to be more than “we don’t know, that data has since been updated.”

State changes are decisions, not data updates. A user’s name is data. An algorithmic trade is a decision; so is a risk-check approval, and so is a patient-record state transition. Decisions have a why, and the why is worth capturing in the same place as the what.

Debugging requires reconstructing past state. Distributed systems, workflow engines, anything where a failure mode is “the system did X at 3 AM and I have no idea what state it was in when it decided to do X.” Replaying the log up to that timestamp tells you exactly.

The cost of a wrong answer is asymmetric. A wrong tweet costs nothing, but a wrong trade costs money, and a wrong patient-record update costs lives. Systems where the cost of being wrong is much higher than the cost of recording extra data benefit from recording extra data.

Concrete examples from systems I’ve built

Cerberus Markets, the Kraken-backed crypto trading harness, uses this pattern for every event in the pipeline. Signal generated, risk decision made, announcement sent to Discord, cancel issued, order placed, fill received, order rejected — each one is a new row in audit_log. The current portfolio state is a projection. P&L attribution at the trade level is a query against the log.

The PayTable backend uses a narrower version for sensitive state transitions: payment status changes, order modifications, refunds, waiter-call events. The rest of the application is normal mutable CRUD, because the rest of the application doesn’t need this level of accountability. The pattern is applied where it pays off, not everywhere.

The home automation platform uses Home Assistant’s built-in state history, which is structurally similar — every state change of every entity is logged with a timestamp, and the current state is computed from the latest row. This is how you can ask “when did the living room thermostat hit 24°C yesterday” and get an answer that’s accurate to the second.

The implementation patterns that matter

A few things to get right if you’re adopting this pattern:

Make the log fast. Append-only writes are inherently fast: each one is a single insert, with no index to update beyond the primary key. The projection tables, however, can become a bottleneck if they’re rebuilt from the entire log on every event. Use incremental projection: each new event updates the projection with just that event’s contribution.
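A sketch of incremental projection, assuming the audit_log schema from earlier plus a hypothetical order_projection table. The names record, project, and rebuild, and the event-to-status mapping, are illustrative. Each append folds only the new event into the projection, inside the same transaction as the log write; a full rebuild remains available as a check.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE audit_log (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp     TEXT    NOT NULL,
    event_type    TEXT    NOT NULL,
    entity_type   TEXT    NOT NULL,
    entity_id     TEXT    NOT NULL,
    payload       TEXT    NOT NULL,
    causation_id  INTEGER REFERENCES audit_log(id)
);

-- Hypothetical projection: one derived row per order.
CREATE TABLE order_projection (
    entity_id      TEXT PRIMARY KEY,
    status         TEXT NOT NULL,
    last_event_id  INTEGER NOT NULL
);
""")

# Illustrative mapping from event type to projected status.
STATUS_FOR = {"order_placed": "placed", "order_filled": "filled",
              "order_cancelled": "cancelled"}


def project(conn, event_id, event_type, entity_id):
    """Fold a single event into the projection."""
    conn.execute(
        "INSERT OR REPLACE INTO order_projection (entity_id, status, last_event_id) "
        "VALUES (?, ?, ?)",
        (entity_id, STATUS_FOR[event_type], event_id),
    )


def record(conn, ts, event_type, entity_id, payload):
    """Append the event and update the projection in one transaction."""
    with conn:  # both writes commit together, or neither does
        cur = conn.execute(
            "INSERT INTO audit_log (timestamp, event_type, entity_type, entity_id, payload) "
            "VALUES (?, ?, 'order', ?, ?)",
            (ts, event_type, entity_id, json.dumps(payload)),
        )
        project(conn, cur.lastrowid, event_type, entity_id)
    return cur.lastrowid


def rebuild(conn):
    """Throw the projection away and replay the whole log (the slow path)."""
    with conn:
        conn.execute("DELETE FROM order_projection")
        rows = conn.execute(
            "SELECT id, event_type, entity_id FROM audit_log ORDER BY id"
        ).fetchall()
        for event_id, event_type, entity_id in rows:
            project(conn, event_id, event_type, entity_id)


def snapshot(conn):
    return conn.execute(
        "SELECT entity_id, status, last_event_id FROM order_projection ORDER BY entity_id"
    ).fetchall()


record(conn, "2024-01-01T10:00:00Z", "order_placed", "o-1", {"qty": 5})
record(conn, "2024-01-01T10:01:00Z", "order_placed", "o-2", {"qty": 3})
record(conn, "2024-01-01T10:02:00Z", "order_filled", "o-1", {"filled_qty": 5})

incremental = snapshot(conn)
rebuild(conn)
rebuilt = snapshot(conn)
```

The two snapshots should match. If they ever differ, the projection has drifted, and the log, not the projection, is the one to believe.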

Make the schema durable. The log will outlive the application code. Use a schema that doesn’t require migrating old rows when the application changes. JSON payloads with a versioned schema are a common choice; protobuf with explicit field deprecation is another. Anything that requires ALTER TABLE audit_log after the system is in production is the wrong choice.
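One common way to keep old rows readable without rewriting them is to version the payload and upgrade it at read time. The sketch below is an assumption about how that might look: the "v" field, the upcaster table, and the sample v1-to-v2 change (adding an explicit currency) are all hypothetical.

```python
import json

CURRENT_VERSION = 2


def _v1_to_v2(payload):
    """Hypothetical change: v2 added an explicit currency; v1 rows were all USD."""
    upgraded = dict(payload)
    upgraded["currency"] = "USD"
    upgraded["v"] = 2
    return upgraded


# Each entry lifts a payload from version N to version N + 1.
UPCASTERS = {1: _v1_to_v2}


def load_payload(raw):
    """Parse a stored payload and upgrade it to the current version in memory."""
    payload = json.loads(raw)
    version = payload.get("v", 1)  # rows written before versioning count as v1
    while version < CURRENT_VERSION:
        payload = UPCASTERS[version](payload)
        version = payload["v"]
    return payload


old_row = '{"v": 1, "amount_cents": 1250}'
new_row = '{"v": 2, "amount_cents": 900, "currency": "EUR"}'

upgraded = load_payload(old_row)
unchanged = load_payload(new_row)
```

The stored rows are never touched. Only the in-memory view is upgraded, so the log stays immutable and the table itself never needs an ALTER TABLE.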

Make the causation chain mandatory. Every event that’s caused by another event should name its cause. This is the single highest-leverage data point in the log. Without it, you have a list of events. With it, you have a graph — and graphs are queryable in ways lists are not.

Make the log queryable. SQLite is fine for moderate volumes (millions of rows on a Pi 5 — I’m running this in production today). For higher volumes, DuckDB on top of the same storage gives you analytical query speeds without changing the write path. For very high volumes, the log becomes its own service.

Make the log boring. The audit log should be the least interesting part of the system to operate. Writes almost never fail (append-only is the simplest write path you can have). Reads are fast (cached projections). Schema doesn’t change (durable payload format). The log earns its keep by being there when you need it, not by being clever.

When not to use it

There are three failure modes I’ve seen with this pattern:

Applying it to systems that don’t need it. A CRUD app with no auditability requirements gets nothing from the pattern except code complexity. Don’t reach for it by default.

Applying it incompletely. A log that has some events but not all is worse than no log at all, because you’ll trust it. Either the log is the source of truth for a given event class, or it isn’t. Mixed regimes invite the question “is this state in the log or not?” — which is exactly the question the pattern is supposed to eliminate.

Mutating the log. I’ve seen teams append “correction” events that silently rewrite past events by re-emitting them. This breaks the time-travel property and turns the log into a list of suggestions rather than a record. Corrections are fine, but they should be additive events with a clear event type, not stealth replays.

The summary

For any system where “what was true at decision time” matters more than “what is true now,” append-only event logs are the right primitive. They cost almost nothing in storage. They cost a small amount in implementation discipline. They pay back the first time you need to investigate an incident and the answer is one SELECT * FROM audit_log WHERE entity_id = ? away.

The audit log is not a backup. It’s not just a debugging aid. It’s the source of truth, and the current state is just a useful summary of it. The systems I trust most are the ones built that way from the beginning.