
Five Event Sourcing Pitfalls That Bite You After the Tutorial

Five production realities that event sourcing tutorials skip: honest event design, cross-aggregate projections, versioning, snapshots, and infrastructure.

Understanding the pattern is the easy part. Here is what actually trips up teams shipping real systems.


You have read the articles. You understand aggregates and events. You might have even built the obligatory todo app with event sourcing and thought, yes, this makes sense. Then you start building something real, and everything gets strange. Not because the pattern is wrong, but because there is a sizeable gap between the elegant theory and the messy, durable, production-grade system you actually need to ship. This post covers the five things that gap contains, written by someone who has fallen into all of them.


1. Events are your API to the future, not just your current domain model

Most tutorials treat events as implementation details. They are not.

An event is an immutable fact. Once it is written, it is written. Your system will read that event for years, possibly decades, and it had better still mean something when it does. But the tutorial version of event sourcing tends to model events as a convenient way to rebuild aggregate state, and that framing quietly bakes in some bad habits.

The most common one: naming events after the state change rather than what actually happened. UserUpdated with a JSON diff payload is not an event. It is a mutation log with extra steps. It tells you the state changed but nothing about why, or what specifically occurred. Compare that to UserEmailChanged or UserSubscriptionUpgraded. Those events mean something. They carry intent. They are honest about what happened in the domain, which makes them genuinely useful as facts years later.
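Sketched as C# records (the types and fields here are illustrative, not from any particular framework), the difference looks like this:

```csharp
using System;

// A mutation log with extra steps: the only fact it records is
// "something changed". The why is lost forever.
public record UserUpdated(string UserId, string DiffJson);

// Honest domain facts: each one names what happened and carries intent,
// which is what keeps it meaningful as a fact years later.
public record UserEmailChanged(
    string UserId, string OldEmail, string NewEmail, DateTimeOffset OccurredAt);

public record UserSubscriptionUpgraded(
    string UserId, string FromPlan, string ToPlan, DateTimeOffset OccurredAt);
```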

Granularity matters too. An OrderPlaced event that contains the entire order in a flat struct feels convenient until the day you realise that “placing an order” in your domain is actually three distinct things: a customer commitment, a payment authorisation, and a warehouse reservation. They have different failure modes, different projections that care about them, and different reasons to change. One event cannot model all of that cleanly.
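Modelled honestly, the one flat event becomes three separate facts. A sketch (the names and fields are hypothetical, chosen to match the example above):

```csharp
using System;
using System.Collections.Generic;

// Three distinct facts hidden inside "placing an order", each with its own
// failure modes and its own projections that care about it.
public record OrderSubmitted(string OrderId, string CustomerId, DateTimeOffset OccurredAt);
public record PaymentAuthorised(string OrderId, string PaymentId, decimal Amount);
public record StockReserved(string OrderId, string WarehouseId, IReadOnlyList<string> Skus);
```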

The rule of thumb I keep coming back to: if you cannot say “X happened” where X is a specific, meaningful business fact, you have not found the right event yet. Events should be honest about what happened, not convenient for today’s read model.


2. Projections: from simple aggregations to the query you actually need

The tutorial projection is a single aggregate rebuilt from its own events. That is a legitimate use case and a good starting point. It is also nowhere near what production systems need.

Real queries cross aggregate boundaries. A “recent orders with customer details and fulfilment status” query needs events from order streams, customer streams, and fulfilment streams. A “failed payment attempts per user in the last 30 days” query needs a read model that spans thousands of events across thousands of streams, filtered, grouped, and pre-computed.

The moment you accept that read models are independent artefacts built from events, you start thinking about projection design very differently.

// The tutorial version: rebuild an aggregate from its own stream
public OrderState Apply(OrderState state, IEvent @event) => @event switch
{
    OrderPlaced e  => state with { Status = "Placed", Total = e.Total },
    OrderShipped e => state with { Status = "Shipped", TrackingNumber = e.TrackingNumber },
    _ => state
};

// In production, you also have cross-aggregate projections like this
public class RecentOrdersProjection
{
    private readonly IReadStore _readStore;

    public RecentOrdersProjection(IReadStore readStore) => _readStore = readStore;

    public async Task HandleAsync(OrderPlaced e, string customerId)
    {
        // Writing to a denormalised read model optimised for
        // "give me the last 20 orders for a customer with their status".
        // This query shape will never come from rebuilding an individual aggregate.
        await _readStore.UpsertAsync(new RecentOrderEntry
        {
            OrderId = e.OrderId,
            CustomerId = customerId,
            PlacedAt = e.OccurredAt,
            Total = e.Total,
            Status = "Placed"
        });
    }
}

The projection complexity curve is steep. Early on, you are rebuilding single aggregates. A few months in, you have cross-aggregate read models. A few months after that, you have read models that need to react to events from completely different bounded contexts. Each step is manageable, but each step also requires that your projection infrastructure can handle failures, restarts, and replay cleanly.

What eventual consistency actually means in practice: your read model is behind. By how much depends on your infrastructure and event volume. Most of the time this is fine. Occasionally it is not, and you need to know which queries are sensitive enough to require strong consistency and handle those cases explicitly, not by ignoring them and hoping for the best.

Design your read models deliberately, based on the query shapes your application actually needs. The teams that end up with painful projection rewrites are almost always the ones who started with aggregate-shaped projections and tried to retrofit them to query-shaped requirements six months later.


3. The event versioning time bomb

Events are immutable. Your understanding of the domain is not.

At some point, AddressChanged needs a new field. At some point, you realise OrderPlaced was quietly carrying three different business scenarios in a single event type and needs to be split. At some point, a field name that made perfect sense in month one is actively misleading in month twelve.

This is the event versioning problem, and every event-sourced system hits it eventually. The teams that handle it well are the ones who thought about it before they needed to.

The main strategies:

Upcasters. When loading old events, transform them to the current shape in memory. Event version 1 of OrderPlaced is upcast to version 2 before your reducer sees it. The original event on disk is untouched. This is clean and safe for additive changes: adding a field, renaming a field, changing a type with a known mapping.
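A minimal upcaster sketch. The event shapes and the defaulted currency are assumptions for illustration, not any library's API; the point is that the transformation happens in memory and the stored v1 event is never rewritten:

```csharp
// Assumed shapes: v2 added a Currency field that v1 events never carried.
public record OrderPlacedV1(string OrderId, decimal Total);
public record OrderPlacedV2(string OrderId, decimal Total, string Currency);

public static class OrderPlacedUpcaster
{
    // Transform a v1 event to the current shape before the reducer sees it.
    // "EUR" stands in for whatever default your domain can safely infer
    // for events written before the field existed.
    public static OrderPlacedV2 Upcast(OrderPlacedV1 old) =>
        new(old.OrderId, old.Total, Currency: "EUR");
}
```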

Co-existing event types. Instead of versioning OrderPlaced to OrderPlaced_v2, you introduce a new event type: OrderPlacedWithPromotion, or you split into OrderSubmitted and OrderConfirmed. Old events stay old. New events use the new type. Your reducer handles both. This scales well for substantive domain model changes where the old and new concepts are genuinely different.
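Sketched against the reducer style from earlier (OrderPlacedWithPromotion and its Discount field are hypothetical), co-existence just means the reducer keeps understanding both types forever:

```csharp
public interface IEvent { }

public record OrderPlaced(string OrderId, decimal Total) : IEvent;

// Hypothetical new event type; old OrderPlaced events stay exactly as they are.
public record OrderPlacedWithPromotion(string OrderId, decimal Total, decimal Discount) : IEvent;

public record OrderState(string Status, decimal Total)
{
    public static readonly OrderState Initial = new("New", 0m);
}

public static class OrderReducer
{
    // Old events hit the old arm, new events hit the new arm. No _v2 suffix,
    // no migration of stored data.
    public static OrderState Apply(OrderState state, IEvent @event) => @event switch
    {
        OrderPlaced e              => state with { Status = "Placed", Total = e.Total },
        OrderPlacedWithPromotion e => state with { Status = "Placed", Total = e.Total - e.Discount },
        _ => state
    };
}
```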

Migrations at read time. Your projection code carries the transformation logic. Old events produce one read model shape, new events produce another. The migration lives in the projection, not in the event schema. This keeps the event store clean and pushes versioning responsibility to the consumer.

The trap most teams fall into: ignoring this entirely until a real schema change forces the issue. By then you have hundreds of thousands of events in the store, running systems depending on the current shape, and no clean migration path. Thinking about event versioning on day one costs you almost nothing. Thinking about it six months in, under pressure, costs you a week of careful archaeology and a release you did not want to make.


4. Snapshot strategies that do not create new problems

Replaying thousands of events on every aggregate load is not viable. A mature event stream with thousands of events per aggregate will hurt your latency badly once you try to rebuild state on every command.

Snapshots solve this. The aggregate state is serialised and stored at a checkpoint, and replay starts from there. In principle, simple. In practice, snapshots introduce their own class of bugs.

The most common snapshot failure mode: a bug in the aggregate logic quietly poisons your snapshots. The buggy replay produces wrong state, the snapshot captures it, and every subsequent load from that snapshot inherits the error. Without full replay you cannot detect it; with full replay you have lost the performance benefit.

The safer approach is to treat snapshots as a cache, not a source of truth. Store them, use them, but never abandon the ability to rebuild from zero. Your snapshot store should be a separate concern from your event store, with its own lifecycle: snapshots can be invalidated and rebuilt without touching the event log.

public async Task<OrderAggregate> LoadAsync(string orderId)
{
    var snapshot = await _snapshotStore.GetLatestAsync(orderId);
    var fromVersion = snapshot?.Version ?? 0;
    var state = snapshot?.State ?? OrderAggregate.Initial;

    // Replay only the events recorded after the snapshot was taken;
    // with no snapshot, replay the stream from the beginning.
    var events = await _eventStore.ReadStreamAsync(
        streamId: $"order-{orderId}",
        fromVersion: fromVersion
    );

    return events.Aggregate(state, (current, e) => current.Apply(e));
}

Snapshot schema mismatches are the other common failure. Your aggregate evolves, your snapshot shape changes, but you have old snapshots on disk serialised against the old shape. The deserialiser fails, or worse, silently produces a zero-value aggregate and you replay forward from an invalid starting state.

The rule: version your snapshots independently from your events. If the snapshot shape changes, invalidate old snapshots and rebuild from the event log. The cost of a one-time full replay is always lower than the cost of a corrupted aggregate silently serving wrong answers in production.
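One way to make that rule mechanical, as a sketch (the envelope shape and version constant are assumptions, not a specific store's format):

```csharp
public sealed record SnapshotEnvelope(int SchemaVersion, long StreamVersion, string SerializedState)
{
    // Versioned independently of the events: bump this whenever the
    // aggregate's state shape changes, and every older snapshot becomes unusable.
    public const int CurrentSchemaVersion = 3;
}

public static class SnapshotPolicy
{
    // A stale or missing snapshot is simply ignored; the aggregate is rebuilt
    // from the event log and a fresh snapshot can be written afterwards.
    public static bool IsUsable(SnapshotEnvelope? snapshot) =>
        snapshot is { SchemaVersion: SnapshotEnvelope.CurrentSchemaVersion };
}
```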


5. The infrastructure overhead most teams underestimate

To run event sourcing properly in production, you need:

  • An append-only, ordered, durable event store with stream-level optimistic concurrency
  • A projection runner that tracks checkpoint positions per projection, survives restarts, handles partial failures, and can replay from any point in history
  • A streaming layer for real-time event consumers that need low latency
  • Snapshot storage with its own lifecycle and invalidation strategy
  • Monitoring that understands event-sourced systems specifically: projection lag, event throughput, checkpoint staleness
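To make the projection-runner item concrete, here is a minimal sketch of the checkpoint discipline it implies. Everything here is an assumption for illustration (the ICheckpointStore interface, the subscribe delegate), not any product's API:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical checkpoint store; any durable keyed storage will do.
public interface ICheckpointStore
{
    Task<long> LoadAsync(string projectionName);
    Task SaveAsync(string projectionName, long position);
}

public sealed class ProjectionRunner
{
    private readonly ICheckpointStore _checkpoints;
    private readonly Func<long, CancellationToken, IAsyncEnumerable<(long Position, object Event)>> _subscribe;
    private readonly Func<object, Task> _handle;

    public ProjectionRunner(
        ICheckpointStore checkpoints,
        Func<long, CancellationToken, IAsyncEnumerable<(long Position, object Event)>> subscribe,
        Func<object, Task> handle)
    {
        _checkpoints = checkpoints;
        _subscribe = subscribe;
        _handle = handle;
    }

    public async Task RunAsync(string projectionName, CancellationToken ct)
    {
        // Resume from the last durably recorded position, so a restart never
        // skips events. Delivery is at-least-once: a crash between HandleAsync
        // and SaveAsync means the event is redelivered, so handlers must be
        // idempotent.
        var from = await _checkpoints.LoadAsync(projectionName);

        await foreach (var (position, e) in _subscribe(from, ct))
        {
            await _handle(e);
            // Persist the checkpoint only after the handler succeeds.
            await _checkpoints.SaveAsync(projectionName, position);
        }
    }
}
```

Even this toy version has to take a position on failure semantics (save before or after the handler), which is exactly the kind of decision that compounds across every projection you add.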

Most teams discover this list late, after they have already committed to the pattern.

This is not a criticism of event sourcing. It is a completely honest description of what the pattern requires to run well. Relational databases come with this infrastructure pre-built, battle-tested, and documented for decades. Event sourcing does not. You are either building it, buying it, or using something that provides it.

The build cost is substantial. Getting the projection runner right in particular (the failure handling, the checkpoint strategy, the replay behaviour) is a meaningful engineering investment that compounds with every projection you add. It is not a one-time investment either. You will maintain it. It will have edge cases. When something goes wrong at 2am in production, you will debug it, and you will be glad or sorry you built it well.

This is where the infrastructure-versus-business-logic line becomes concrete. Every hour spent on the event store infrastructure is an hour not spent on the thing your customers actually use.


What to do about it

None of these pitfalls are reasons to avoid event sourcing. The pattern itself is sound, and the second-order benefits compound over time: complete audit history, temporal queries, event-driven integration, the ability to replay and rebuild any read model from the ground up. Teams that commit to event sourcing properly tend to end up with systems that are genuinely easier to reason about and evolve.

The pitfalls are all in the surrounding machinery, not the concept. Teams that succeed either invest heavily in building and maintaining that machinery themselves, or they use something that handles it for them. Neither choice is wrong, but both should be made explicitly, not by accident.

Go in with eyes open. Think about event schema design early. Design your read models based on the queries you actually need, not the aggregates you happen to have. Plan for event versioning before you need it. Treat snapshots as caches. Budget honestly for the infrastructure.

Event sourcing rewards the teams who commit to it properly. The pitfalls in this post are not obstacles; they are the known terrain. Now you have a map.


If you want event sourcing without owning the infrastructure, the Hapnd beta is open at hapnd.dev. Push your reducers and projections. Hapnd handles storage, streaming, snapshots, and scaling. No infrastructure to configure or maintain.