April 10, 2026 · 7 min read · Deep Dive

What Load Testing a World Cup Audience Taught Us About Scaling

PayTable's first serious load test came back with a P50 latency of 21 seconds and an error rate above 40%. Four days later: P50 at 160ms, error rate under 1.3%. Here's exactly what we fixed and why each intervention mattered.

By Igor Riera

There’s a version of load testing where you run a script, watch a graph climb, and feel good about what you see. That wasn’t our first run.

Our first serious load test against PayTable’s backend came back with a P50 latency of 21 seconds and an error rate above 40%. Not P99 - P50. Half of all requests were taking over 21 seconds. A restaurant customer placing a QR order, waiting 21 seconds to see a confirmation, and then probably closing the tab.

We’re targeting 2,000 restaurants for the 2026 World Cup. That number focused our thinking considerably.

What 21 Seconds Actually Means

PayTable is a QR ordering and payments platform that we operate in Mexico. The ordering flow is synchronous from the customer’s perspective: scan, browse, order, confirm. The latency budget for that confirm step is around 2 seconds before customers start losing trust in the system.

21 seconds isn’t a slow response, it’s a broken product.

The load profile we were testing against wasn’t exotic - concurrent ordering sessions across a simulated restaurant pool, with SignalR connections maintaining real-time state to kitchen displays, printer stations, and client devices for waiters and customers. This is the normal operating state of the platform. The difference between a typical Tuesday lunch service and a busy Friday during a World Cup group stage match is just volume.

At 21 seconds P50, we didn’t have a scaling problem. We had an architecture problem that load testing had finally made visible.

Finding the Real Bottleneck

We started with the obvious suspects: slow queries, N+1 patterns, missing indexes. Those existed and we fixed them, but the latency numbers barely moved.

The actual problem was fire-and-forget Task.Run calls scattered through the audit logging path.

The pattern looked harmless in isolation:

// Audit the order placement - fire and forget
Task.Run(() => _auditService.LogOrderPlaced(order));

Under low load, this works fine. Under concurrent load, it creates a quiet disaster. Each Task.Run closure captures _auditService, and with it the DbContext the DI container injected into that service. That DbContext is scoped: it’s tied to the HTTP request lifetime. But Task.Run schedules work on the thread pool, outside the request context, with no structured lifetime. Nothing awaits it, and nothing limits how many of these tasks run at once.

The result: dozens of DbContext instances running concurrently, each holding a connection from the pool. The pool is finite. New requests arrive, try to acquire a connection, and wait. Then they time out, and they fail.
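
To see why a finite pool turns leaked work into failed requests, here is a toy model, not Npgsql’s actual pool: a SemaphoreSlim stands in for the connection pool, and PoolModel, LeakFireAndForget and TryHandleRequestAsync are hypothetical names. Leaked audit tasks take slots and hold them; a request that arrives afterwards waits out its timeout and fails.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Toy model: a SemaphoreSlim stands in for a finite connection pool.
// This is not Npgsql's implementation; it only illustrates exhaustion mechanics.
public sealed class PoolModel
{
    private readonly SemaphoreSlim _slots;

    public PoolModel(int size) => _slots = new SemaphoreSlim(size, size);

    // Number of "connections" currently free.
    public int Available => _slots.CurrentCount;

    // Simulates fire-and-forget audit work: takes a "connection" on the thread pool
    // and holds it until 'release' completes, with nothing tying it to a request.
    // The Task is returned only so the demo can observe it; the original pattern discards it.
    public Task LeakFireAndForget(Task release) => Task.Run(async () =>
    {
        await _slots.WaitAsync();
        try { await release; }
        finally { _slots.Release(); }
    });

    // Simulates a request: waits up to 'timeout' for a connection, then gives up.
    public async Task<bool> TryHandleRequestAsync(TimeSpan timeout)
    {
        if (!await _slots.WaitAsync(timeout)) return false; // pool exhausted: request fails
        try { return true; }                                // the query would run here
        finally { _slots.Release(); }
    }
}
```

In this model a bigger pool only postpones the failure: as long as leaked work keeps accumulating, requests eventually find every slot taken.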

The connection pool wasn’t undersized. It was being exhausted by unmanaged context leaks from logging paths that were never supposed to touch it under that kind of pressure.

Channel<T> for Audit Logging

The fix was to decouple the audit logging path from the request path entirely.

We replaced the fire-and-forget calls with a Channel<AuditEvent> - an in-memory queue that the request path writes to and a background service reads from:

// In the request handler: an in-memory enqueue that completes immediately unless a bounded channel is full
await _auditChannel.Writer.WriteAsync(new AuditEvent
{
    EventType = AuditEventType.OrderPlaced,
    OrderId = order.Id,
    RestaurantId = order.RestaurantId,
    Timestamp = DateTimeOffset.UtcNow
});

// In a background IHostedService - consumes from the channel
await foreach (var auditEvent in _channel.Reader.ReadAllAsync(ct))
{
    await _auditRepository.RecordAsync(auditEvent, ct);
}

The request path now writes an event object to memory and moves on. The background service owns a single, long-lived DbContext for audit writes and processes events sequentially, so audit logging holds at most one connection from the pool instead of competing with request handlers for one per in-flight task.
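
Put together, a minimal self-contained version of that pipeline might look like the sketch below. It assumes a bounded channel (the 10,000 capacity is an illustrative default, not PayTable’s setting), Guid identifiers on AuditEvent, and a hypothetical IAuditSink standing in for _auditRepository; in a real host, RunConsumerAsync would be called from a BackgroundService’s ExecuteAsync. With BoundedChannelFullMode.Wait, WriteAsync completes synchronously while the channel has room and only waits once the consumer falls behind, which is where backpressure comes from.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

public enum AuditEventType { OrderPlaced }

// Shape inferred from the snippet above; Guid ids are an assumption.
public sealed class AuditEvent
{
    public AuditEventType EventType { get; init; }
    public Guid OrderId { get; init; }
    public Guid RestaurantId { get; init; }
    public DateTimeOffset Timestamp { get; init; }
}

// Hypothetical stand-in for the audit repository.
public interface IAuditSink
{
    Task RecordAsync(AuditEvent auditEvent, CancellationToken ct);
}

public sealed class InMemoryAuditSink : IAuditSink
{
    public ConcurrentQueue<AuditEvent> Recorded { get; } = new();

    public Task RecordAsync(AuditEvent auditEvent, CancellationToken ct)
    {
        Recorded.Enqueue(auditEvent);
        return Task.CompletedTask;
    }
}

public static class AuditPipeline
{
    // Bounded: writers wait only when 'capacity' events are already queued.
    public static Channel<AuditEvent> Create(int capacity = 10_000) =>
        Channel.CreateBounded<AuditEvent>(new BoundedChannelOptions(capacity)
        {
            FullMode = BoundedChannelFullMode.Wait,
            SingleReader = true // one consumer loop, sequential writes
        });

    // Single consumer: records events one at a time until the writer completes
    // and the channel is drained, or until 'ct' is cancelled.
    public static async Task RunConsumerAsync(
        ChannelReader<AuditEvent> reader, IAuditSink sink, CancellationToken ct)
    {
        await foreach (var auditEvent in reader.ReadAllAsync(ct))
        {
            await sink.RecordAsync(auditEvent, ct);
        }
    }
}
```

Because hosted services are singletons, they shouldn’t take a scoped DbContext through constructor injection; IDbContextFactory<TContext>, registered with AddDbContextFactory, is the usual way to give the consumer its own context.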

The audit log is slightly delayed relative to the request - by milliseconds, in practice - and that’s an acceptable trade. The request latency impact dropped immediately.

PgBouncer and Connection Pool Tuning

With the unmanaged context leaks resolved, we turned to the connection pool itself.

We run PostgreSQL on Azure Flexible Server. By default, EF Core opens a connection for each database operation and returns it to Npgsql’s built-in pool when the operation completes. Under the load profile we were testing, the default pool configuration was too conservative.

We added PgBouncer in front of the database and tuned the connection string explicitly:

Host=...;Database=paytable;MinPoolSize=10;MaxPoolSize=100;Timeout=15;Keepalive=30

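MinPoolSize=10 keeps at least ten connections in the pool once they have been opened, rather than letting idle pruning shrink it to zero (Npgsql’s default), so a burst of traffic doesn’t start against a cold pool. MaxPoolSize=100 gives headroom for concurrent request peaks. Timeout=15 caps how long a request waits to open a connection, including time spent waiting on a saturated pool, before it fails with an error. Keepalive=30 sends a keepalive query after 30 seconds of inactivity, which prevents Azure’s network layer from dropping idle connections that PostgreSQL still considers open. MaxPoolSize=100 and Timeout=15 are also Npgsql’s defaults; stating them explicitly records the intent and keeps them visible when we tune.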

PgBouncer in transaction mode sits between the application and the database, multiplexing application connections onto a smaller set of actual database connections. This is particularly effective for our workload - request handlers hold database connections for short bursts, not long transactions, which is exactly the pattern transaction mode is built for.
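
The multiplexing only pays off if the numbers fit at each hop. Here is a back-of-the-envelope check with every input hypothetical (instance count, PgBouncer’s max_client_conn and default_pool_size, and the server’s max_connections are illustrative, not our production values): each app instance can open up to MaxPoolSize client connections to PgBouncer, while PgBouncer opens at most default_pool_size server connections per database/user pair.

```csharp
using System;

// Back-of-the-envelope pool math. All inputs are hypothetical examples.
public static class PoolMath
{
    public sealed record Check(int ClientConnections, int ServerConnections, bool FitsPgBouncer, bool FitsPostgres);

    public static Check Evaluate(
        int appInstances,             // API replicas behind the load balancer
        int npgsqlMaxPoolSize,        // MaxPoolSize from each instance's connection string
        int pgBouncerMaxClientConn,   // PgBouncer max_client_conn
        int pgBouncerDefaultPoolSize, // PgBouncer default_pool_size (per database/user pair)
        int dbUserPairs,              // distinct database/user combinations in use
        int postgresMaxConnections,   // PostgreSQL max_connections
        int reservedConnections)      // headroom kept for admin and maintenance sessions
    {
        int clients = appInstances * npgsqlMaxPoolSize;
        int servers = pgBouncerDefaultPoolSize * dbUserPairs;
        return new Check(
            clients,
            servers,
            FitsPgBouncer: clients <= pgBouncerMaxClientConn,
            FitsPostgres: servers <= postgresMaxConnections - reservedConnections);
    }
}
```

With eight instances at MaxPoolSize=100, up to 800 client connections can reach PgBouncer while PostgreSQL sees at most 50 server connections; that gap is the multiplexing.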

Custom EF Core Execution Strategy

With retries enabled through EnableRetryOnFailure(), EF Core’s Npgsql execution strategy retries failed database operations automatically. Most of the time, this is the right behavior: transient connection failures, brief Azure network hiccups, pool exhaustion events that resolve in a fraction of a second.

The problem is advisory locks.

PayTable uses PostgreSQL advisory locks in a few places where we need to serialize access to a shared resource - restaurant seat capacity tracking, payment idempotency checks. An advisory lock transaction looks like this:

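// Transaction-scoped lock: released automatically at COMMIT or ROLLBACK, and it holds
// under PgBouncer transaction pooling, where a session-level pg_advisory_lock would not.
await using var tx = await connection.BeginTransactionAsync();
await connection.ExecuteAsync("SELECT pg_advisory_xact_lock(@lockId)", new { lockId }, tx);

// Critical section: every statement runs on the same connection and transaction

// Committing releases the lock; disposing without a commit rolls back and releases it too
await tx.CommitAsync();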

If EF Core’s retry strategy fires inside one of these transactions, it tears down and re-establishes the connection, releasing the advisory lock mid-operation. The retry then starts a new transaction without the lock. You’ve now got a race condition in code that was specifically written to prevent race conditions.

The fix is a custom execution strategy that recognizes when it’s inside an advisory lock context and disables retries:

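public class PayTableExecutionStrategy : NpgsqlRetryingExecutionStrategy
{
    public PayTableExecutionStrategy(ExecutionStrategyDependencies dependencies)
        : base(dependencies) { }
    // ShouldRetryOn is consulted on every failure; inside an advisory lock, nothing is retryable
    protected override bool ShouldRetryOn(Exception? exception)
        => !AdvisoryLockContext.IsActive && base.ShouldRetryOn(exception);
}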

AdvisoryLockContext is a simple AsyncLocal<bool> that the lock acquisition code sets. The execution strategy checks it before deciding whether to retry. Inside an advisory lock: fail fast and propagate. Outside: retry as normal.
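
A minimal reconstruction of that flag follows, assuming a disposable-scope API; the Enter method and the nested Scope type are our illustration, not necessarily PayTable’s code. AsyncLocal<T> flows with the ExecutionContext into awaited continuations and into tasks started inside the scope, which is what lets the execution strategy see the flag even though it runs deeper in the call stack.

```csharp
using System;
using System.Threading;

// Hypothetical reconstruction: an AsyncLocal<bool> raised while an advisory lock is held.
public static class AdvisoryLockContext
{
    private static readonly AsyncLocal<bool> Active = new();

    // Read by the execution strategy when deciding whether to retry.
    public static bool IsActive => Active.Value;

    // Called by the lock-acquisition code; dispose when the critical section ends.
    // Enter is synchronous on purpose: an AsyncLocal assignment made inside an
    // async method is not visible to its caller after that method returns.
    public static IDisposable Enter()
    {
        var previous = Active.Value;
        Active.Value = true;
        return new Scope(previous);
    }

    private sealed class Scope : IDisposable
    {
        private readonly bool _previous;
        private bool _disposed;

        public Scope(bool previous) => _previous = previous;

        // Restores the outer value, so nested scopes unwind correctly.
        public void Dispose()
        {
            if (_disposed) return;
            _disposed = true;
            Active.Value = _previous;
        }
    }
}
```

The lock-acquisition code would wrap the locked transaction in using (AdvisoryLockContext.Enter()) { ... }, so every EF Core operation issued inside it, including ones in awaited helper methods, sees IsActive as true.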

Scaling Gates

We didn’t jump straight from a fixed architecture to a 2,000-restaurant simulation. We ran the load test at three gates.

50 restaurants. This is where we validated that the connection pool changes and Channel decoupling had actually solved the problems we thought they had. P50 at this scale should be comfortably under 200ms. We hit 160ms.

250 restaurants. This is where secondary issues show up - less common code paths under load, edge cases in SignalR connection management, audit log backpressure under burst conditions. We found and fixed two minor issues here. Latency held.

2,000 restaurants. The World Cup target. At this scale, 38,000 concurrent SignalR connections is a realistic peak - kitchen displays, printer stations, waiter apps, customer apps, and admin dashboards all maintaining persistent connections. Azure SignalR Service handles the connection overhead. Our backend handles the business logic. The separation matters.

At 2,000 restaurants, P50 latency held at 160ms. Error rate: under 1.3%.
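
Each gate comes down to two numbers per run, P50 latency and error rate. Here is a sketch of evaluating one gate from raw samples, under stated assumptions: nearest-rank percentiles (load tools differ in how they interpolate), failed requests counted in the latency distribution, and hypothetical Sample and LoadGate types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One request observed during a load-test run (hypothetical shape).
public sealed record Sample(double LatencyMs, bool Failed);

public static class LoadGate
{
    // Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
    public static double Percentile(IReadOnlyList<double> values, int p)
    {
        if (values.Count == 0) throw new ArgumentException("no samples", nameof(values));
        if (p is <= 0 or > 100) throw new ArgumentOutOfRangeException(nameof(p));
        var sorted = values.OrderBy(v => v).ToArray();
        // p * n is an exact integer, so Ceiling never bumps a whole rank up by rounding error
        int rank = (int)Math.Ceiling(p * sorted.Length / 100.0);
        return sorted[rank - 1];
    }

    // Passes when P50 is at or under the latency target and errors stay under the ceiling.
    public static (double P50, double ErrorRate, bool Passed) Evaluate(
        IReadOnlyList<Sample> samples, double maxP50Ms, double maxErrorRate)
    {
        var latencies = samples.Select(s => s.LatencyMs).ToList(); // failed requests included
        double p50 = Percentile(latencies, 50);
        double errorRate = samples.Count(s => s.Failed) / (double)samples.Count;
        return (p50, errorRate, p50 <= maxP50Ms && errorRate < maxErrorRate);
    }
}
```

Calling Evaluate(samples, maxP50Ms: 200, maxErrorRate: 0.013) would pair the 200ms P50 target from the 50-restaurant gate with an error ceiling equal to the rate reported above; the post doesn’t state the error threshold each gate actually used.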

One piece of the 2,000-restaurant picture that load simulation doesn’t fully capture: geography. Lunch rush in Mexico City doesn’t hit at the same time as dinner service in Cancun. The demand curve isn’t a single spike - it’s a rolling wave that moves across time zones and meal service windows. We built scheduled autoscaling warmup windows timed to regional meal service hours rather than relying entirely on reactive autoscaling. At the volumes we’re targeting, reactive scaling has a cold-start cost that shows up in P95 latency during ramp events. The scheduled warmup absorbs that.
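
A scheduled warmup reduces to a time-zone calculation: convert the current UTC time to each region’s local time and start scaling a fixed lead time before its meal window opens. This sketch assumes fixed UTC offsets (UTC-6 for Mexico City, UTC-5 for Cancun; a production version would resolve zones through TimeZoneInfo), made-up meal windows, and a hypothetical 30-minute lead. How PayTable actually triggers its autoscaler isn’t described here, so the scaling call itself is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A region's meal-service window in local time; the offset is fixed for simplicity.
public sealed record ServiceWindow(string Region, TimeSpan UtcOffset, TimeOnly Opens, TimeOnly Closes);

public static class WarmupSchedule
{
    // Hypothetical lead time: capacity is requested this long before a window opens.
    public static readonly TimeSpan Lead = TimeSpan.FromMinutes(30);

    // True from (Opens - Lead) up to, but not including, Closes in the region's local time.
    public static bool IsWarm(ServiceWindow window, DateTimeOffset nowUtc)
    {
        var local = TimeOnly.FromDateTime(nowUtc.ToOffset(window.UtcOffset).DateTime);
        var warmFrom = window.Opens.Add(-Lead); // TimeOnly.Add wraps around midnight
        return local.IsBetween(warmFrom, window.Closes);
    }

    // Regions whose window (plus lead time) covers the given instant.
    public static IEnumerable<string> WarmRegions(IEnumerable<ServiceWindow> windows, DateTimeOffset nowUtc) =>
        windows.Where(w => IsWarm(w, nowUtc)).Select(w => w.Region).Distinct();
}
```

Because each window is evaluated in its own local time, the same UTC instant can be warm for one region and cold for another, which is the rolling wave described above.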

Architecture Is the Plan

The four days between the 21-second result and the 160ms result didn’t involve rewriting the application. The data model didn’t change. The API surface didn’t change. The core ordering flow didn’t change.

What changed was the lifetime management of DbContext instances in background work, the path between audit events and the database, the connection pool configuration, and the retry strategy behavior inside advisory locks. Four targeted interventions, each addressing a specific failure mode that the load test exposed.

This is what load testing is actually for. Not confirmation that a system works - you can get that from unit tests and staging environments. Load testing is the controlled environment where the assumptions your architecture makes about its own behavior get tested against reality. The 21-second P50 wasn’t a bug. It was the architecture telling us where its assumptions were wrong.

Architecture is the plan; load testing is the truth.