
What Using AI Coding Agents Daily Actually Looks Like

A practitioner's perspective on running AI coding agents across .NET, Vue, and Python projects — what works, what doesn't, and what actually changed.

By Igor Riera

There’s a lot of noise right now about AI coding agents - what they can do, what they’ll replace, what engineering looks like in two years. Most of it comes from people who’ve tried a demo or run a benchmark.

I’ve been running Claude Code on every project at Cerberus Labs for the past few months: .NET backends, Vue 3 frontends, Python desktop applications, hardware integration documentation. Not as an experiment - as a daily tool and force multiplier, the same way I use an IDE or a terminal.

Here’s what that looks like in practice.

What the Context Layer Looks Like

Before the agent writes anything, it reads. Every project at Cerberus Labs has a structured context layer - architecture decisions, naming conventions, interface patterns, and explicit constraints the agent carries between sessions. Here’s a simplified version of what PayTable’s looks like:

# PayTable — Agent Context
architecture: Clean Architecture, CQRS via MediatR
stack: .NET 9, Vue 3 PWA, Blazor WebAssembly admin, Azure PostgreSQL
payment_processor: T1 Pagos (not Stripe — different API shape, different failure modes)

conventions:
  financial_operations: Advisory locks required, no optimistic concurrency
  audit_logging: Channel<T> pipeline — never Task.Run or fire-and-forget
  real_time: SignalR hub methods use [HubMethodName] attribute
  translations: Auto-detect via Accept-Language, 35+ languages

guardrails:
  never_auto_commit: true
  payment_flows: Human review required, no exceptions
  connection_pooling: MinPoolSize=10, MaxPoolSize=100 — learned this the hard way

This isn’t boilerplate. The Channel<T> line exists because we spent four days debugging connection pool exhaustion caused by fire-and-forget logging. The connection pooling values exist because our first load test came back with a P50 of 21 seconds. Every line in this context file is a scar from production tests.
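The Channel<T> rule is easier to see in a toy model. The sketch below is a Python asyncio analogue, not PayTable’s .NET code. `asyncio.Queue` stands in for a bounded `Channel<T>` and a semaphore stands in for the connection pool. `FakePool`, `write_audit_row`, and `POOL_SIZE` are hypothetical names. Fire-and-forget starts one task per audit event, so pending writes grab every connection the pool has. The pipeline funnels all writes through a single consumer, so logging holds at most one connection.

```python
import asyncio

POOL_SIZE = 10  # hypothetical stand-in for MaxPoolSize in this toy model


class FakePool:
    """Toy connection pool: a semaphore plus counters of connections in use."""

    def __init__(self, size: int) -> None:
        self._sem = asyncio.Semaphore(size)
        self.in_use = 0
        self.peak = 0

    async def write_audit_row(self, event: str) -> None:
        # Hypothetical database insert: hold one connection for ~1 ms.
        async with self._sem:
            self.in_use += 1
            self.peak = max(self.peak, self.in_use)
            await asyncio.sleep(0.001)
            self.in_use -= 1


async def fire_and_forget(pool: FakePool, events: list) -> None:
    # Anti-pattern: one detached task per event, all racing for the pool.
    tasks = [asyncio.create_task(pool.write_audit_row(e)) for e in events]
    await asyncio.gather(*tasks)  # awaited only so the demo can measure the peak


async def channel_pipeline(pool: FakePool, events: list) -> None:
    # Bounded queue (the Channel<T> analogue) drained by a single consumer.
    queue = asyncio.Queue(maxsize=100)  # None is used as the stop sentinel

    async def consumer() -> None:
        while (event := await queue.get()) is not None:
            await pool.write_audit_row(event)

    worker = asyncio.create_task(consumer())
    for e in events:
        await queue.put(e)  # producers wait here when the queue is full
    await queue.put(None)  # tell the consumer to stop
    await worker


async def demo() -> tuple:
    events = [f"event-{i}" for i in range(50)]
    burst, piped = FakePool(POOL_SIZE), FakePool(POOL_SIZE)
    await fire_and_forget(burst, events)
    await channel_pipeline(piped, events)
    return burst.peak, piped.peak  # peak connections used by each approach


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Run as-is, the fire-and-forget version should peak at all 10 connections while the pipeline peaks at 1. In a real service those 10 are the same connections your request handlers need, which is how logging turns into pool exhaustion under load. The bounded queue also gives you backpressure: producers wait instead of piling up unbounded work.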

The agent reads this before it touches any code. When it generates a new command handler, it uses advisory locks for financial operations because the context tells it to - not because it figured out the right concurrency strategy on its own.
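If advisory locks are unfamiliar: PostgreSQL lets an application take locks on arbitrary 64-bit keys, and `pg_advisory_xact_lock(bigint)` holds one until the current transaction ends. Here’s a minimal Python sketch of the pattern. It assumes a psycopg-style DB-API connection and a hypothetical `accounts(id, balance)` table. The `advisory_key` hashing scheme is also an assumption, not PayTable’s implementation.

```python
import hashlib
from decimal import Decimal


def advisory_key(namespace: str, entity_id: str) -> int:
    """Hash a (namespace, id) pair into the signed 64-bit range
    accepted by PostgreSQL's pg_advisory_xact_lock(bigint)."""
    digest = hashlib.blake2b(f"{namespace}:{entity_id}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big", signed=True)


def debit(conn, account_id: str, amount: Decimal) -> bool:
    """Read-check-write debit serialized by a transaction-scoped advisory lock.

    Assumes `conn` is a DB-API connection using the %s paramstyle (psycopg,
    for example) and that `accounts(id, balance)` is a hypothetical table.
    """
    try:
        with conn.cursor() as cur:
            # Blocks until no other transaction holds the lock for this account.
            cur.execute("SELECT pg_advisory_xact_lock(%s)", (advisory_key("account", account_id),))
            cur.execute("SELECT balance FROM accounts WHERE id = %s", (account_id,))
            (balance,) = cur.fetchone()
            if balance < amount:
                conn.rollback()  # ends the transaction, which releases the lock
                return False
            cur.execute(
                "UPDATE accounts SET balance = %s WHERE id = %s",
                (balance - amount, account_id),
            )
        conn.commit()  # commit also releases the transaction-scoped lock
        return True
    except Exception:
        conn.rollback()
        raise
```

The lock only protects the balance if every code path that touches that account takes the same key first. That’s why it lives in the context file as a rule rather than something the agent should rediscover for each handler.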

The Productive Sessions

The productivity gains are real, but they’re concentrated.

Refactoring across multiple files. When you can describe a pattern - “every query handler in this project follows this session management pattern, add two new ones that mirror it” - the agent does it accurately because the convention is already established in the codebase. It reads the existing code, matches the style, and produces something that looks like you wrote it.

Writing tests. Probably the single biggest time saver. I recently added 9 tests across two test files for the content manager’s database layer, and the fixture setup, assertion style, and naming conventions all matched the existing suite. Every test passed on the first run. That’s not magic; it’s pattern matching on a well-structured codebase. The agent is only as good as the conventions it can read.
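The conventions an agent can read look something like this: one fixture that builds a fresh database per test, one behavior per test, and `test_<unit>_<behavior>` names. Here’s a hypothetical sketch using `sqlite3` and a made-up `ContentStore`. It uses plain functions with a context-manager fixture so it runs without pytest, and it is not the content manager’s actual suite.

```python
import sqlite3
from contextlib import contextmanager
from typing import Optional


class ContentStore:
    """Hypothetical database layer for a content manager."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, title TEXT NOT NULL UNIQUE)"
        )

    def add(self, title: str) -> int:
        with self.conn:  # commits on success, rolls back on error
            cur = self.conn.execute("INSERT INTO items (title) VALUES (?)", (title,))
        return cur.lastrowid

    def get_title(self, item_id: int) -> Optional[str]:
        row = self.conn.execute("SELECT title FROM items WHERE id = ?", (item_id,)).fetchone()
        return row[0] if row else None


@contextmanager
def fresh_store():
    # Fixture: every test gets its own empty in-memory database.
    conn = sqlite3.connect(":memory:")
    try:
        yield ContentStore(conn)
    finally:
        conn.close()


def test_store_add_returns_new_id():
    with fresh_store() as store:
        assert store.add("Menu") == 1


def test_store_get_title_missing_returns_none():
    with fresh_store() as store:
        assert store.get_title(999) is None


def test_store_add_duplicate_title_raises():
    with fresh_store() as store:
        store.add("Menu")
        try:
            store.add("Menu")
        except sqlite3.IntegrityError:
            pass
        else:
            raise AssertionError("duplicate title should be rejected")
```

Once two or three tests look like this, “add tests for the new method” has an obvious answer, and that answer is what the agent matches.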

Codebase exploration. When you’re working in an unfamiliar area of a large project, the agent can read dozens of files, trace references, and summarize what it finds faster than you could manually. I use this constantly when jumping between the .NET backend and the Vue frontend: two different languages, two different paradigms, and the agent bridges the context gap in seconds.

Boilerplate with conventions. Database queries, API endpoint scaffolding, SignalR hub wiring - anything where the pattern is established and you’re adding another instance of it. Grunt work that used to take 15 minutes of copy-paste-modify. Now it takes a description and a review.

What It Still Can’t Do

Architectural decisions. The agent can implement a pattern. It can’t tell you whether the pattern is right for your situation. It doesn’t know your business constraints, your team’s capabilities, or why you chose Clean Architecture over vertical slices. If you don’t have that clarity yourself, the agent will confidently build the wrong thing.

Knowing when to stop building. We shelved two of our three products at Cerberus Labs - not because the engineering was wrong, but because the market moved. No AI agent would have told us to stop building a financial modeling platform when advanced LLMs started doing the same analysis conversationally. That’s judgment, not code generation.

Performance tuning. The agent can fix a bug you’ve identified. It struggles with the detective work of figuring out why your P50 latency jumped from 160ms to 21 seconds. The bottleneck in our case was fire-and-forget Task.Run calls compounding database connection pool exhaustion - something that only shows up under concurrent load and requires understanding the interaction between EF Core retry strategies, PostgreSQL advisory locks, and connection pooling. The agent didn’t find that. Methodical load testing did.
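A toy model shows why this class of bottleneck hides until concurrency arrives. It assumes a burst of requests arriving at once, a first-come-first-served pool, and a fixed connection hold time per job. With `log_writes=True`, each request also queues one fire-and-forget log write that needs a connection from the same pool. The function name and numbers are illustrative, not PayTable’s measurements.

```python
import statistics


def simulated_p50(requests: int, pool_size: int, hold_s: float, log_writes: bool = False) -> float:
    """Median request latency (seconds) for a burst of arrivals in a FIFO pool.

    Jobs are served in waves of `pool_size`, so job j finishes at
    (j // pool_size + 1) * hold_s. With log_writes=True, each request is
    followed in the queue by one fire-and-forget log write on the same pool.
    """
    latencies = []
    job = 0
    for _ in range(requests):
        latencies.append((job // pool_size + 1) * hold_s)  # the request's own job
        job += 2 if log_writes else 1  # a log write, if any, takes the next slot
    return statistics.median(latencies)
```

With 5 requests against a 10-connection pool holding each connection for 0.16 s, P50 is 0.16 s whether or not logging shares the pool. With 100 requests it becomes 0.88 s, and 1.68 s once every request drags a log write along. Real systems add retries, lock waits, and timeouts on top, which is why this took load testing to find rather than reading code.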

Anything involving money. Payment flows get a review. Every time, no exceptions. The agent can draft the code, but the responsibility for correctness in a payment flow processing real transactions through T1 Pagos in Mexico can’t be delegated to a probability model. The cost of a subtle bug in a financial path is measured in chargebacks and compliance violations, not failed tests.

What Actually Changed

The most honest framing isn’t “AI writes code now.” It’s that the economics of small teams shifted.

I run Cerberus Labs as a solo founder with contract support. The work I ship - a load-tested .NET backend processing payments in Mexico, a Vue PWA serving menus in 35 languages, Python desktop applications with full test coverage, hardware integration across Elo touchscreens and Star thermal printers - used to require three or four people.

The agent handles the mechanical parts fast enough that one person can maintain velocity across multiple codebases without burning out on repetitive work. I focus on the decisions that actually require experience - which payment processor to integrate, how to handle inclusive tax calculations, whether to use advisory locks or optimistic concurrency for financial consistency. The agent handles implementation within patterns I’ve already established.
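Inclusive tax is a small example of that kind of decision. When a menu price already includes tax, the base and the tax have to be backed out of the gross and rounded to the centavo without losing a cent. Here’s a minimal sketch with Python’s `decimal` module. It assumes a 16% rate (Mexico’s general IVA rate) and round-half-up on the tax amount. Which rate and rounding rule apply to a given item is the judgment call, not something this function decides.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def split_inclusive(gross: Decimal, rate: Decimal = Decimal("0.16")) -> tuple:
    """Split a tax-inclusive price into (base, tax), both in cents.

    Tax is computed and rounded first (round-half-up is an assumption);
    base is the remainder, so base + tax always equals gross exactly.
    """
    tax = (gross - gross / (1 + rate)).quantize(CENT, rounding=ROUND_HALF_UP)
    return gross - tax, tax
```

Rounding the tax and deriving the base as the remainder guarantees that base plus tax equals the printed price: 116.00 splits into 100.00 and 16.00, and 99.99 into 86.20 and 13.79.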

The skill floor went up. An experienced engineer with these tools moves significantly faster. An inexperienced engineer with the same tools ships bad code faster. The tools amplify what you already know.

The Curator Framing Is Wrong

There’s a narrative that engineers are becoming “curators” - reviewing AI output instead of writing code. That framing misses the point.

You’re still architecting. You’re making the decisions that determine whether the system holds up when 2,000 restaurants hit your SignalR hub simultaneously, whether the payment state machine handles SPEI settlement correctly, whether the data model supports what the business needs next quarter. The agent handles execution within guardrails you set. Without the right guardrails, the output is confident and wrong.
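“Handles SPEI settlement correctly” largely comes down to a payment only moving along transitions someone wrote down and reviewed. Here’s a minimal sketch of an explicit transition table. The states and events are hypothetical placeholders, not T1 Pagos’ or SPEI’s actual status model.

```python
from enum import Enum


class PaymentState(Enum):
    CREATED = "created"
    AWAITING_TRANSFER = "awaiting_transfer"
    SETTLED = "settled"
    EXPIRED = "expired"
    REFUNDED = "refunded"


# Hypothetical (state, event) -> next-state table; anything missing is illegal.
TRANSITIONS = {
    (PaymentState.CREATED, "reference_issued"): PaymentState.AWAITING_TRANSFER,
    (PaymentState.AWAITING_TRANSFER, "transfer_confirmed"): PaymentState.SETTLED,
    (PaymentState.AWAITING_TRANSFER, "deadline_passed"): PaymentState.EXPIRED,
    (PaymentState.SETTLED, "refund_completed"): PaymentState.REFUNDED,
}


class IllegalTransition(Exception):
    """Raised when an event is not allowed in the current state."""


def apply_event(state: PaymentState, event: str) -> PaymentState:
    """Return the next state, or raise instead of guessing."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise IllegalTransition(f"{event!r} not allowed in state {state.value}") from None
```

Anything not in the table raises instead of silently changing state, so a confirmation that arrives after a payment expired becomes an error a human has to look at. Deciding what that case should actually do is the architecting part.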

The role didn’t change. The tools got better.

What I’d Tell Someone Starting

Use it. Don’t be precious about it. Don’t outsource your judgment.

Set up your codebase with clear conventions - consistent test patterns, established architectural boundaries, well-named functions. The agent performs dramatically better when it has good examples to follow. A messy codebase produces messy AI output, and cleaning up after it costs more than the time you saved.

Review everything that touches money, auth, or data integrity. Review architectural decisions, not syntax. Keep writing code yourself - the day you can’t evaluate what the agent produces is the day the tool becomes a liability.

The technology is useful. I ship more with it than without it. No revolution, no replacement - just better tools in experienced hands.