Deploying Software in the Physical World
Deploying to a data center and deploying to a kitchen are different disciplines. The hardware is the easy part. The hard part is the update cycle, the failure recovery, and the fact that a crashed display during dinner rush isn't an incident you resolve with a Slack message.
By Igor Riera
Deploying software to a data center and deploying software to a kitchen are different disciplines. Not different in degree - different in kind.
The mental model that works for cloud deployments - observability pipelines, rolling restarts, SSH access, on-call runbooks - fails in environments you don’t control and can’t physically reach. Building PayTable taught us most of what we know about this gap. What follows is what generalized.
The Deployment Assumption
Every developer has a mental model of “production.” Usually it involves a server somewhere, a deployment pipeline, and the ability to SSH in when something breaks. The feedback loop is fast. Something fails, you get an alert, you investigate.
Physical deployments break this model at every step.
You can’t SSH into a kitchen display mounted above a grill. You can’t restart the service when the person closest to the device has wet hands and is in the middle of a dinner rush. The user is not a developer with a terminal - they’re a cook, or a waiter, or a restaurant manager who doesn’t know what a process is. When the system fails, the signal isn’t a Slack alert. It’s a waiter walking to the kitchen to check if the order arrived.
This changes what “production-ready” means. Uptime isn’t a percentage goal in an SLA document. It’s the difference between a restaurant that keeps using your product and one that switches back to paper tickets after a bad Friday night.
The Update Problem
Shipping a new version to a cloud service takes minutes. Shipping a change to a physical device fleet can take weeks, and a careless rollout can brick devices.
The problem isn’t the mechanism - OTA updates exist, MDM tools exist - it’s the timing and recovery. A cloud container that fails to start gets replaced automatically. A kitchen display that fails to boot after a firmware update stays dark until someone physically intervenes.
For PayTable’s Kotlin app running on Elo devices in kiosk mode, we learned to treat every firmware update as a one-way door. The update ships, the device reboots, and if the new version hits a problem during that reboot, recovery requires either a working remote management channel or a technician in the room. In a 40 °C kitchen, that technician isn’t always available when the problem surfaces.
The rule we arrived at: test in conditions that match the environment, not the lab. Temperature, humidity, network instability - these aren’t edge cases. They’re the baseline. What works cleanly in a controlled environment breaks in ways that are hard to predict when the ambient temperature shifts by 20 °C or network packet loss climbs above 5%.
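Some of that baseline can be rehearsed before a device ever reaches a kitchen. The sketch below is one hypothetical way to inject packet loss into a test run. LossyLink, send_with_retry, and the three-attempt retry policy are illustrative names and choices, not PayTable’s code, and a simulated drop rate complements on-site testing rather than replacing it.

```python
import random


class LossyLink:
    """Wraps a send function and drops a fraction of calls (hypothetical test helper)."""

    def __init__(self, send, loss_rate, seed=0):
        self._send = send
        self._loss_rate = loss_rate       # e.g. 0.05 to rehearse 5% packet loss
        self._rng = random.Random(seed)   # seeded so a failing run can be replayed
        self.dropped = 0

    def send(self, payload):
        # random() is in [0.0, 1.0), so 0.0 never drops and 1.0 always drops.
        if self._rng.random() < self._loss_rate:
            self.dropped += 1
            raise TimeoutError("simulated packet loss")
        return self._send(payload)


def send_with_retry(link, payload, attempts=3):
    """Try the link up to `attempts` times; re-raise the last timeout if all fail."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return link.send(payload)
        except TimeoutError as err:
            last_error = err
    raise last_error
```

Running the same test suite with loss_rate set to 0.0 and then to 0.05 or higher is a cheap way to find code paths that assume every request arrives.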
Staged rollouts matter more in physical fleets than anywhere else. Ship to 5% of devices, watch them for a week, then expand. The blast radius of a bad update in a physical fleet is measured in devices you can’t remotely recover.
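A minimal sketch of how a staged cohort might be chosen, assuming each device has a stable string ID. Hashing the ID together with the release name is one common approach, not necessarily what PayTable does. The function names and the 100-bucket scheme are illustrative.

```python
import hashlib


def rollout_bucket(device_id: str, release: str) -> int:
    """Map a device to a stable bucket in [0, 100) for a given release."""
    digest = hashlib.sha256(f"{release}:{device_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_rollout(device_id: str, release: str, percent: int) -> bool:
    """True if the device falls inside the first `percent` buckets."""
    return rollout_bucket(device_id, release) < percent
```

Because the bucket depends only on the device ID and the release name, widening the rollout from 5 to 25 percent keeps every device from the first stage in the cohort. Including the release name in the hash means the same devices are not the early cohort for every release.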
Designing for Failure Modes You Can’t Observe Directly
Cloud failure modes are mostly visible: a service throws an exception, a container exits, a database connection times out. You get stack traces. You get logs. The system tells you what happened.
Physical failure modes are often silent.
A thermal printer that runs out of paper doesn’t throw an error - it buffers incoming jobs and waits. A device on intermittent WiFi doesn’t announce that it’s dropping packets - it just misses poll cycles. A touchscreen with a degraded capacitive layer doesn’t log touch failures - it just stops responding reliably to wet fingers.
The monitoring work for PayTable’s hardware fleet wasn’t glamorous. Heartbeat checks every 30 seconds. A flag when a printer misses three consecutive polls. Last-seen timestamps for every device surfaced in the admin dashboard. None of it is technically complex - it’s just work that cloud-native monitoring tools don’t do for you, because they weren’t built for devices that live in kitchens.
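The missed-poll flag can be sketched in a few lines. The 30-second interval and the three-poll threshold come from the description above. The DeviceStatus record, the function names, and the assumption that printers poll on the same 30-second cadence as the heartbeat are illustrative.

```python
from dataclasses import dataclass

HEARTBEAT_INTERVAL_S = 30    # heartbeat cadence described in the article
MISSED_POLL_THRESHOLD = 3    # flag after three consecutive misses


@dataclass
class DeviceStatus:
    device_id: str
    last_seen: float  # Unix timestamp (seconds) of the last heartbeat received


def missed_polls(status: DeviceStatus, now: float) -> int:
    """Number of whole heartbeat intervals that have passed since the last check-in."""
    elapsed = max(0.0, now - status.last_seen)  # clamp in case of clock skew
    return int(elapsed // HEARTBEAT_INTERVAL_S)


def is_flagged(status: DeviceStatus, now: float) -> bool:
    """True once the device has been silent for three or more intervals."""
    return missed_polls(status, now) >= MISSED_POLL_THRESHOLD
```

The check runs on the server against the last-seen timestamp, so it still works when the device is the thing that has gone quiet. A real deployment would likely add a small grace period so that one late heartbeat does not count as a miss.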
The principle: for physical deployments, you have to instrument the silence. If a device can fail without telling you, it will, and you won’t know until someone calls.
The Environment Is the Failure Domain
Software systems fail in predictable ways. Hardware fails in the ways its environment demands.
Restaurant tables get wiped down with cleaning chemicals multiple times a day. Humidity in a busy kitchen averages well above what most consumer hardware is rated for. Spills happen. The device that was glued to a table surface last week gets peeled off by a customer’s child and handed to a waiter who doesn’t know what to do with it.
These aren’t exotic scenarios. They’re Tuesday.
What this means for design: the enclosure, the mounting method, and the materials matter as much as the firmware. For PayTable’s dynamic screen stations - $35 pucks fixed to table surfaces that serve a URL payload to trigger the ordering flow - the software was stable within weeks. The mounting took three iterations over months. Adhesive failed in humid environments. Surface-mounted units got knocked loose. The recessed housing solution that eventually brought the replacement rate from weekly to quarterly required understanding the physical context first.
The lesson: when you’re designing for environments you don’t control, the physical constraints aren’t a handoff to a hardware vendor. They shape the product definition.
Remote Observability at Scale
A single restaurant is a manageable problem. You can be on-site when things go wrong. You can physically check devices. You can watch the network traffic.
Two thousand restaurants is a different problem. At that scale, the only things you know about device health are the things your monitoring infrastructure tells you. Which means if your monitoring has a blind spot, that blind spot exists in 2,000 places simultaneously.
We built device health dashboards into PayTable’s admin interface because cloud monitoring tools have no concept of a thermal printer’s poll status or a kitchen display’s SignalR connection state. Those tools are excellent at watching containers and services. They have no model for “this physical device in Cancun hasn’t checked in for six minutes.”
The principle: at scale, you’re not managing devices - you’re managing data about devices. The monitoring is the operations layer. If the monitoring is wrong or incomplete, the operations team is flying blind, and flying blind in a physical fleet means problems surface through customer complaints instead of dashboards.
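One way to look for blind spots is to reconcile the devices you installed against the devices that have ever reported in. The sketch below assumes an inventory keyed by site and a set of device IDs seen by monitoring. Both data shapes and the function name are hypothetical.

```python
def monitoring_gaps(inventory, reporting):
    """Return, per site, the installed devices that monitoring has never heard from.

    inventory: dict mapping site name -> set of device IDs installed there
    reporting: set of device IDs that have sent at least one heartbeat
    """
    gaps = {}
    for site, expected in inventory.items():
        missing = expected - reporting
        if missing:
            gaps[site] = sorted(missing)
    return gaps
```

A device that appears here has no last-seen timestamp at all, which a staleness check alone would not surface, since there is nothing to go stale.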
What This Means for Designing Physical Systems
The PayTable hardware stack has 23 devices per restaurant. Two thousand restaurants is 46,000 devices, each maintaining connections to a backend, each with its own firmware version, environmental exposure, and failure history.
The software challenge isn’t the backend. The backend is a .NET API on Azure - demanding, but understood. The challenge is that every device is a small, persistent state machine living in an environment you didn’t design and can’t monitor directly unless you built the instrumentation to do it.
If you’re building for physical environments - kiosk deployments, IoT fleets, embedded systems in industrial or commercial settings - the cloud deployment playbook doesn’t transfer cleanly. The principles that do transfer:
Design for the failure you can’t see, not just the one that throws an exception.
Make updates reversible or staged, because a bad OTA in a physical fleet is a logistics problem, not a rollback command.
Test in the environment, not an approximation of it - the gap between a lab and a 40 °C kitchen is the gap between a clean test run and a bricked device.
Build monitoring that understands silence as a signal.
The hardware is not the hard part. The hard part is that the software lives in physical space now, and physical space doesn’t care about your SLA.