Capability study · Client confidential

Business intelligence for the machines on your floor

A factory's physical devices were generating valuable operational data, but none of it was being captured in a way anyone could act on. Text message alerts went unread. Downtime dragged on because no one knew a robot had stopped. We built a BI and escalation system for physical devices that gives factory teams the same visibility into their equipment that software teams expect from their infrastructure.

Alerting and escalation

Fleet management

Industrial analytics

Operational dashboards

At a glance

Core problem

No data capture, no visibility, no reliable way to alert humans when machines need attention

Ap role

Architecture, full-stack engineering, data pipeline design

Key technology

Time-series data capture, real-time dashboards, multi-channel alerting

Delivery timeline

Part of a ~6-month fleet management platform build

Client type

Enterprise manufacturers with mixed robot and device fleets

Domain

Industrial manufacturing, fleet operations

5 min

Mean time to resolution after alert deployment

88%

Reduction in unplanned downtime

100%

Alert acknowledgment rate

What we left behind

Millions in equipment, zero visibility into what it was doing

The factory floor had robots, conveyors, sensors, and autonomous mobile robots (AMRs) generating constant streams of operational data. But the systems were disparate. Some were completely offline. Data was not being captured, aggregated, or analyzed in any structured way. When a robot stopped working, the only notification system was a text message to a single engineer's phone.

In a factory environment, text messages fail. The floor is loud. Workers wear gloves. Phones are often out of reach, locked in a break room, or on silent. A robot could be down for thirty minutes before anyone noticed, and there was no way to know whether the issue had been seen, acknowledged, or resolved. There was also no historical data to identify patterns: which machines failed most often, at what time of day, after which operations. Every incident came as a surprise because the infrastructure to learn from past incidents did not exist.

The problem was especially acute for AMRs. Unlike stationary robot arms that fail in predictable ways, mobile robots navigate dynamic environments and encounter a wider range of failure modes: navigation timeouts, obstacle detection false positives, battery degradation, communication drops. Their non-stationary nature makes them inherently less reliable and harder to monitor at scale.

Frontiers engineering in action

PagerDuty for the physical world

Data capture: making the invisible visible

The first step was building the data pipeline. We instrumented the fleet management platform to capture operational telemetry from every connected device: uptime, error codes, performance metrics, environmental readings. This data feeds into a time-series store that supports both real-time dashboards and historical analysis.
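To make the shape of that pipeline concrete, here is a minimal sketch of a telemetry record and a time-series store. It is an illustration under stated assumptions, not the production system: `TelemetryPoint`, `TimeSeriesStore`, and the metric names are hypothetical, and the in-memory store stands in for whatever time-series database a deployment would use.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class TelemetryPoint:
    """One reading from one device (hypothetical schema)."""
    device_id: str
    timestamp: float              # seconds since epoch
    metric: str                   # e.g. "battery_pct", "temperature_c"
    value: float
    error_code: Optional[str] = None


class TimeSeriesStore:
    """In-memory stand-in for a real time-series database."""

    def __init__(self) -> None:
        # One time-ordered list per (device, metric) series.
        self._series: Dict[Tuple[str, str], List[TelemetryPoint]] = defaultdict(list)

    def write(self, point: TelemetryPoint) -> None:
        series = self._series[(point.device_id, point.metric)]
        # Insert in timestamp order so late-arriving points stay queryable.
        idx = bisect_right([p.timestamp for p in series], point.timestamp)
        series.insert(idx, point)

    def latest(self, device_id: str, metric: str) -> Optional[TelemetryPoint]:
        """Most recent point: what a real-time dashboard tile would show."""
        series = self._series.get((device_id, metric))
        return series[-1] if series else None

    def query(self, device_id: str, metric: str,
              start: float, end: float) -> List[TelemetryPoint]:
        """All points with start <= timestamp <= end, for historical analysis."""
        series = self._series.get((device_id, metric), [])
        times = [p.timestamp for p in series]
        return series[bisect_left(times, start):bisect_right(times, end)]
```

The same store answers both kinds of question the dashboards need: `latest` for the current state of a device and `query` for a window of history.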

The dashboards provide fleet-wide visibility that simply did not exist before. Factory managers can see at a glance which cells are running, which are degraded, and which are down. Drill-down views show individual device health, recent event logs, and trend data. This is the kind of operational intelligence that software teams take for granted through tools like Datadog or Grafana, but that factory operations teams almost never have for their physical equipment.
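The at-a-glance view of running, degraded, and down cells can be sketched as a simple rollup. This is an illustrative sketch only: the `Health` enum, the function names, and the rule that a cell is only as healthy as its least healthy device are assumptions, not details from the engagement. Each cell is assumed to contain at least one device.

```python
from collections import Counter
from enum import IntEnum
from typing import Dict, List


class Health(IntEnum):
    """Ordered so that a larger value means a worse state."""
    RUNNING = 0
    DEGRADED = 1
    DOWN = 2


def cell_health(device_states: List[Health]) -> Health:
    # Assumption: a cell takes the state of its worst device.
    return max(device_states)


def fleet_summary(cells: Dict[str, List[Health]]) -> Dict[str, int]:
    """Count cells per state for the fleet-wide dashboard header."""
    counts = Counter(cell_health(states).name for states in cells.values())
    return {h.name: counts.get(h.name, 0) for h in Health}
```

A drill-down view would then start from any cell whose rolled-up state is not `RUNNING` and list the devices responsible.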

Escalation: guaranteeing human attention

The more valuable innovation is the escalation system. Inspired by PagerDuty's approach to incident management in software operations, we built a multi-channel alerting system designed for factory environments. When a device enters an error state, the system does not just send a text message and hope. It follows a defined escalation chain.

The first alert goes out via radio, not text, because factory workers are more likely to have a radio on their person than to be checking their phones. If the alert is not acknowledged within a configurable window, it escalates to the next person in the chain. If that person does not respond, it escalates again. The system guarantees that a human being sees and acknowledges every critical alert, and it logs every step so the factory can track responsiveness over time.
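The escalation logic described above can be sketched as a small state machine driven by a clock. This is a minimal sketch, not the shipped engine: `Responder`, `Escalation`, and the `notify` callback are hypothetical names, time is passed in explicitly so the example stays deterministic, and wrapping back to the start of the chain once everyone has been paged is an assumption about how "escalates again" ends.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Responder:
    name: str
    channel: str  # e.g. "radio" or "sms"


@dataclass
class Escalation:
    alert_id: str
    chain: List[Responder]
    ack_window_s: float                       # the configurable window
    notify: Callable[[Responder, str], None]  # sends on the responder's channel
    log: List[Tuple[float, str, str]] = field(default_factory=list)
    level: int = -1
    deadline: float = 0.0
    acked_by: Optional[str] = None

    def start(self, now: float) -> None:
        self._page(now)

    def _page(self, now: float) -> None:
        # Assumption: after the last responder, wrap to the first again.
        self.level = (self.level + 1) % len(self.chain)
        responder = self.chain[self.level]
        self.notify(responder, self.alert_id)
        self.deadline = now + self.ack_window_s
        self.log.append((now, "paged", responder.name))

    def tick(self, now: float) -> None:
        """Call periodically; escalates if the window passed without an ack."""
        if self.acked_by is None and now >= self.deadline:
            self._page(now)

    def acknowledge(self, name: str, now: float) -> None:
        if self.acked_by is None:
            self.acked_by = name
            self.log.append((now, "acknowledged", name))
```

Because every page and acknowledgment lands in `log` with a timestamp, the same record that drives escalation also supports responsiveness reporting later.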

Analytics: turning incidents into patterns

With structured data capture and reliable incident tracking in place, the system enables a layer of operational analytics that was previously impossible. Factory teams can measure mean time to resolution (MTTR) across devices, shifts, and facilities. They can identify which machines fail most frequently and correlate failures with specific operational conditions. Over time, this data transforms maintenance from reactive to predictive.
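As an illustration of the analytics this enables, MTTR is the mean of resolution time minus open time over a group of incidents. The sketch below assumes a hypothetical `Incident` record with `device_id`, `shift`, and two timestamps; the field names and sample grouping keys are illustrative, not the actual schema.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Incident:
    device_id: str
    shift: str
    opened_at: float     # seconds since epoch
    resolved_at: float


def mttr_by(incidents: List[Incident], key: str) -> Dict[str, float]:
    """Mean time to resolution in seconds, grouped by an Incident field."""
    durations = defaultdict(list)
    for inc in incidents:
        durations[getattr(inc, key)].append(inc.resolved_at - inc.opened_at)
    return {group: mean(values) for group, values in durations.items()}


def failure_counts(incidents: List[Incident]) -> List[Tuple[str, int]]:
    """Devices ranked by number of incidents, most failure-prone first."""
    return Counter(inc.device_id for inc in incidents).most_common()
```

Calling `mttr_by(incidents, "shift")` or `mttr_by(incidents, "device_id")` gives the per-shift and per-device comparisons mentioned above from the same incident log.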

The key insight from a software architecture perspective is that the core analytics and alerting engine is equipment-agnostic. Although this system was built for robotics, the same data capture, dashboarding, and escalation patterns apply to any fleet of physical devices. The capability extends naturally to non-robotic factory equipment, and the architecture was designed with that breadth in mind.
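One way to picture an equipment-agnostic core is a narrow interface that every device adapter satisfies, so the alerting and analytics code never needs to know what kind of machine it is watching. The sketch below is an assumption about how such an abstraction could look: `MonitoredDevice`, `Amr`, `Conveyor`, and the error codes are hypothetical names, not the platform's actual types.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Protocol


class MonitoredDevice(Protocol):
    """The only surface the analytics and alerting engine depends on."""
    device_id: str

    def read_metrics(self) -> Dict[str, float]: ...
    def error_code(self) -> Optional[str]: ...


@dataclass
class Amr:
    device_id: str
    battery_pct: float
    nav_timed_out: bool = False

    def read_metrics(self) -> Dict[str, float]:
        return {"battery_pct": self.battery_pct}

    def error_code(self) -> Optional[str]:
        return "NAV_TIMEOUT" if self.nav_timed_out else None


@dataclass
class Conveyor:
    device_id: str
    belt_speed_mps: float
    jammed: bool = False

    def read_metrics(self) -> Dict[str, float]:
        return {"belt_speed_mps": self.belt_speed_mps}

    def error_code(self) -> Optional[str]:
        return "JAM" if self.jammed else None


def devices_in_error(devices: Iterable[MonitoredDevice]) -> Dict[str, str]:
    """Works for any mix of equipment that satisfies the protocol."""
    errors: Dict[str, str] = {}
    for device in devices:
        code = device.error_code()
        if code is not None:
            errors[device.device_id] = code
    return errors
```

Adding a new equipment type then means writing one more adapter class; `devices_in_error` and anything built on it stay unchanged.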

Frontiers engineering in action

The same rigor software teams demand, built for operations teams

Software engineering teams have spent decades building increasingly sophisticated observability infrastructure: logging, metrics, dashboards, on-call rotations, incident management. Companies like Datadog, PagerDuty, and Grafana exist because software teams refuse to operate blind. But factory operations teams managing millions of dollars in physical equipment often have nothing comparable. They rely on text messages, walkie-talkies, and tribal knowledge.

We built a BI and alerting system for physical devices using the same architectural rigor that the best software operations teams expect, adapted for the specific constraints of factory environments: noisy floors where phones are impractical, shift-based staffing where escalation chains need to rotate, and equipment diversity where a single monitoring framework needs to accommodate robots, conveyors, sensors, and AMRs.

The result was a structural change in how these factory teams related to their equipment. Incidents stopped being surprises and started being data. Patterns emerged. Maintenance shifted from reactive to informed. And the metrics that matter most, like MTTR, became measurable for the first time.

What we shipped

Equipment analytics for all the equipment  
(even equipment they don’t have yet)

Like every engagement, this one ended with the client in full control. No black boxes, no proprietary layers, no forced dependency on us.

Real-time operational dashboards

Fleet-wide and device-level views showing uptime, health, error history, and trend data across all connected equipment.

Multi-channel escalation engine

Radio-first alerting with configurable escalation chains, acknowledgment tracking, and guaranteed human attention for critical events.

Incident analytics pipeline

Structured data capture enabling MTTR measurement, failure pattern identification, and shift-level performance reporting.

Equipment-agnostic architecture

Designed to accommodate any fleet of physical devices, not just robots. Extensible to non-robotic equipment without architectural changes.

Capabilities used

Pathfinding

Escalation policy architecture

Factory process observation

Incident workflow design

Requirements definition

Engineering

Multi-Protocol Adaptors

Edge Computing

On-Robot Agents

Low Latency Networking

Secure Tunneling

Babylon.js

React

Analytics and reporting

Radio integration

Escalation engine

Multi-channel alerting

Real-time dashboards

Time-series data pipelines

Design practices

Event-driven Architecture

Fail-Safe Architecture

Graceful Degradation

CI/CD

Clean handoff

Shift-aware scheduling

Equipment-agnostic abstraction

Observability-first design
