Ghost in the Shell: Validating Firmware Logic with Mock Sensor Pipelines
Introduction: The Paradox of Hardware-Dependent Firmware
In the intricate world of embedded systems development, engineers face a persistent and often debilitating paradox: firmware is explicitly designed to control and react to physical hardware, yet the dependency on that very hardware often becomes the primary bottleneck in software velocity. This creates a "Hardware Loop" latency that stifles innovation. To validate a simple logic change, a developer might need to flash a device, physically manipulate a sensor, observe a result, and repeat.
This friction is particularly acute in systems involving complex, high-throughput sensor pipelines such as thermal imaging cameras, LiDAR arrays, or proximity-triggered edge computing devices. In these systems, the "Happy Path" (everything working perfectly) is easy to test. However, the "Edge Cases" (sensor timeouts, bus contention, buffer overflows during rapid bursts) are notoriously difficult to reproduce physically. How does one reliably wave a hand in front of a proximity sensor exactly 4 milliseconds after a flash memory write cycle has begun? The human hand is simply not that precise.
Consequently, firmware often ships with "Heisenberg" bugs: defects that disappear when you look for them or try to reproduce them slowly, yet appear disastrously in the field under chaotic, real-world conditions.
This article details a specific methodology employed to validate a high-performance heterogeneous imaging pipeline. By decoupling the business logic from the physical world and constructing a "Ghost" in the shell (a software-defined simulation of external physical events), we were able to validate memory safety, concurrency, and logic flow under conditions that would be nearly impossible to reproduce with physical inputs alone.
The System Under Test: A Multi-Stage Asynchronous Pipeline
To understand the necessity of this approach, we must first abstract the system in question. The device was a dual-sensor imaging unit designed for edge deployment. It featured a high-resolution thermal sensor and a visual camera sensor. The system was architected to operate in a high-speed "Burst" mode, capturing rapid sequences of data, buffering them in volatile memory, offloading them to non-volatile storage, and eventually transmitting them to a host via a high-speed bus.
The pipeline consisted of four distinct, asynchronous, and competing stages:
- Trigger: An external, asynchronous event (interrupt from a proximity sensor) initiates the sequence. This is the "start gun."
- Acquisition: The parallel capture of thermal matrices and visual frames. This involves strict timing constraints; the thermal sensor requires a rigid refresh rate, while the visual camera pushes megabytes of data through a direct memory access channel.
- Storage: A high-speed memory operation moving data to a circular buffer, followed by a lower-priority background task that "drains" this buffer to persistent storage.
- Transmission: The asynchronous packetization and transmission of the stored data.
The complexity of this system lies not in the individual stages, which are well-understood, but in their intersection. What happens if a new Trigger arrives while the Storage system is 95% full? What happens if the Transmission task locks the storage for a read operation exactly when the Acquisition task tries to write new data? These "race conditions" are the breeding ground for hard faults, deadlocks, and silent data corruption.
Validating these intersections with a physical sensor is erratic. A developer cannot reliably generate a "Burst" of triggers with the millisecond-level precision required to hit these specific race windows. To truly validate the firmware, we needed to remove the physical world from the equation.
The Philosophy of the Ghost: Internal Stimulus Injection
The core philosophy of our solution was simple but profound: Treat sensor inputs as data streams, not physical obligations.
In a well-architected firmware codebase, the "Driver" layer is responsible for translating physical interrupts into system events. The "Logic" layer consumes these events. The Logic layer does not care where the event came from; it only cares that the event occurred.
By introducing an "Injection Point" at the seam between these layers, we could feed the Logic layer with synthetic events that were indistinguishable from real physical triggers. We called this the "Ghost" strategy.
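The seam can be made concrete with a shared event queue. The sketch below is a minimal, hypothetical illustration (the type names, queue depth, and `synthetic` flag are assumptions, not the project's actual API): both the real interrupt bottom-half and the Ghost task post through the same function, so the Logic layer genuinely cannot tell the sources apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical event type shared by the Driver and Logic layers. */
typedef enum { EVT_NONE, EVT_PROXIMITY_TRIGGER } event_type_t;

typedef struct {
    event_type_t type;
    uint32_t     timestamp_ms;
    bool         synthetic;   /* true when injected by the Ghost */
} sensor_event_t;

#define QUEUE_DEPTH 8

static sensor_event_t queue[QUEUE_DEPTH];
static unsigned head, tail;

/* Single injection point: the real ISR bottom-half and the Ghost
 * task both call this. Downstream consumers see identical events. */
bool event_post(sensor_event_t evt)
{
    unsigned next = (head + 1) % QUEUE_DEPTH;
    if (next == tail)
        return false;          /* queue full: backpressure */
    queue[head] = evt;
    head = next;
    return true;
}

/* The Logic layer drains events without caring about their origin. */
bool event_poll(sensor_event_t *out)
{
    if (tail == head)
        return false;          /* queue empty */
    *out = queue[tail];
    tail = (tail + 1) % QUEUE_DEPTH;
    return true;
}
```

In a real RTOS this ring buffer would be replaced by the kernel's queue primitive with proper ISR-safe posting, but the architectural point is the same: one funnel, two producers.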
The Random Trigger Task
We implemented a dedicated task within the real-time operating system. This task served as the "Ghost" user. It was not a unit test running on a PC host; it was a living, breathing task running on the actual microcontroller, competing for CPU cycles and bus access just like any other task.
Its sole responsibility was to simulate the behavior of an erratic, sometimes aggressive, external environment. The task operated on a simplified but highly configurable state machine:
- The Wait: The task would sleep for a random interval, typically ranging from 0 to 20 seconds. The randomness was crucial. A fixed interval allows the system to settle into a rhythm, masking race conditions. A random interval ensures that triggers land at arbitrary points in the system's execution: sometimes while idle, sometimes while busy, sometimes in the middle of a critical operation.
- The Actuate: The task would fire a simulated "Proximity Trigger" event. This wasn't just a function call; it was a full emulation of the interrupt handler's downstream effect.
- The Observation: It would then monitor the global system state—checking atomic flags and mutexes—to verify if the system reacted correctly (acknowledged the trigger) or incorrectly (ignored it or crashed).
- The Signal: Crucially, the task used the device's physical status LEDs to report its virtual status, creating a visual feedback loop for human observers.
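The four phases above can be sketched as a small cyclic state machine. This is a hypothetical reconstruction written as pure functions so the logic can be exercised off-target; the state names, the xorshift PRNG, and the 20-second bound are illustrative assumptions, and in the real firmware this body would sit inside an RTOS task loop with actual sleep and event-post calls.

```c
#include <assert.h>
#include <stdint.h>

/* Wait -> Actuate -> Observe -> Signal -> Wait, forever. */
typedef enum { GHOST_WAIT, GHOST_ACTUATE, GHOST_OBSERVE, GHOST_SIGNAL } ghost_state_t;

/* xorshift32: a tiny deterministic PRNG standing in for the RTOS RNG,
 * so a given seed reproduces a given trigger schedule exactly. */
static uint32_t ghost_rand(uint32_t *s)
{
    uint32_t x = *s;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *s = x;
}

/* Random sleep interval in [0, 20000) ms, per the description above. */
uint32_t ghost_next_wait_ms(uint32_t *seed)
{
    return ghost_rand(seed) % 20000u;
}

/* One turn of the crank: each phase hands off to the next. */
ghost_state_t ghost_step(ghost_state_t s)
{
    switch (s) {
    case GHOST_WAIT:    return GHOST_ACTUATE;  /* timer expired: fire   */
    case GHOST_ACTUATE: return GHOST_OBSERVE;  /* trigger event posted  */
    case GHOST_OBSERVE: return GHOST_SIGNAL;   /* system flags checked  */
    case GHOST_SIGNAL:  return GHOST_WAIT;     /* LED updated, loop     */
    }
    return GHOST_WAIT;
}
```

Using a seeded PRNG rather than a hardware entropy source is a deliberate choice here: "random" schedules stay replayable, which matters when you need to reproduce a failure found overnight.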
This decoupling allowed us to run the device in a "Headless" mode. We could leave the device on a desk overnight, connected to power but with no physical movement around it, and wake up to find it had processed thousands of capture cycles. This effectively simulated weeks of real-world usage in a single night.
Deep Mocking: Synthesizing the Protocol, Not Just the Signal
One valid criticism of mocking is that it often skips the complexity of the data packet itself. Real sensors don't just say "Trigger"; they send metadata—timestamps, validity flags, signal strength, and checksums. A naive mock that just calls a start function misses the vulnerability of the payload parsing logic.
To address this, our Ghost implementation went deeper. It didn't just trigger the logic; it constructed a full Command packet, identical to what a remote controller or a complex sensor hub would send.
We utilized the system's binary serialization library to generate valid, binary-compatible command payloads.
- Mock Command IDs: The Ghost generated unique, incrementing identifiers for each trigger. This allowed us to trace a specific trigger through the logs. If the Ghost fired Trigger #5044, we could look in the filesystem and verify that the corresponding image file existed. This closed the loop on data integrity.
- Mock Timestamps: The Ghost injected timestamps that allowed us to measure "Time of Flight" for the data processing pipeline—calculating exactly how long it took from the "Virtual Hand Wave" to the "File Closed" event.
By injecting these full packets into the command queue, we were testing the entire stack: the deserialization layer, the command dispatcher, the queue depth management, and the logic handler. We were fuzz-testing our own internal protocols.
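A minimal sketch of such a synthesized packet follows. The wire layout, field names, and additive checksum are illustrative assumptions, not the project's actual serialization library; the point is that the Ghost emits real bytes, exercising the deserializer and checksum validation just as a remote sensor hub would.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 11-byte wire format for an injected Command packet. */
typedef struct {
    uint16_t cmd;          /* command opcode, e.g. CMD_TRIGGER       */
    uint32_t id;           /* incrementing trigger ID (e.g. #5044)   */
    uint32_t timestamp_ms; /* injection time, for time-of-flight     */
    uint8_t  checksum;     /* additive checksum over preceding bytes */
} ghost_packet_t;

enum { CMD_TRIGGER = 0x0101 };

static uint8_t checksum8(const uint8_t *p, size_t n)
{
    uint8_t sum = 0;
    while (n--)
        sum += *p++;
    return sum;
}

/* Serialize the packet into a byte buffer, checksum last. */
size_t ghost_packet_build(uint8_t buf[11], uint32_t id, uint32_t now_ms)
{
    uint16_t cmd = CMD_TRIGGER;
    memcpy(buf + 0, &cmd, 2);
    memcpy(buf + 2, &id, 4);
    memcpy(buf + 6, &now_ms, 4);
    buf[10] = checksum8(buf, 10);
    return 11;
}

/* The parsing layer under test must reject corrupt payloads. */
int ghost_packet_valid(const uint8_t buf[11])
{
    return buf[10] == checksum8(buf, 10);
}
```

Because each packet carries a unique `id` and a timestamp, a trigger can be traced from injection to "File Closed," closing the data-integrity loop described above.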
Validating the "Happy Path" and the "Chaos"
The true value of the Ghost inputs became apparent when we adjusted the aggression of the simulation.
The Happy Path Validation
Initially, we configured the random interval to a comfortable duration. This allowed the pipeline to fully flush between triggers. The Capture would restart, the Buffers would drain, and the Radio would transmit.
- Success Verification: The Ghost task had access to the pipeline's completion flags. When the logic reported "Pipeline Complete," the Ghost fired a success command.
- Visual Confidence: To the observer, the device sat on the desk, periodically blinking a specific color for the trigger, then another for success. This provided immediate, glanceable confidence in the system's baseline functionality. It was a heartbeat monitor for the code.
The Chaos Mode: Hunting Race Conditions
Then, we reduced the interval. We allowed the randomizer to pick near-zero delays. We allowed it to trigger "Bursts" of events in rapid succession. This is where the Virtual world caught bugs the Physical world missed.
- Buffer Exhaustion: We discovered that if a trigger arrived exactly when the circular buffer was wrapping around, and the write operation was lagging by a few milliseconds, the Direct Memory Access controller would fail to allocate a slot. In the physical world, this looked like a "glitch." In the Ghost world, it was a reproducible "Buffer Overflow" error log. We were able to tune the buffer sizes and watermarks based on this data.
- Bus Contention: The Ghost revealed a bus contention issue. The Thermal sensor necessitated a monopoly on the communication bus during its readout. However, external triggers were causing the Proximity driver to wake up and poll. The Ghost task allowed us to identify exactly when the contention occurred. We implemented a "Zone of Silence": a software interlock where the Mock task mimicked the hardware behavior of muting the proximity sensor during thermal acquisition.
- Logic Fallthroughs: Perhaps the most insidious bug found was a simple logic error in the status command handler. A "Success" indication would occasionally be followed instantly by a "Busy" indication. This was confusing to users. The Ghost task, by firing these completion events in a predictable, high-frequency stream, made the pattern obvious: the handler was missing an early return, allowing execution to fall through from the "Success" branch into the "Busy" checks. A human waving a hand every few minutes would never have noticed the correlation.
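The fallthrough bug is worth reconstructing, because it is so easy to write. The sketch below is a hypothetical simplification (function and enum names are invented for illustration): without an early return after the Success branch, a concurrently busy storage task overwrites the indication, producing the contradictory "Success then Busy" pattern the Ghost exposed.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { IND_NONE, IND_SUCCESS, IND_BUSY } indication_t;

/* Buggy version: execution falls through from the Success branch
 * into the Busy checks, so a busy flag clobbers a valid Success. */
indication_t handle_status_buggy(bool pipeline_complete, bool storage_busy)
{
    indication_t ind = IND_NONE;
    if (pipeline_complete)
        ind = IND_SUCCESS;     /* missing early return here */
    if (storage_busy)
        ind = IND_BUSY;        /* overwrites the Success indication */
    return ind;
}

/* Fixed version: report completion and stop evaluating. */
indication_t handle_status_fixed(bool pipeline_complete, bool storage_busy)
{
    if (pipeline_complete)
        return IND_SUCCESS;
    if (storage_busy)
        return IND_BUSY;
    return IND_NONE;
}
```

With both flags set, the buggy handler reports Busy while the fixed one reports Success; it takes a high-frequency stream of completions, not an occasional hand wave, to make that divergence visible.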
Visual Feedback as a Debug Interface
One of the most effective aspects of this mock implementation was the repurposing of the hardware status LEDs as a real-time debug interface.
In a typical production environment, LEDs are for the end-user. In our Mock environment, they became our console.
- Trigger Color: "The Ghost has pressed the button."
- Success Color: "The Logic Layer reports success."
- Inhibited Color: "The Logic Layer reports it is busy/inhibited."
- Error Blink: "The Logic Layer rejected my stimulus (e.g., Backpressure active)."
This allowed developers to debug complex state machine interactions without hooking up a hardware debugger. We could "see" the backpressure algorithm kicking in. If the LEDs started blinking the error color during a simulated storm, we knew our flow control logic was working correctly to protect the memory heap.
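The LED console reduces to a single lookup from Ghost observation to display pattern. The sketch below is illustrative only; the observation names and RGB values are assumptions standing in for the article's unnamed colors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* What the Ghost observed after firing a stimulus. */
typedef enum { OBS_TRIGGER, OBS_SUCCESS, OBS_INHIBITED, OBS_REJECTED } ghost_obs_t;

typedef struct {
    uint8_t r, g, b;
    bool    blink;   /* blinking reserved for rejected stimuli */
} led_pattern_t;

/* One observation, one unambiguous glanceable pattern. */
led_pattern_t ghost_led_for(ghost_obs_t obs)
{
    switch (obs) {
    case OBS_TRIGGER:   return (led_pattern_t){   0,   0, 255, false }; /* blue   */
    case OBS_SUCCESS:   return (led_pattern_t){   0, 255,   0, false }; /* green  */
    case OBS_INHIBITED: return (led_pattern_t){ 255, 255,   0, false }; /* amber  */
    case OBS_REJECTED:  return (led_pattern_t){ 255,   0,   0, true  }; /* red, blinking */
    }
    return (led_pattern_t){ 0, 0, 0, false };
}
```

The value of keeping this mapping in one function is that the "console" stays consistent: every developer watching the desk sees the same vocabulary, and adding a new observation type forces a conscious choice of pattern.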
The "Scaffolding" Concept in Firmware Architecture
This approach highlights a broader engineering principle often overlooked in embedded development: Scaffolding Code is Production Code.
Often, test code is treated as a second-class citizen: scripts hacked together around external tools, or temporary files that are discarded. However, the random trigger task and its associated helper functions were written with the same rigor as the flight software. They adhered to the project's variable naming conventions, used the standard logging macros, and were integrated into the system's initialization sequence.
This rigorous scaffolding meant that:
- It Could Be Reviewed: Other engineers could read the test logic and understand exactly what was being tested. It served as documentation for the expected system behavior.
- It Could Be Configured: We could change the burst size, the random seed, or the payload type via simple preprocessor directives, allowing us to pivot from "Stability Testing" to "Stress Testing" in seconds.
- Legacy of Reliability: Although the Mock task is disabled for the final production binary, it remains in the codebase. It effectively "ships" with the source. If a field issue arises months later, we can re-enable the Ghost, configure it to reproduce the field condition, and debug it on a desk without needing to fly to the customer site.
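Compile-time configuration of this kind often looks something like the fragment below. This is a hypothetical sketch (the macro names, defaults, and the specific knobs are assumptions): defaults give a gentle "Stability" profile, a build can override any knob with compiler `-D` flags to pivot to "Stress" mode, and production builds compile the task out entirely.

```c
/* Hypothetical Ghost task configuration knobs, overridable per build. */
#ifndef GHOST_ENABLED
#define GHOST_ENABLED      1          /* set to 0 in the production binary */
#endif

#ifndef GHOST_MAX_WAIT_MS
#define GHOST_MAX_WAIT_MS  20000u     /* stability: long, calm intervals */
#endif

#ifndef GHOST_BURST_SIZE
#define GHOST_BURST_SIZE   1u         /* stress builds raise this for chaos mode */
#endif

#ifndef GHOST_RAND_SEED
#define GHOST_RAND_SEED    0xC0FFEEu  /* fixed seed keeps failing runs reproducible */
#endif
```

Guarding each definition with `#ifndef` is what makes the "pivot in seconds" possible: a stress build simply passes, say, `-DGHOST_MAX_WAIT_MS=50u -DGHOST_BURST_SIZE=16u` without touching the source.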
Conclusion
The "Ghost in the Shell" technique (mocking sensor inputs via internal software tasks) transformed the validation phase of the firmware project. It moved us from physical, anecdotal testing ("It seems to work when I wave my hand") to rigorous, automated logical verification.
By injecting simulated events at the architectural seam between the Driver and Logic layers, we validated the entire downstream pipeline: buffering, storage management, filesystem integrity, and transmission protocols. We found and fixed race conditions that were statistically unlikely to happen in a manual test session but statistically guaranteed to happen across a fleet of thousands of devices.
Ultimately, this approach proves that in modern firmware development, the most effective way to validate the hardware-dependent code is, paradoxically, to remove the hardware from the equation entirely. By verifying the logic in a pure, controlled simulation, we ensure that when the real world finally knocks on the door, the system is fundamentally ready to answer.