Two-Tier Sensor Queue: Surviving BLE Dropouts at 100Hz

Two-Tier Sensor Queue: Surviving BLE Dropouts at 100Hz

A pet wearable streams six-axis motion data at 100 Hz to a home hub over Bluetooth Low Energy. The dog walks into the next room. The wall absorbs more RF than the supervision timeout can tolerate, and the link drops. Twelve seconds later the dog walks back out and the link comes up again.

What happened to the 1,200 motion samples in flight during those twelve seconds?

If the firmware was streaming naively, those samples are gone — and the activity timeline a veterinarian later reads has a 12-second hole right where the dog did something. This post walks through the two-tier persistent queue we built to make that hole impossible: a RAM ring buffer in front of a flash-backed spill file, governed by a three-mode state machine, with strictly chronological FIFO replay when the link returns.

Problem

Wearables don't get to choose where their owner walks. Even in an apartment, BLE link supervision tears down a connection after a few hundred milliseconds of consecutive missed packets — much shorter than the time a pet spends behind a couch.

The naïve options all fail the same way:

  • Drop on disconnect. Easy to implement. Destroys the dataset for any analysis — gait, sleep, activity bouts — that needs continuity.
  • Block the producer when BLE is down. Holds up the IMU task. Drops samples elsewhere in the pipeline, usually with worse failure modes than the explicit drop above.
  • One large single-tier RAM buffer. Works for a few seconds. Then either you size it for the 99th-percentile outage and pay the RAM cost, or you blow up under real-world conditions.

The right pattern is a two-tier queue: a fast in-RAM ring for in-flight frames, plus a persistent flash file that holds everything the ring can't immediately deliver. A mode state machine arbitrates who reads the ring and where the data goes next.

Crucially, this is not just a buffer — it's a timeline-preserving buffer. When the link comes back, the cloud must see frames in the same order the sensors recorded them, or downstream models that look at sequences fall apart.

Approach

The design uses three storage stages and a strict state machine:

Tier 1 — RAM ring buffer. A FreeRTOS MessageBuffer at 192 KB. Producers (IMU sample-batcher, audio capture, barometer) push frames here unconditionally. Writes are an O(1) memcpy; nothing in the ring can block on flash I/O.

Tier 2 — Append-only flash file. Lives on the external flash filesystem. Survives reboots. Each frame is written with dual length tags — a uint16 length prefix before the payload AND an identical uint16 length suffix after it. The duplicate tag is the load-bearing design choice; with it, a reader can walk the file head-to-tail OR tail-to-head with no index file.

Three-mode state machine governs who reads the ring and where the data goes:

// file_operations.c:154
static op_mode_t compute_mode(void)
{
    /* BLE down: only path to preserve frames is the flash spill. */
    if (!s_ble_up)              return MODE_SPILL;
    /* BLE up + persisted backlog: send oldest persisted data first
     * (FIFO walk — see drain_replay).                            */
    if (spill_size() > 0)       return MODE_REPLAY;
    /* BLE up + no backlog: stream ring → BLE directly.           */
    return MODE_LIVE;
}
  • LIVE — BLE up, spill empty, ring under 75% capacity. Drain task ships frames from the ring directly over BLE.
  • SPILL — BLE down (or ring ≥ 75% as back-pressure relief). The spill task is the sole ring consumer; everything it pulls goes into the flash file.
  • REPLAY — BLE up, but the flash file still holds backlog. Producer writes continue to spool into the file (preserving order); the drain task pulls frames out of the file and onto BLE.

The state machine guarantees a single consumer of the ring at any instant, and a single reader of the flash file during REPLAY. That property is what makes the design safe under sustained throughput.

Process

Two implementation details are worth calling out because they're where the design either succeeds or quietly leaks data.

One write() per frame

The instinct is to write the leading length tag, then the payload, then the trailing tag — three write() syscalls per frame. Through the VFS layer of a flash filesystem, each call grabs an internal mutex; three writes per frame turned the spill task into the bottleneck.

// file_operations.c:343
static void spill_one_frame(const uint8_t *buf, size_t got)
{
    /* Build the framed bytes (header + payload + trailer) in one
     * contiguous buffer so we do exactly ONE write() syscall per frame
     * instead of three. FATFS goes through VFS and grabs an internal
     * mutex on every write — three writes per frame turned the spill
     * task into the bottleneck (~33 fps, vs producer rate ~95 fps for
     * audio bursts), causing sustained ring overflow. With one
     * write() per frame the spill task stays comfortably ahead. */
    static uint8_t framed[MAX_FRAME_BYTES + 4];
    ...
    framed[0] = (uint8_t)(got & 0xFF);
    framed[1] = (uint8_t)((got >> 8) & 0xFF);
    memcpy(&framed[2], buf, got);
    framed[2 + got]     = (uint8_t)(got & 0xFF);
    framed[2 + got + 1] = (uint8_t)((got >> 8) & 0xFF);
    write(s_spill_fd, framed, got + 4);
}

Measured before and after: spill throughput climbed from ~33 frames/sec (bottlenecked on filesystem mutex contention) to comfortably ahead of even audio-burst producer rates around 95 fps. The function-static framed buffer keeps the stack flat; only one writer ever calls into this path so it's race-free.

Writing the length prefix before reading the payload

During replay, frames are packed into a fixed-size batch buffer that's handed to the BLE transport in one shot. The trick is the order:

// file_operations.c:756  (drain_replay, FIFO forward walk)
size_t prefix_at = batch_used;
batch[batch_used++] = (uint8_t)(flen & 0xFF);
batch[batch_used++] = (uint8_t)((flen >> 8) & 0xFF);
if ((size_t)read(s_replay_fd, &batch[batch_used], flen) != flen) {
    ESP_LOGE(TAG, "replay: payload read failed");
    batch_used = prefix_at;
    break;
}
batch_used += flen;
batch_frames++;
s_replay_pos = next_pos;

We write the 2-byte length tag into the batch buffer first, then read the payload directly behind it. If the read fails, the saved prefix_at offset rewinds batch_used cleanly so the partial frame doesn't pollute the next send. The forward-walk also exploits a small but real saving: reading the leading tag already advances the file descriptor to the start of the payload, so no additional lseek is needed per frame.

Results

The state machine plus this framing gives us a few non-obvious properties:

  • No frames are lost across typical (minutes-long) BLE outages. The flash spill scales far beyond realistic in-home dropouts.
  • Replays are chronological by default. We deliberately walk the file head-to-tail so the cloud reconstructs the timeline in the same order the sensors recorded it. The dual-tag framing means flipping to tail-first ("show me what the pet is doing right now") is a three-line change, because the file is walkable in either direction at constant cost.
  • Survives reboots. Because tier 2 is a real filesystem file, a brown-out or watchdog reset during an outage simply leaves the file on disk; the next boot picks up replay from where it left off, in order.
  • One consumer at a time. The state machine guarantees the ring has exactly one drain task and the flash file has exactly one reader during REPLAY — no racing readers, no dual-handle filesystem hazards.

The pattern composes cleanly with the rest of the firmware. The same flash file holds any frame the producers can generate — motion, audio, environmental — because the on-disk format is a thin envelope and downstream consumers parse the inner payload themselves.

Why It Matters at Hoomanely

Hoomanely is reinventing pet healthcare through Physical Intelligence — continuous, AI-driven monitoring on devices the pet actually wears. The Biosense AI Engine is downstream of the firmware described here; everything it concludes about a pet's gait, restlessness, sleep, or distress depends on whether the firmware delivered a complete, ordered, time-true record of what the sensors saw.

A model trained on data with random 12-second holes learns the wrong thing. A model that sometimes sees yesterday's frames replayed in reverse learns very wrong things. The two-tier queue isn't a buffer; it's the load-bearing contract between sensors at the edge and intelligence in the cloud.

Every wearable our team ships uses this pattern. It is the difference between a fitness tracker that loses data during the dog walk and a clinical-grade monitor that preserves it — and preserving it is the prerequisite for spotting health changes early, which is the whole point of preventive care.

Key Takeaways

  • A single buffer is never enough. Separate RAM ("fast, lossy on full") from flash ("persistent, slow"). Each tier should know its job and stay in it.
  • A state machine beats flag checks. Three modes, three triggers, one consumer at a time — easier to reason about, easier to test, and far harder to break than ad-hoc bookkeeping.
  • Framing decides walk direction. Dual length tags cost four bytes per frame and buy the freedom to choose chronological replay or freshest-first replay later — without an index file.
  • Measure where you actually lose frames. Our biggest single win came from collapsing three write() syscalls into one — a system-call problem, not a storage problem.
  • The replay path is the rare path. Test it under load. It is the one that bites in production, every single time.

Read more