LittleFS Tail-Latency for Burst Writes: Commit + Prealloc

When you first validate a bursty image pipeline on LittleFS, it often looks great: average write times are low, and short tests feel “done.” Then you run a 6–12 hour soak, the flash creeps toward full, and suddenly P95/P99 write latency starts spiking. Frames arrive in bursts, your producer thread blocks longer than expected, and the rest of the system starts paying the price—sensor scheduling jitter, CAN/USB backpressure, dropped frames, and even watchdog risk.

This post is a practical firmware playbook for reducing that long-tail latency without replacing your storage pipeline. The key is to stop treating “write time” as one number and instead make the system explain itself: how much time is spent pushing bytes versus updating metadata, and when garbage collection or compaction quietly steals your latency budget as the device ages and free space shrinks.

From there, we apply two small patterns that change stability dramatically. A commit marker makes each image set “publishable” only when it’s fully persisted (and remains safe under power loss), and best-effort preallocation reduces allocation churn so burst writes don’t repeatedly trigger worst-case GC behavior. The result is a storage path that stays predictably fast—even when the flash is hot, fragmented, and nearly full.


The Problem

LittleFS is designed for embedded flash realities (wear leveling, power-loss resilience, log-structured behavior). That design can create a common tail-latency failure mode for burst workloads:

  • Bursty producers create many short-lived files (or frequently new extents).
  • As the filesystem fills, allocation becomes harder, and internal cleaning/compaction becomes more frequent.
  • Metadata operations (directory updates, rename, close, sync) can suddenly take much longer than the data write itself.
  • The result: average throughput looks fine, but the tail is unpredictable.

In image pipelines (camera + thermal frames + metadata sidecars), the important performance truth is: Your system doesn’t fail on averages. It fails when the worst 1–5% of writes stall the whole pipeline.


Approach

We’ll take an incremental approach that is safe, measurable, and compatible with existing pipelines:

Measure what matters (and separate it)

Track timing for:

  • Data write time (payload bytes)
  • Metadata time (open/close, directory operations, rename)
  • Sync time (if you explicitly fsync/sync)
  • GC/compaction signal (inferred from long metadata time, or from block-device counters)

Commit marker pattern for “set correctness”

Write image sets in a way that downstream readers never see partial sets—especially after power loss:

  • Rename-based commit (.tmp → final) or
  • Footer-based commit (write a verified footer last)

Best-effort preallocation to reduce churn

Avoid repeated allocation under pressure:

  • Keep a small file pool (reused slots)
  • Or reserve a write budget (space discipline)
  • Or use “pre-created empty containers” that you fill and commit

None of this requires changing LittleFS internals. It’s firmware-level engineering around tail behavior.


Process

Step 1: Add instrumentation that isolates tail spikes

If you only time the whole “write set” call, you’ll miss the real source of stalls. Instead, segment your write path into phases that map to filesystem work.

What to measure

For each image set:

  • t_open_us
  • t_write_us (sum of payload writes)
  • t_close_us
  • t_commit_us (rename or footer finalization)
  • t_total_us

Also record:

  • fs_used_pct (estimated)
  • set_size_bytes
  • “mode” (empty FS vs partially full vs near full)

Two metrics that keep you honest

  • P99 set commit latency (ms)
  • Sustained sets/min after 6 hours (sets/min)

That’s enough to validate stability without turning your post into a dashboard.
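To compute P99 on-device without storing every sample, a small log2-bucket histogram is enough. A minimal sketch (the bucket count and spacing are illustrative, not a littlefs requirement):

```c
#include <stdint.h>

/* Log2-spaced latency histogram: bucket b holds samples in
 * [2^b, 2^(b+1)) microseconds (bucket 0 also catches 0–1 us).
 * 24 buckets cover up to ~16 seconds. */
#define LAT_BUCKETS 24

typedef struct {
    uint32_t counts[LAT_BUCKETS];
    uint32_t total;
} lat_hist_t;

static void lat_hist_add(lat_hist_t *h, uint32_t us) {
    int b = 0;
    while ((us >>= 1) && b < LAT_BUCKETS - 1)
        b++;                      /* b = floor(log2(us)) */
    h->counts[b]++;
    h->total++;
}

/* Returns an upper bound (the bucket ceiling, in us) for percentile pct. */
static uint32_t lat_hist_pct(const lat_hist_t *h, uint32_t pct) {
    if (h->total == 0) return 0;
    uint64_t target = (uint64_t)h->total * pct / 100;
    uint64_t seen = 0;
    for (int b = 0; b < LAT_BUCKETS; b++) {
        seen += h->counts[b];
        if (seen > target)
            return 2u << b;       /* ceiling of bucket b, i.e. 2^(b+1) */
    }
    return 2u << (LAT_BUCKETS - 1);
}
```

The resolution is coarse (power-of-two buckets), but that is exactly what you want for tail analysis: it distinguishes “~100 µs” from “~100 ms” without any per-sample storage.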

Implementation sketch (minimal overhead)

typedef struct {
  uint32_t open_us, write_us, close_us, commit_us, total_us;
  uint32_t bytes;
  uint16_t fs_used_pct;
} write_trace_t;

static inline uint32_t now_us(void); // platform timer (e.g. DWT cycle counter or a 1 MHz hardware timer)

#define TRACE_START(var) uint32_t var##_t0 = now_us()
#define TRACE_END(var, out) do { (out) += (now_us() - var##_t0); } while(0)

Use it like:

// (return-code checks elided for clarity; check every lfs_* call in production)
write_trace_t tr = {0};
TRACE_START(total);

TRACE_START(open);
lfs_file_open(&lfs, &f, path_tmp, LFS_O_WRONLY | LFS_O_CREAT | LFS_O_TRUNC);
TRACE_END(open, tr.open_us);

TRACE_START(write);
lfs_file_write(&lfs, &f, buf, len);
TRACE_END(write, tr.write_us);

TRACE_START(close);
lfs_file_close(&lfs, &f);
TRACE_END(close, tr.close_us);

// commit marker step (rename or footer verify)
TRACE_START(commit);
commit_file(path_tmp, path_final);
TRACE_END(commit, tr.commit_us);

TRACE_END(total, tr.total_us);

How to estimate “fill level”

If you already track blocks used at the block-device layer, use that. Otherwise:

  • approximate from lfs_fs_size() (if available in your integration) and your configured block count
  • or maintain a coarse allocator watermark in your own storage service

You don’t need perfect accuracy—just enough to correlate spikes with “near full.”
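The `lfs_fs_size()` route reduces to one division. A sketch (the helper name is mine; `lfs_fs_size` returns allocated blocks, negative on error, and is documented as best-effort, so treat the result as a trend signal rather than an exact gauge):

```c
#include <stdint.h>

/* Coarse fill estimate. used_blocks would come from lfs_fs_size(&lfs);
 * block_count is your configured lfs_config.block_count. */
static uint16_t fs_used_pct_from_blocks(int32_t used_blocks, uint32_t block_count) {
    if (used_blocks < 0 || block_count == 0)
        return 0; /* error / unknown: caller should not trust this sample */
    uint32_t pct = (uint32_t)((uint64_t)used_blocks * 100u / block_count);
    return (uint16_t)(pct > 100 ? 100 : pct);
}
```

Sample this once per set (or once per burst), not per write; it is for correlating spikes with fill level, not for allocation decisions.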



Step 2: Use a commit marker so readers never see partial sets

Even if you optimize latency, you still need correctness: readers should only consume complete image sets. Tail latency and correctness are tied—if your pipeline retries or the device reboots mid-burst, you don’t want partially written files to be interpreted as “real data.”

Two patterns work well in LittleFS-based systems:

Option A: Rename-based commit

Write everything to a temporary name, then rename to final name when complete:

  • set_1234.tmp/
  • commit → set_1234/ (or set_1234.done marker)

Why rename works

Renames are typically metadata operations that LittleFS handles in a power-loss-safe way. The key benefit is semantic: downstream readers only scan final names.

Practical structure

  • Write image payload files to a temp directory:
    sets/set_1234.tmp/cam.bin, sets/set_1234.tmp/therm.bin, sets/set_1234.tmp/meta.json
  • After all closes succeed, commit:
    rename directory set_1234.tmp → set_1234
  • Optionally create a COMMIT file inside final dir for easy scanning.

Reader rule

  • Ignore .tmp directories entirely
  • Only process set_* (final)
  • If you see set_* but missing expected files, treat as corrupted and quarantine.
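The first two rules reduce to a name filter at scan time. A sketch, assuming the `set_` prefix and `.tmp` suffix naming used above:

```c
#include <string.h>
#include <stdbool.h>

/* Reader-side filter: only final "set_*" names are eligible. Anything
 * still carrying the ".tmp" suffix is an uncommitted set and is skipped. */
static bool is_published_set(const char *name) {
    if (strncmp(name, "set_", 4) != 0)
        return false;
    size_t n = strlen(name);
    return !(n >= 4 && strcmp(name + n - 4, ".tmp") == 0);
}
```

The quarantine rule (final name present but files missing) still needs a content check after this filter passes.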

Commit function sketch

int commit_dir(const char* tmp, const char* final) {
  // Ensure all files are closed before this point.
  // Rename is your atomic-ish "publish" step.
  int rc = lfs_rename(&lfs, tmp, final);
  return rc;
}

Option B: Footer-based commit

If rename/dir operations are your tail spikes, you can avoid rename and instead make the file self-validating:

  • Append a fixed footer at the end: {magic, version, length, crc32}
  • Only consider the set valid if footer exists and verifies.

This works well when you write a single “container file” per set:

  • set_1234.bin contains camera + thermal + metadata + footer
  • The footer is written last; if power is lost, verification fails and the reader skips it.

Footer layout example

  • magic = 0x53455421 (“SET!”)
  • payload_len
  • crc32(payload)

Reader rule

  • Scan file, read footer, verify CRC, then consume.
  • If invalid footer, skip.
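The footer layout above can be sketched as a small struct plus seal/verify helpers. These operate on an in-memory container buffer; in practice you would write/read the same bytes through littlefs. The CRC here is the standard reflected CRC-32 (IEEE polynomial), table-free and slow, which is fine at commit frequency:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define SET_MAGIC 0x53455421u /* "SET!" */

typedef struct {
    uint32_t magic;
    uint32_t payload_len;
    uint32_t crc32;       /* CRC of payload bytes only */
} set_footer_t;

/* Bitwise CRC-32 (IEEE, reflected). */
static uint32_t crc32_ieee(const uint8_t *p, size_t n) {
    uint32_t c = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i++) {
        c ^= p[i];
        for (int k = 0; k < 8; k++)
            c = (c >> 1) ^ (0xEDB88320u & (uint32_t)-(int32_t)(c & 1));
    }
    return c ^ 0xFFFFFFFFu;
}

/* Append footer after the payload in buf; returns total container length.
 * This is the "written last" step: until it lands, the set is invalid. */
static size_t footer_seal(uint8_t *buf, size_t payload_len) {
    set_footer_t f = { SET_MAGIC, (uint32_t)payload_len,
                       crc32_ieee(buf, payload_len) };
    memcpy(buf + payload_len, &f, sizeof f);
    return payload_len + sizeof f;
}

/* Reader rule: valid only if magic, length, and CRC all check out. */
static int footer_verify(const uint8_t *buf, size_t total_len) {
    if (total_len < sizeof(set_footer_t)) return 0;
    set_footer_t f;
    memcpy(&f, buf + total_len - sizeof f, sizeof f);
    if (f.magic != SET_MAGIC) return 0;
    if (f.payload_len != total_len - sizeof f) return 0;
    return crc32_ieee(buf, f.payload_len) == f.crc32;
}
```

A torn write anywhere (payload or footer) makes verification fail, which is exactly the power-loss behavior you want: the reader silently skips the set.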

In image-heavy products like EverBowl, commit semantics are what let downstream transfer/analytics stay simple: readers never need “partial write heuristics.” The pipeline can treat storage as a sequence of published sets, even across reboots.


Step 3: Best-effort preallocation to reduce allocation churn

Now we attack the main source of tail spikes in long runs: allocation/compaction pressure.

You don’t need perfect preallocation. You need enough to avoid “panic allocation” under near-full conditions.

Strategy 1: Reusable file pool (most practical)

Instead of creating brand-new filenames forever, create a bounded pool and reuse slots.

Example:

  • Pool of N set containers: slot_0000 … slot_0255
  • A small index file maps sequence_id -> slot_id
  • Each slot is overwritten in a controlled way (like a ring buffer)

Why it helps

  • Fewer directory entries changing over time
  • Less allocation churn
  • Predictable metadata patterns
  • GC pressure becomes smoother

Implementation idea

  • Pre-create the slot files once (at format time or first boot)
  • On each set, pick next slot, write there, commit with footer (or rename within slot namespace)
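The slot mapping itself is trivial once the pool exists. A sketch (the pool size, `sets/slot_XXXX` naming, and helpers are illustrative; the slot files would be pre-created once at first boot):

```c
#include <stdint.h>
#include <stdio.h>

#define SLOT_COUNT 256

typedef struct {
    uint32_t next_seq; /* monotonically increasing set id */
} slot_pool_t;

/* Ring-buffer mapping: seq -> reusable slot. The seq -> slot pairing is
 * also recorded in a small index file so readers can order sets. */
static uint32_t slot_for_seq(uint32_t seq) {
    return seq % SLOT_COUNT;
}

static void slot_path(char *out, size_t cap, uint32_t seq) {
    snprintf(out, cap, "sets/slot_%04u", (unsigned)slot_for_seq(seq));
}

/* Claim the next sequence id; the caller overwrites that slot's file. */
static uint32_t slot_claim(slot_pool_t *p) {
    return p->next_seq++;
}
```

Because filenames never change, directory metadata stays stable over months of operation; only file contents and the small index churn.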

Strategy 2: Reserved space discipline (simple and effective)

Tail spikes get brutal when you’re writing into the last few percent of free space.

Introduce a rule:

  • Maintain X% reserved free (e.g., 5–10%)
  • If below reserve, switch to “degraded mode”:
    • drop optional frames
    • reduce burst depth
    • prioritize commit markers
    • or pause writes until upload/transfer frees space

This is not “giving up.” It’s enforcing predictability. Stable systems protect their future self.
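The reserve rule is a one-line check at the top of the write path. A sketch (the thresholds are illustrative; tune them against your soak-test traces):

```c
#include <stdint.h>

typedef enum { WR_NORMAL, WR_DEGRADED, WR_PAUSED } write_mode_t;

/* Space discipline: never write into the last few percent of flash.
 * 90%/95% here are placeholders, not recommendations. */
static write_mode_t write_mode_for(uint16_t fs_used_pct) {
    if (fs_used_pct >= 95) return WR_PAUSED;   /* wait for upload/cleanup */
    if (fs_used_pct >= 90) return WR_DEGRADED; /* drop optional frames, shrink bursts */
    return WR_NORMAL;
}
```

Feed it the same fill estimate you are already logging for instrumentation, so the mode transitions show up in your traces.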

Strategy 3: Pre-created directories for burst batches

If your burst writes create many directories, pre-create a limited number of batch directories:

  • batch_00 … batch_15
  • Write sets into the current batch until it’s “sealed,” then rotate.

This reduces directory-create bursts at the worst times.


Implementation detail: “best-effort” means you never block forever

The whole point of preallocation is to smooth tail latency, so it can’t be allowed to become a new source of stalls. In other words, preallocation is an optimization path, not a dependency. Your write pipeline must still make forward progress even when the filesystem is under pressure, the pool is fragmented, or a slot can’t be claimed quickly.

Treat preallocation like a bounded fast-path:

  • Attempt to allocate / pick a slot quickly using a strict time budget (or a small bounded retry count). If the pool is healthy, this stays cheap and predictable.
  • If it fails, immediately fall back to the normal write path (create/allocate as usual). You may take a latency hit in that case, but the system remains correct and doesn’t deadlock or starve the producer.
  • Record a metric whenever the fast-path misses—for example prealloc_hit_rate, prealloc_fallback_count, and optionally fallback_reason (no free slot, slot busy, reserve threshold reached). This turns “it felt slower today” into an actionable signal: you can tell whether the pool is undersized, whether cleanup is lagging, or whether you’re routinely operating too close to full.
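The rules above can be sketched as a bounded fast-path. This is a toy model (the pool struct and metric names are placeholders); the point is the bounded retry plus the unconditional fallback:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    int free_slots;             /* slots immediately claimable */
    uint32_t prealloc_hit;      /* fast-path successes */
    uint32_t prealloc_fallback; /* misses -> normal allocation path */
} prealloc_pool_t;

/* Bounded fast-path: a few cheap, non-blocking attempts, then give up
 * and let the caller take the ordinary create/allocate path. The loop
 * never spins forever, so preallocation can't starve the producer. */
static bool acquire_write_target(prealloc_pool_t *p, int max_attempts,
                                 uint32_t *slot_out) {
    for (int i = 0; i < max_attempts; i++) {
        if (p->free_slots > 0) {
            *slot_out = (uint32_t)--p->free_slots;
            p->prealloc_hit++;
            return true;
        }
        /* a real pool might yield briefly here, inside a strict time budget */
    }
    p->prealloc_fallback++;
    return false; /* caller falls back to the normal (slower) write path */
}
```

The hit/fallback counters are the observability hook: a falling `prealloc_hit` ratio over a soak run tells you the pool is undersized or cleanup is lagging before the latency numbers do.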

In production pipelines where image sets are uploaded off-device (e.g., for ML validation or post-processing), preallocation makes the on-device behavior boringly consistent during long soak runs—exactly what you want when devices are deployed in homes and you can’t babysit flash state.


Results: How you validate improvements without vanity benchmarks

This is where teams often get stuck: they run a 2-minute test, declare victory, and ship. Tail latency requires a different validation style.

Soak test setup (repeatable and meaningful)

Run a long-duration write test that matches your real burst pattern:

  • Same set size distribution (camera + thermal + metadata)
  • Same burst cadence (e.g., 3 FPS bursts, then idle)
  • Same downstream interactions (if your reader scans storage)

Test at three fill levels:

  • Freshly formatted
  • Mid-fill (50–70%)
  • Near full (85–95%)

What you should expect after these changes

  • Commit correctness becomes deterministic (reader never sees partial sets).
  • P99 commit latency becomes much tighter, especially near full.
  • Overall throughput may remain similar, but the system becomes stable under stress.

If you want one clean target to track:

  • Improve P99 set commit latency by 3–5× under near-full conditions (typical outcome when allocation churn is the real culprit).

(Exact numbers depend on block size, wear level, and your burst size, but “multi-x tail improvement” is common when you stop fighting the allocator at the worst time.)


Common issues and how to avoid them

Rename is expensive on my device

Use footer commit with container files, or reduce rename frequency by committing per batch instead of per file.

My reader still sees weird artifacts

Enforce strict reader rules:

  • Only read published names OR verified footers
  • Quarantine invalid sets
  • Run a lightweight cleanup task that deletes .tmp artifacts on boot

Preallocation made it worse

That usually means:

  • You preallocated too aggressively (causing a big upfront stall)
  • Or your pool is too large and increases metadata load

Fix:

  • Start small (e.g., pool size that covers 2–5 minutes of worst-case burst)
  • Grow only if your traces show you need it

We can’t afford sync latency

Don’t sync after every write unless you must. Instead:

  • rely on commit markers for correctness
  • group sync at safe points (end of burst, or periodic)

Correctness should come from commit semantics—not from “sync everything all the time.”
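A grouped-sync policy can be as small as one predicate. A sketch (the bound of 8 unsynced sets is illustrative; correctness still comes from the commit markers, and sync only limits how much buffered state is at risk):

```c
#include <stdint.h>
#include <stdbool.h>

/* Pay the sync cost at a safe point: end of a burst, or after a bounded
 * number of unsynced sets, whichever comes first. */
static bool should_sync(uint32_t sets_since_sync, bool burst_ended) {
    const uint32_t max_unsynced_sets = 8; /* illustrative bound */
    return burst_ended || sets_since_sync >= max_unsynced_sets;
}
```

Call it after each committed set; when it returns true, issue one sync and reset the counter, instead of syncing inside the hot write loop.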


Key Takeaways

  • Tail latency is usually metadata + allocation + GC, not raw data writes.
  • Start by instrumenting phases so you can see what’s actually spiking.
  • Use a commit marker (rename-based or footer-based) so readers never consume partial sets—even after power loss.
  • Apply best-effort preallocation (slot pools + reserved free space) to reduce churn and smooth long-run behavior.
  • Validate with soak tests at near-full conditions and track one or two real metrics (like P99 commit latency + sustained sets/min).
