Firmware

Edge Compression of Telemetry: Delta, Varint, LZ4 & MessagePack

Abhinav Singh

25 Nov 2025 — 12 min read

A technical guide to shrinking IoT telemetry without breaking edge systems or cloud pipelines.

Introduction

IoT and embedded systems generate telemetry that overwhelms traditional networks. Sensor readings, actuator states, health logs, diagnostics, and event traces strain:

Wireless bandwidth
MCU flash wear from local buffering
Cloud ingestion pipelines

At Hoomanely, telemetry is the backbone of our distributed pet-care IoT fleet- device health, feeding analytics, motor performance, battery behavior, and environmental context depend on continuous data flow.

Raw telemetry is expensive to transmit and store. We engineered a compression strategy that's lightweight enough for MCUs, scalable enough for gateways, and simple enough for cloud recovery pipelines.

This guide breaks down four compression techniques Delta encoding, Varint, MessagePack, and LZ4 - how they complement each other, where they should run, and why they form a reliable, maintainable telemetry pipeline.

Why Telemetry Compression Matters

Telemetry from IoT devices is typically:

Predictable - temperature changes gradually per second
Structured - key-value records, protobufs, TLV formats
Repetitive - same fields across messages
Numeric-heavy - integers, floats, timestamps

This makes it ideal for compression. But embedded systems have strict limits:

MCU RAM measured in kilobytes
Limited flash for program and buffer space
CPU budgets shared with real-time tasks
Power constraints on battery-driven nodes

This rules out heavyweight codecs like gzip, zstd, or brotli. Requirements:

Predictable runtime behavior
Low memory footprint
Streaming-friendly
Deterministic and recoverable

This leads to a layered, edge-suitable approach.

Delta Encoding - The First Layer

Delta encoding replaces absolute values with differences between consecutive samples.

Instead of transmitting:

temp: 24.2°C, 24.3°C, 24.4°C

Transmit:

temp: 24.2°C (baseline), +0.1°C, +0.1°C

Why It Works

Most telemetry evolves slowly:

Temperature moves gradually
Battery voltage changes in small increments
Motor current varies within predictable windows
Timestamps increase monotonically

Deltas fall into smaller numeric ranges, compressing more efficiently in subsequent layers.

Key Properties

Very low CPU cost (simple subtraction)
Works with integers and floats
Minimal state requirement (just the previous value)
Bidirectionally deterministic (fully lossless)
No dynamic memory allocation

Implementation

typedef struct {
    float last_value;
    bool initialized;
} DeltaEncoder;

float encode_delta(DeltaEncoder* enc, float current) {
    if (!enc->initialized) {
        enc->initialized = true;
        enc->last_value = current;
        return current;  // First value is baseline
    }
    
    float delta = current - enc->last_value;
    enc->last_value = current;
    return delta;
}

float decode_delta(DeltaEncoder* dec, float delta) {
    if (!dec->initialized) {
        dec->initialized = true;
        dec->last_value = delta;  // First value restores baseline
        return delta;
    }
    
    dec->last_value += delta;
    return dec->last_value;
}
```

Critical Design Consideration

Delta encoding requires correct ordering. If packets arrive out-of-order, reconstruction fails. Solutions:

Use sequence numbers in packet headers
Apply delta encoding only on reliable transport layers
Use monotonic timestamps as implicit sequence markers

Varint Encoding — Squeezing Numbers Further

After Delta, you often end up with small integers — especially with quantized floats or integer sensors (ADC values, counters, state machines).

Varint (variable-length integer) encodes small values in fewer bytes:

Values near zero → single byte
Larger values → more bytes as needed

This fits perfectly with delta values, which cluster around zero when telemetry is stable.

Why Varint is Ideal for Embedded Devices

Branch-light implementation (minimal conditional logic)
No heap allocation required
Streaming-friendly (byte-by-byte processing)
Simple to decode on cloud pipelines
Naturally pairs with protobuf/MessagePack formats

Encoding Mechanism

Varint uses the high bit of each byte as a continuation flag:

If bit 7 is set → more bytes follow
If bit 7 is clear → this is the final byte
Lower 7 bits of each byte carry actual data

Example encoding:

Value: 300 (decimal)
Binary: 0000_0001_0010_1100

Split into 7-bit chunks (right to left):
  0000010  0101100

Encode with continuation bits:
  1010_1100  0000_0010
  (continue) (final)

Result: [0xAC, 0x02] — 2 bytes instead of 4

Implementation

size_t encode_varint(uint32_t value, uint8_t* buffer) {
    size_t index = 0;
    
    while (value >= 0x80) {
        buffer[index++] = (value & 0x7F) | 0x80;  // Set continuation bit
        value >>= 7;
    }
    
    buffer[index++] = value & 0x7F;  // Final byte, no continuation bit
    return index;
}

uint32_t decode_varint(const uint8_t* buffer, size_t* bytes_consumed) {
    uint32_t result = 0;
    uint32_t shift = 0;
    size_t index = 0;
    
    while (true) {
        uint8_t byte = buffer[index++];
        result |= ((uint32_t)(byte & 0x7F)) << shift;
        
        if ((byte & 0x80) == 0) {  // No continuation bit
            *bytes_consumed = index;
            return result;
        }
        
        shift += 7;
    }
}

Compression Effectiveness

Value Range	Standard 32-bit	Varint Size
0 to 127	4 bytes	1 byte
128 to 16,383	4 bytes	2 bytes
16,384 to 2,097,151	4 bytes	3 bytes

For delta-encoded telemetry where most values are small, this provides significant savings.

Varint aims for predictable performance with deterministic worst-case behavior, critical for MCUs running real-time workloads.

MessagePack - Binary Serialization with Built-In Efficiency

MessagePack is a binary serialization format that's significantly more compact than JSON while maintaining similar expressiveness and schema flexibility.

Why MessagePack Matters for IoT

Before applying delta/varint/LZ4, you need to serialize structured data. The format choice fundamentally impacts final payload size and processing efficiency.

JSON:

{"temp": 24.2, "humidity": 65, "battery": 3.7, "timestamp": 1732550400}

Size: Approximately 70 bytes

MessagePack equivalent: Approximately 35 bytes — roughly 50% reduction before any additional compression.

Key Advantages Over JSON

Type preservation: Integers encoded as integers, not strings
Compact encoding: Single-byte type tags for common types
Schema-free: Self-describing like JSON, unlike protobuf
Wide language support: C, Python, JavaScript, Go, Rust, Java
Streaming-capable: Can encode/decode incrementally
No parsing overhead: Direct binary mapping to native types

MessagePack Type Encoding Efficiency

Data Type	JSON	MessagePack
Small positive int (0-127)	ASCII digits	Single byte (value itself)
String (≤31 chars)	Quotes + escaping	1-byte tag + raw bytes
32-bit float	ASCII representation	1-byte tag + 4 bytes
Null	"null" keyword	Single 0xc0 byte
Boolean	"true" or "false"	Single byte (0xc2 or 0xc3)

Format Structure

MessagePack uses a type-tag system where each value begins with a tag byte indicating both type and (sometimes) value:

Positive fixint:  0x00 - 0x7f  (value is in tag itself)
fixstr:           0xa0 - 0xbf  (length in tag, string follows)
fixarray:         0x90 - 0x9f  (count in tag, elements follow)
fixmap:           0x80 - 0x8f  (pair count in tag, keys/values follow)
float 32:         0xca + 4 bytes
uint 8:           0xcc + 1 byte
int 8:            0xd0 + 1 byte

This minimizes overhead for common small values - a value like 42 is literally a single byte: 0x2a.

Practical Example (Python)

python

import msgpack

# Encoding telemetry
telemetry = {
    "temp": 24.2,
    "humidity": 65,
    "battery": 3.7,
    "timestamp": 1732550400
}

packed = msgpack.packb(telemetry)
# MessagePack size is roughly half of JSON

# Decoding
unpacked = msgpack.unpackb(packed)
assert unpacked == telemetry

Embedded C Example

#include <msgpack.h>

void encode_telemetry_sample(uint8_t* output, size_t* output_len, 
                              float temp, uint8_t humidity, uint32_t timestamp) {
    msgpack_sbuffer sbuf;
    msgpack_packer pk;
    
    msgpack_sbuffer_init(&sbuf);
    msgpack_packer_init(&pk, &sbuf, msgpack_sbuffer_write);
    
    // Encode as map with 3 key-value pairs
    msgpack_pack_map(&pk, 3);
    
    // Key: "temp", Value: float
    msgpack_pack_str(&pk, 4);
    msgpack_pack_str_body(&pk, "temp", 4);
    msgpack_pack_float(&pk, temp);
    
    // Key: "humidity", Value: uint8
    msgpack_pack_str(&pk, 8);
    msgpack_pack_str_body(&pk, "humidity", 8);
    msgpack_pack_uint8(&pk, humidity);
    
    // Key: "timestamp", Value: uint32
    msgpack_pack_str(&pk, 9);
    msgpack_pack_str_body(&pk, "timestamp", 9);
    msgpack_pack_uint32(&pk, timestamp);
    
    memcpy(output, sbuf.data, sbuf.size);
    *output_len = sbuf.size;
    
    msgpack_sbuffer_destroy(&sbuf);
}

Where MessagePack Fits

MessagePack should be the first serialization step, applied BEFORE delta/varint, because:

Reduces structural overhead (field names, type indicators, delimiters)
Normalizes numeric types (preserves integer vs float distinction)
Creates denser input for downstream compression layers
Maintains self-describing capability for debugging and recovery
Eliminates text parsing on both encode and decode paths

MessagePack vs Protobuf Trade-offs

Aspect	MessagePack	Protobuf
Schema requirement	None (self-describing)	Required (.proto files)
Flexibility	High (arbitrary structures)	Low (strict schema)
Size efficiency	Good	Better (field number encoding)
Development speed	Fast (no code generation)	Slower (schema compilation)
Debugging	Easier (field names preserved)	Harder (field numbers only)

For IoT telemetry, MessagePack offers the best balance when:

Schema evolution is frequent
Debugging on constrained devices is important
Development velocity matters
Overhead difference vs protobuf is acceptable given other compression layers

LZ4 — Dictionary-Based Compression for Gateways

LZ4 is a high-speed lossless compression algorithm optimized for scenarios where decompression speed matters as much as compression ratio.

It's not suitable for tiny MCUs but excels on:

Edge gateways with ARM Cortex-A CPUs
Linux-class hubs (Raspberry Pi, industrial computers)
SoMs and modules (CM4, AM62, i.MX series)

Why LZ4 for Telemetry Gateways

Extremely fast (GHz-class ARM cores compress at hundreds of MB/s)
Low memory overhead (dictionary fits in L1/L2 cache)
Streaming mode optimized for continuous log-like data
Deterministic output (same input always produces same output)
Frame format with checksums for integrity verification

How LZ4 Works

LZ4 uses dictionary-based compression with backward references:

Scans input for repeated sequences
Encodes repetitions as (offset, length) pairs
Maintains a sliding window of recent data
Uses simple, fast matching heuristics (not exhaustive search)

Example:

Input:  "temperature: 24.2°C, temperature: 24.3°C, temperature: 24.4°C"
Output: "temperature: 24.2°C, <copy 17 bytes from -20>, 3°C, <copy 17 bytes from -20>, 4°C"

The repeated string "temperature: 24." is encoded as a backreference instead of literal bytes.

Why LZ4 Shines on Pre-Processed Telemetry

LZ4 is most effective when fed structured, repetitive data:

Repeated field names (even in MessagePack binary)
Similar message structures across time intervals
Batch processing (multiple messages compressed together)
Delta-encoded values that cluster around zero

This makes it ideal as the final compression layer before network transmission or cloud storage.

Implementation Example (Python)

import lz4.frame

# Compression (at gateway)
def compress_telemetry_batch(messages: list[bytes]) -> bytes:
    batch = b''.join(messages)
    compressed = lz4.frame.compress(batch)
    return compressed

# Decompression (in cloud)
def decompress_telemetry_batch(compressed: bytes) -> bytes:
    decompressed = lz4.frame.decompress(compressed)
    return decompressed

LZ4 Frame Format

LZ4 frame format adds important metadata:

[Magic Number: 4 bytes]
[Frame Descriptor: flags, block size, content size]
[Data Blocks: compressed chunks]
[End Mark: 4 bytes]
[Optional: Content Checksum]

This provides:

Format identification (magic number detection)
Integrity verification (checksums)
Independent block decompression (parallel processing)
Streaming support (process before entire frame arrives)

When NOT to Use LZ4 on Edge

Don't use LZ4 directly on MCUs when:

CPU runs below 100 MHz
RAM is under 64 KB
Real-time guarantees are critical
Single messages are being compressed (poor ratio)

Use LZ4 at gateway layer when:

Batching multiple messages
CPU has cycles to spare
Network bandwidth is constrained
Cloud ingestion costs are significant

The Complete Pipeline: Layered Compression Architecture

A production-ready, fault-tolerant telemetry pipeline combines all techniques in a carefully ordered sequence:

Why This Ordering Matters

The layer sequence is not arbitrary — each stage prepares data optimally for the next:

MessagePack first:

Eliminates ASCII overhead before numeric processing
Creates binary structure that delta/varint can work with
Preserves type information for correct reconstruction

Delta after MessagePack:

Operates on already-compact binary representations
Reduces numeric magnitude, making varint more effective
Works on individual numeric fields within structures

Varint after Delta:

Compresses the now-small delta values efficiently
Maintains byte-stream compatibility for transport
Keeps MCU CPU cost minimal

LZ4 last (at gateway):

Exploits repetition across entire batched messages
Handles structural patterns MessagePack leaves
Only runs where CPU and memory are abundant

Effective Compression Ratios

Measured on typical IoT telemetry (temperature, humidity, battery, timestamps, state flags):

Stage	Technique	Typical Gain	Rationale
Baseline	JSON	Reference	Human-readable, inefficient
After MessagePack	Binary serialization	2×	Removes text overhead
After Delta	Difference encoding	2-3×	Most telemetry values change slowly
After Varint	Variable-length integers	1.5-2×	Small deltas compress to few bytes
After LZ4 (batched)	Dictionary compression	2-4×	Repeated structure across messages

Combined effective ratio: 12-48× vs raw JSON

Layered Benefits Beyond Size

This architecture provides more than compression:

Fault isolation — Corruption in one layer doesn't cascade
Incremental optimization — Each layer can be tuned independently
Testability — Clear interfaces between stages
Debuggability — Can decode up to any stage for inspection
Flexibility — Layers can be selectively disabled for debugging
Predictability — Each stage has known worst-case behavior

Failure Modes & Recovery Design

Compression adds complexity. Without careful design, a single corrupt byte can break reconstruction of entire data streams.

The Problem: Cascading Failures

Consider what happens if a bit flips in transit:

Without block boundaries:

[MessagePack][Delta][Varint] → [MessagePack][Delta][Varint] → [MessagePack][Delta][Varint]
                                        ↑ corruption here
                                        ↓
                    All subsequent messages become undecodable

The corruption propagates because:

Delta encoding maintains state (last value)
Varint parsing depends on continuation bits
MessagePack structure assumes valid type tags

Solution: Block-Based Architecture

Divide the stream into self-contained, independently decodable blocks:

[Block Header][Payload][Checksum] | [Block Header][Payload][Checksum] | ...
                  ↑ corruption here
                  ↓
            Only this block is lost — next block decodes normally

Block Header Design

Each telemetry block should carry metadata that enables recovery:

typedef struct {
    uint32_t magic;          // Format identifier (e.g., 0xABCD1234)
    uint8_t  version;        // Compression pipeline version
    uint8_t  flags;          // Compression flags (which layers active)
    uint16_t payload_length; // Compressed payload size in bytes
    uint32_t sequence_num;   // Monotonically increasing counter
    uint32_t timestamp_ms;   // Unix epoch milliseconds
    uint32_t crc32;          // CRC-32 of payload
} TelemetryBlockHeader;

What Each Field Enables

Magic number:

Detects block boundaries in byte streams
Allows resynchronization after corruption
Guards against format confusion

Version field:

Handles pipeline evolution (algorithm changes)
Allows gradual rollout of new compression schemes
Enables backward compatibility in cloud decoder

Flags:

Indicates which compression layers are active
Supports debugging (selectively disable stages)
Allows adaptive compression based on device capabilities

Payload length:

Enables skipping corrupted blocks
Supports parallel processing in cloud
Validates buffer boundaries

Sequence number:

Detects missing blocks (gaps in sequence)
Enables out-of-order detection
Supports retransmission requests

Timestamp:

Provides temporal ordering independent of sequence
Enables time-based deduplication
Supports interpolation across gaps

CRC32:

Detects corruption with high probability
Fast to compute on both MCU and cloud
Standard, well-tested algorithm

Recovery Pipeline Logic (Cloud Side)

def process_telemetry_stream(stream: bytes) -> list[Record]:
    records = []
    offset = 0
    expected_sequence = None
    
    while offset < len(stream):
        # Scan for magic number if not aligned
        if not at_block_boundary(stream, offset):
            offset = find_next_magic(stream, offset)
            if offset == -1:
                break
        
        header = parse_block_header(stream[offset:offset+24])
        offset += 24
        
        payload = stream[offset:offset+header.payload_length]
        offset += header.payload_length
        
        # Validate CRC
        computed_crc = crc32(payload)
        if computed_crc != header.crc32:
            log_error(f"CRC mismatch in block {header.sequence_num}")
            continue  # Skip corrupted block
        
        # Detect missing blocks
        if expected_sequence is not None:
            if header.sequence_num != expected_sequence:
                log_warning(f"Gap detected: {expected_sequence} to {header.sequence_num-1}")
        expected_sequence = header.sequence_num + 1
        
        # Decompress based on version and flags
        try:
            if header.flags & FLAG_LZ4:
                payload = lz4_decompress(payload)
            
            varint_values = decode_varint_stream(payload)
            absolute_values = reconstruct_deltas(varint_values)
            record = msgpack.unpackb(absolute_values)
            records.append(record)
            
        except Exception as e:
            log_error(f"Failed to decode block {header.sequence_num}: {e}")
            continue
    
    return records

Delta State Management Across Blocks

Problem: Delta encoding maintains state (last value). If a block is lost, how do we reconstruct subsequent deltas?

Solutions:

Option A: Periodic baseline resets

// MCU side: Send absolute value every N samples
if (sample_count % BASELINE_INTERVAL == 0) {
    encode_absolute(value);  // Full value, not delta
    reset_delta_state();
} else {
    encode_delta(value);
}

Option B: Include baseline in block header

typedef struct {
    // ... standard fields ...
    float temp_baseline;      // Last known absolute value
    uint16_t battery_baseline;
} TelemetryBlockHeader;

Option C: Request retransmission

# Cloud side: Detect gap and request missing data
if detected_sequence_gap(current_seq, last_seq):
    request_retransmit(device_id, last_seq+1, current_seq-1)
    buffer_current_block()  # Hold until gap filled

Recommended approach: Combine A and B

Periodic baselines limit error propagation
Header baselines enable immediate recovery
Small overhead per block

Design Principle

Error isolation > maximum compression ratio.

A pipeline that achieves 50× compression but fails catastrophically on single-bit errors is worse than one achieving 30× compression with graceful degradation.

Key guidelines:

Keep blocks small (sub-1KB typically)
Always include integrity checks (CRC minimum)
Design for missing data, not just corrupted data
Test recovery logic as thoroughly as compression logic
Monitor block loss rates in production

Why This Matters at Hoomanely

Hoomanely builds systems where reliability matters more than raw throughput.

Our devices operate in challenging environments:

Intermittent connectivity - Wi-Fi signal varies
Power constraints - battery-operated, must last weeks
Real-time requirements - session mapping can't be delayed
Diverse deployment - homes, clinics, shelters with different networks
Fleet scale - thousands of devices generating continuous telemetry

Telemetry Use Cases

Device health monitoring:

MCU temperature, voltage rails, current draw
Flash wear levels, RAM usage patterns
Wireless signal strength, packet loss rates

Behavioral analytics:

Feeding patterns (when, how much, how fast)
Activity levels (motion sensor data)
Environmental context (ambient temperature, humidity)

Predictive maintenance:

Battery discharge curves (capacity degradation)
Component failure precursors

Operational intelligence:

Fleet-wide firmware version distribution
Feature usage patterns
Performance benchmarking across hardware revisions

All telemetry must be:

Transmitted efficiently (minimize bandwidth costs)
Stored economically (cloud storage at scale)
Queryable quickly (real-time dashboards)
Recoverable reliably (no silent data loss)

Impact of Compression Strategy

A well-engineered compression pipeline provides:

Economic benefits:

Reduced cloud storage costs
Lower network egress fees
Decreased cellular data usage
Smaller infrastructure footprint

Technical benefits:

Improved battery life (less radio-on time)
Reduced flash wear (fewer write cycles)
Better real-time performance (less CPU spent on I/O)
Enhanced debuggability (structured, recoverable data)

Operational benefits:

Faster analytics queries
Smoother dashboards
Easier troubleshooting
Confident scaling

Compression strategy reinforces core engineering values:

Efficiency - Do more with less (power, bandwidth, storage)
Introspectability - Preserve ability to debug and understand system behavior
Resilience - Gracefully handle failure modes without cascading issues
Predictability - Known performance characteristics under all conditions
Scalability - Linear cost growth as fleet expands

Thoughtful telemetry compression is a fundamental enabler of reliable, maintainable, cost-effective IoT systems at scale.

Key Takeaways

MessagePack replaces JSON with compact binary serialization apply it first to eliminate text encoding overhead and preserve type information.

Delta encoding shrinks numeric variation ideal for time-series telemetry where consecutive values differ slightly.

Varint compresses integers efficiently variable-length encoding optimized for small values without heavy CPU requirements.

LZ4 provides dictionary-based compression best deployed at gateways where CPU and memory are available.

Layered compression outperforms single techniques each stage prepares data optimally for the next.

Recovery-friendly block design prevents cascading failures self-contained blocks with headers, checksums, and sequence numbers enable graceful degradation.

Design for failure isolation, not maximum compression slightly lower ratio with robust error handling beats maximum compression that fails catastrophically.

Apply techniques at appropriate system tiers MCUs handle MessagePack/delta/varint, gateways add LZ4 batching, cloud focuses on efficient decompression.

Introduction

Why Telemetry Compression Matters

Delta Encoding - The First Layer

Why It Works

Key Properties

Implementation

Critical Design Consideration

Varint Encoding — Squeezing Numbers Further

Why Varint is Ideal for Embedded Devices

Encoding Mechanism

Implementation

Compression Effectiveness

MessagePack - Binary Serialization with Built-In Efficiency

Why MessagePack Matters for IoT

Key Advantages Over JSON

MessagePack Type Encoding Efficiency

Format Structure

Practical Example (Python)

Embedded C Example

Where MessagePack Fits

MessagePack vs Protobuf Trade-offs

LZ4 — Dictionary-Based Compression for Gateways

Why LZ4 for Telemetry Gateways

How LZ4 Works

Why LZ4 Shines on Pre-Processed Telemetry

Implementation Example (Python)

LZ4 Frame Format

When NOT to Use LZ4 on Edge

The Complete Pipeline: Layered Compression Architecture

Why This Ordering Matters

Effective Compression Ratios

Layered Benefits Beyond Size

Failure Modes & Recovery Design

The Problem: Cascading Failures

Block Header Design

What Each Field Enables

Recovery Pipeline Logic (Cloud Side)

Delta State Management Across Blocks

Design Principle

Why This Matters at Hoomanely

Telemetry Use Cases

Impact of Compression Strategy

Key Takeaways

Read more

Mounting PCBs in Curved Enclosures: Engineering for Stress, Serviceability, and Longevity

Visualizing the Unknown: Anomaly Detection in Unlabeled Audio using UMAP

Bounded Intelligence: Operating AI Systems That Remain Stable, Predictable, and Trustworthy

Designing Concurrency-Safe AI Pipelines for Stateful Systems