Image Size Budgets in Embedded Systems: Pruning, Stripping & Compression for Efficient Edge Imaging
In modern imaging pipelines, especially those running on edge devices, every byte matters.
When an embedded system captures images, those images must travel through limited memory, constrained communication buses, and strict power budgets. Large image payloads degrade system performance, slow down recovery pipelines, and create unpredictable behavior.
To keep imaging fast and predictable, we define image size budgets.
Image size budgeting is the practice of controlling how large an image is allowed to be across its lifecycle, from capture to transmission to storage. At Hoomanely, while building our AI-powered smart pet bowl with a camera sensor, we learned this lesson the hard way.
Our initial implementation captured maximum resolution images and attempted to transmit them to an edge device for ML inference. The result? Multi-second transmission times.
Achieving efficient imaging requires three complementary strategies applied sequentially:
- Pruning – minimize what you capture
- Stripping – remove what you don't need
- Compression – encode what remains efficiently
This post explains these techniques, when to apply them, and the architectural thinking behind building predictable imaging systems.
The Problem: The Hidden Cost of Uncontrolled Images
In resource-constrained systems, raw unoptimized images lead to:
- Longer transfer and sync times: Minutes instead of milliseconds
- Unpredictable memory consumption: Exhausting available storage unpredictably
- Reduced throughput in recovery pipelines: Bus congestion blocks critical sensor data
- Increased latency on transmission pathways: Delays prevent real-time inference
The most effective architectures follow this sequence:

Capture → Prune → Strip → Compress → Transmit
Each stage reduces payload before the next process touches it, enabling predictable latency, controlled memory usage, and efficient recovery pipelines.
1. Pruning - Reduce What You Capture
Pruning removes unnecessary image data before encoding or transmission.
Instead of capturing, storing, or transmitting the entire image frame, pruning forces the system to define what portion is actually useful for the task at hand.
Why Pruning Comes First
The camera sensor can capture at maximum resolution, but not all tasks require that level of detail. Different contexts need different amounts of visual information:
- Some tasks benefit from richer detail and color information
- Others work effectively with reduced resolution
- Certain analyses only need grayscale data
- Many scenarios require just a portion of the frame
Typical Pruning Operations
Resolution reduction: Configure the sensor to capture only what's needed. Maximum resolution looks impressive but wastes transmission bandwidth.
Region of Interest (ROI) cropping: Our bowl occupies a fixed position in the frame. Why transmit the surrounding environment? Crop to just the bowl area.
Color space conversion: RGB provides full color information, but water level detection works perfectly fine with grayscale, cutting the data to one-third of its RGB size.
Channel elimination: Remove alpha channels or unused color components that don't contribute to inference accuracy.
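Ideally these reductions happen in the sensor configuration itself, but they can also run immediately after capture. Here is a minimal sketch in Python with OpenCV; the ROI coordinates, target size, and grayscale choice are placeholders you would tune for your own scene, not values from our production system:

```python
import cv2

def prune_frame(frame, roi, target_size, grayscale=True):
    """Capture-side pruning: ROI crop, downscale, channel reduction."""
    x, y, w, h = roi
    cropped = frame[y:y + h, x:x + w]                  # keep only the useful region
    pruned = cv2.resize(cropped, target_size,
                        interpolation=cv2.INTER_AREA)  # downscale to what the task needs
    if grayscale:
        pruned = cv2.cvtColor(pruned, cv2.COLOR_BGR2GRAY)  # drop color channels
    return pruned
```

For scale: a 1280×800 RGB frame (~3 MB raw) cropped to a 400×400 region, resized to 224×224, and converted to grayscale comes out at roughly 50 KB before any encoding at all.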
The Key Principle
The cheapest byte is the byte never generated.
Pruning happens at capture time, where the cost of moving extra pixels is highest. Every pixel that doesn't get captured doesn't need to be stored, transmitted, or processed downstream.

2. Stripping - Remove What You Don't Need
After pruning reduces pixel count, the next step is eliminating non-pixel overhead.
Digital images carry hidden baggage that bloats file sizes without contributing to image quality or ML inference accuracy.
What Gets Stripped
EXIF metadata: Camera make and model, capture settings, timestamps, GPS coordinates, shutter speed, ISO values. Useful for photography, irrelevant for embedded vision systems.
Color profiles: ICC calibration data that ensures color accuracy across different displays. Edge devices performing inference don't render images for human viewing.
Sensor calibration data: The AR0144 embeds color correction matrices and lens shading correction data in captured frames. This aids the initial image processing but serves no purpose downstream.
Debug information: Thumbnails, sensor register dumps, frame counters. Valuable during development, wasteful in production.
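For JPEG, metadata can be removed without touching a single pixel. Here is a minimal sketch using the third-party piexif library (the file names are hypothetical); a CLI tool like `exiftool -all=` does the same job, and more thoroughly, covering ICC profiles and vendor segments as well:

```python
import piexif  # third-party: pip install piexif

def strip_exif(src_path, dst_path):
    """Drop the EXIF (APP1) segment from a JPEG without re-encoding."""
    # piexif rewrites the file at the byte level, so every pixel
    # survives untouched; only the metadata segment is removed.
    piexif.remove(src_path, dst_path)

strip_exif("frame_0042.jpg", "frame_0042_lean.jpg")
```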
The Critical Difference
| Aspect | Pruning | Stripping |
|---|---|---|
| Removes pixels? | Yes | No |
| Removes metadata? | No | Yes |
| Changes visual quality? | Yes, by design | No |
| When applied? | At capture | After capture |
Stripping preserves every pixel while eliminating ancillary data. The image looks identical but occupies less space.
Real-World Impact
Metadata overhead varies by sensor and configuration, but typically adds several kilobytes per frame. Over hundreds of daily captures, this compounds into megabytes of wasted storage and transmission bandwidth.
More importantly, stripping reduces transmission time. On bandwidth-constrained buses, every kilobyte saved frees capacity for other critical sensor data - weight sensors, temperature probes, proximity detectors.
Stripping reduces clutter, not clarity.
3. Compression - Make Remaining Data Efficient
After pruning pixels and stripping metadata, compression reduces the size of what remains.
Compression is where format selection matters — JPEG, PNG, WebP — but the fundamental rule is:
Compression should be applied only after pruning and stripping.
Compressing a full-resolution image with metadata is wasteful. Compress the lean, optimized payload instead.
Compression Types
Lossless compression: Preserves every bit of information. Essential for debug builds, reproducibility, and exact comparisons. Larger file sizes but perfect reconstruction.
Lossy compression: Discards perceptually insignificant information to achieve dramatic size reduction. Ideal for transmission, bandwidth-constrained channels, and scenarios where perfect reconstruction isn't required.
Context-Aware Compression
Different use cases demand different compression strategies. The key is matching compression parameters to the task. Quality settings should be tuned dynamically based on available bandwidth, urgency, and inference requirements.
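As an illustration, here is how context-aware quality selection might look with OpenCV's JPEG encoder. The context names and quality values are hypothetical and would need profiling against your own bandwidth and accuracy targets:

```python
import cv2

# Hypothetical quality tiers; real values come from profiling your own
# bandwidth, latency, and model-accuracy requirements.
QUALITY_BY_CONTEXT = {
    "realtime_inference": 60,  # small and fast; the model tolerates artifacts
    "user_snapshot":      85,  # shown to humans, so keep it pleasant
    "debug_capture":      95,  # near-lossless for offline analysis
}

def compress_frame(frame, context="realtime_inference"):
    """Encode a pruned, stripped frame as JPEG at a context-appropriate quality."""
    quality = QUALITY_BY_CONTEXT[context]
    ok, buf = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, quality])
    if not ok:
        raise RuntimeError("JPEG encoding failed")
    return buf.tobytes()
```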

System Architecture Perspective
A predictable imaging pipeline follows this structure:
[ Image Capture ]
↓
[ Pruning Layer ]
(Resolution, ROI, Color Space)
↓
[ Stripping Layer ]
(Metadata Removal)
↓
[ Compression Layer ]
(Format Encoding)
↓
[ Transmission ]
(Bus, Network, Storage)

Each stage reduces payload size before the next process handles it; a code sketch after the list below ties the stages together. This architecture enables:
- Predictable latency: Known maximum transmission times
- Controlled memory usage: Bounded buffer requirements
- Efficient recovery pipelines: Smaller logs, faster sync
- Power optimization: Shorter transmission windows
- Bus efficiency: More bandwidth for other sensors
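Composing the earlier sketches gives a pipeline in this shape; the ROI, target size, and context are again hypothetical values, not our production configuration:

```python
def imaging_pipeline(frame):
    """Prune -> strip -> compress, in that order."""
    # Stage 1: prune at (or just after) capture.
    lean = prune_frame(frame, roi=(120, 80, 400, 400), target_size=(224, 224))
    # Stage 2: stripping is implicit here, since we work on raw arrays
    # and never attach file metadata in the first place.
    # Stage 3: compress the already-lean payload last.
    return compress_frame(lean, context="realtime_inference")
```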
The Compounding Effect
Optimizations compound across stages: if pruning achieves one reduction factor, stripping adds another, and compression adds yet another, the factors multiply, and the final payload can be orders of magnitude smaller than the raw capture.
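A back-of-the-envelope illustration with made-up factors:

```python
# Illustrative numbers only; real factors depend on sensor, task, and codec.
prune_factor    = 4.0    # e.g. full frame -> grayscale ROI at lower resolution
strip_factor    = 1.05   # metadata removal saves a few percent
compress_factor = 10.0   # lossy encoding of the lean payload

print(f"combined: ~{prune_factor * strip_factor * compress_factor:.0f}x")  # ~42x
```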
This isn't about clever tricks. It's about intentional engineering decisions made at each pipeline stage.
When to Use Which Technique
Use Pruning When:
- You don't need the entire scene captured
- Smaller dimensions are acceptable for your ML model
- Different contexts require different resolutions
- Real-time performance is critical
Use Stripping When:
- Pixel integrity must remain completely untouched
- Metadata offers no operational value to downstream systems
- You need guaranteed visual quality with smaller files
- Storage efficiency matters over long deployments
Use Compression When:
- Format compatibility is required (JPEG for web, etc.)
- Network bandwidth is the primary constraint
- You need a balance between quality and size
- Transmission time directly impacts user experience
Combining All Three:
The most effective pipelines use all three techniques sequentially. Prune first to eliminate unnecessary pixels. Strip second to remove overhead. Compress last to efficiently encode what remains.
Image optimization isn't about "shrinking images" - it's about preserving value while eliminating waste.
Results and Real-World Impact
After implementing this three-stage pipeline in Hoomanely's pet bowl system:
Performance Improvements
- Transmission time: Reduced from several seconds to under 200 milliseconds
- Bus utilization: Dropped from congested to healthy levels, freeing bandwidth
- Daily storage requirements: Decreased from hundreds of megabytes to single-digit megabytes
- Battery life: Nearly doubled due to shorter transmission windows
- Inference latency: Enabled true real-time pet monitoring
System Reliability
- Multi-pet households: Simultaneous bowl transmissions no longer cause collisions
- Real-time alerts: Pet owners receive feeding notifications instantly
- Consistent performance: Predictable latency regardless of network conditions
- Extended operation: All-day monitoring without recharging
Model Accuracy
Despite aggressive optimizations, ML model accuracy remained high. Minor accuracy losses were acceptable trade-offs for dramatically improved user experience and system performance.
The key insight: Most computer vision models don't need maximum resolution to perform well. Finding the minimum acceptable quality unlocks massive efficiency gains.
Efficient imaging pipelines make our products faster, more predictable, and smoother for end users. This is why structured image size budgeting - pruning, stripping, and compression - is critical for any intelligent hardware system that captures visual data.
Conclusion
Image size budgets transform imaging systems from unpredictable to reliable. By applying pruning, stripping, and compression sequentially, we can build pipelines that respect memory constraints, transmission limitations, and power budgets while delivering high-quality results.
The lesson from building Hoomanely's pet monitoring system: constraints breed creativity. Limited bandwidth forced us to question every assumption. Limited power forced us to optimize every stage. Limited memory forced us to define what actually matters.
The result is a system that works reliably in real homes, monitors pets effectively, and provides peace of mind to owners all running on affordable, accessible hardware.
If you're building embedded vision systems, IoT devices, or edge ML applications: define your image size budget early. Embrace the constraints. Your users will thank you with every millisecond you save.