90 FPS, Zero Pixels: How a Compartment ID Silently Ate Every DCMIPP Write

The pipeline ran. Interrupts fired. DMA counters incremented. Every status register said the system was healthy. The buffers were empty. Three days of debugging, one register field.

Figure: DCMIPP pipeline diagram showing the Pipe 2 compartment ID mismatch.

The DCMIPP driver on the platform is not complicated to configure. The register map is documented, the reference manual explains the pipe architecture, and the Linux kernel driver has been in mainline long enough that most of the sharp edges are known. You set your source, pick a pipe, configure the crop and downscale, point the DMA at a buffer, and start the stream. The hardware does the rest.

That was the theory.

In practice, we spent three days chasing a pipeline that reported 90 FPS and produced zero valid frames, while every status register insisted the system was healthy.

The root cause was a single field in a single register: the compartment ID. This post is about that field, how it works, why getting it wrong produces exactly the failure we saw, and the broader lesson about hardware bugs that look like software bugs.

What the DCMIPP Actually Is

The Digital Camera Memory Interface Pixel Pipeline, DCMIPP, is the camera capture subsystem on this SoC family. It sits between an incoming CSI-2 or parallel camera interface and system memory. Its job is to receive raw pixel data, optionally crop and rescale it, and write the result into DMA buffers that the application can read.

The hardware provides three pipes. Pipe 0 is the dump pipe, a raw passthrough with no processing. Pipe 1 and Pipe 2 are the processing pipes with crop, downscale, and format conversion capabilities. Each pipe has its own DMA engine, its own interrupt lines, and its own set of configuration registers.

For our use case, we were running a 90 FPS camera at full resolution on one pipe while feeding a scaled-down stream to a second pipe for inference. The hardware supports this. The driver supports this. Nothing about the setup was unusual.

Figure: DCMIPP three-pipe architecture on the SoC, showing the compartment ID mismatch between Pipe 1 and Pipe 2. Pipe 1's DMA held a leftover non-zero CID from a previous configuration attempt, putting it in Compartment 1 where the V4L2 buffers lived. Pipe 2 initialized fresh with the register reset value of 0x00, directing its DMA writes to Compartment 0 where no buffers existed.

The Setup That Should Have Worked

The camera was an image sensor connected over MIPI CSI-2. The SoC was running a mainline kernel with the standard dcmipp driver. We had configured both pipes, registered V4L2 devices for each, and were consuming frames from both through standard V4L2 MMAP buffers.

Pipe 1 was producing frames correctly. 90 FPS, correct resolution, correct format. Every capture from that pipe worked exactly as expected.

Pipe 2 produced nothing. Zero frames. Not corrupted frames, not frames with wrong dimensions, not partial frames. Zero.

The V4L2 DQBUF calls would block indefinitely. The file descriptor was valid. The stream was started. The queue had buffers. They just never came back filled.

What the Debug Showed

The first instinct in this situation is to look at the obvious things. Wrong pixel format. Buffer size mismatch. QBUF called wrong. Stream not started. We checked all of them. Everything looked correct.

We added printk instrumentation to the driver. The Pipe 2 interrupt handler was being called at 90 Hz. The DMA completion callback was executing. The driver was calling vb2_buffer_done on every frame.

But userspace never received the frames.

That particular combination, interrupts firing but userspace not receiving, is a pattern worth recognizing. It usually means one of two things: the driver is marking buffers done with an error state, or the buffer the DMA is writing to is not the buffer userspace queued. We looked at the error state first and found nothing. We started looking at buffer address mismatches.

The Compartment ID Register

The platform memory subsystem uses a compartment-based access control model. Physical memory regions can be assigned to compartments, and each hardware master (the CPU, DMA engines, peripheral DMAs) has a compartment ID that determines which memory regions it is permitted to access.
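As a rough mental model only, the check can be pictured as a comparison between the CID carried by the master and the CID assigned to the region it targets. The sketch below is not the real interconnect logic, which is implemented in hardware and is platform-specific. The struct, the function name, and the rule that an unmapped address is refused are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a compartment-tagged memory region. */
struct mem_region {
    uint64_t base;  /* first physical address in the region */
    uint64_t size;  /* region length in bytes */
    uint8_t  cid;   /* compartment the region is assigned to */
};

/*
 * Returns true if a master tagged with master_cid may access addr.
 * An address that falls outside every region is treated as refused.
 */
bool access_allowed(const struct mem_region *regions, size_t n,
                    uint8_t master_cid, uint64_t addr)
{
    assert(n == 0 || regions != NULL);

    for (size_t i = 0; i < n; i++) {
        const struct mem_region *r = &regions[i];

        /* Subtract before comparing so base + size cannot overflow. */
        if (addr >= r->base && addr - r->base < r->size)
            return r->cid == master_cid;
    }
    return false;
}
```

In this model a DMA tagged with CID 0 is refused when it targets a buffer in a CID 1 region, even though the address itself is correct. Real hardware may drop or redirect such a write rather than report a refusal, which is what makes the failure silent.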

The DCMIPP has a register for this. Each pipe's DMA engine has a compartment ID field that tells the memory interconnect which compartment this DMA access belongs to. The kernel driver sets this field during pipe initialization.

The default value of this field in the reference manual is 0. The reset value is 0. The driver was setting it to 0. We had never touched it.

Figure: The DCMIPP_P2CMIER register, with the CID field at bits [3:0]. The reset value is 0x0, which routes all DMA writes from Pipe 2 to Compartment 0. On the SoC with application processor memory allocation, Compartment 1 is the correct target. The field is one value away from working; nothing in the driver's default path corrects it.

The problem was that 0 is not the correct compartment ID on our platform. It is a legal value for the field, but the memory regions we had allocated for the V4L2 buffers were in a compartment that required a non-zero ID. The DMA was writing to addresses that resolved to a different memory region entirely, or in some configurations, to an address that was silently dropped by the interconnect.

Why Pipe 1 Worked and Pipe 2 Did Not

This is the detail that made the bug genuinely confusing. If the compartment ID was wrong for Pipe 2, why was Pipe 1 fine?

The answer was in the driver initialization order. Pipe 1 had its DMA configured and its compartment ID set by a slightly different code path during probe. A previous attempt to configure the pipe for a different resolution had left a non-zero value in the compartment register. That value happened to be correct for our platform's memory layout.

Pipe 2 was configured fresh. No prior state. The register held its reset value of 0. The DMA wrote at 90 FPS to wherever compartment 0 mapped to. None of those writes reached the V4L2 buffers.

The interrupt fired because the DMA completed its write. From the hardware's perspective, the write succeeded. The data went somewhere. That somewhere just was not the buffer we had queued.

The Fix

Once we understood the problem, the fix was straightforward. The compartment ID for the Pipe 2 DMA needed to be set to match the memory compartment in which the V4L2 buffers were allocated.

On the SoC, the correct value for CID-aware DMA in the application processor context is 1. The register write is a single line:

Set the DCMIPP_CMIER_P2CIDC field in DCMIPP_P2CMIER to 1.
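A minimal sketch of that write as a read-modify-write on the register value, assuming the CID field sits at bits [3:0] as shown in the register figure. The macro and function names are hypothetical, not taken from the kernel driver, and the helpers operate on a plain value so they can be exercised without hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field position: CID in bits [3:0] of the pipe register. */
#define P2_CID_SHIFT 0u
#define P2_CID_MASK  (0xFu << P2_CID_SHIFT)

/* Return reg_val with only the CID field replaced by cid. */
uint32_t p2_set_cid(uint32_t reg_val, uint32_t cid)
{
    assert(cid <= 0xFu);  /* the field is four bits wide */

    reg_val &= ~P2_CID_MASK;                        /* clear the old CID */
    reg_val |= (cid << P2_CID_SHIFT) & P2_CID_MASK; /* insert the new CID */
    return reg_val;
}

/* Extract the CID field, for example to read it back after the write. */
uint32_t p2_get_cid(uint32_t reg_val)
{
    return (reg_val & P2_CID_MASK) >> P2_CID_SHIFT;
}
```

In a driver, the value passed in would come from a register read and the result would go back out through a register write. Preserving the other bits matters because the same register may carry unrelated fields.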

After that change, Pipe 2 started delivering frames immediately. Same hardware, same driver, same camera, same buffers. One register field changed from 0 to 1. The system went from zero frames to 90 FPS.

The Class of Bug This Belongs To

This is not a driver bug in the traditional sense. The driver was not wrong about what it was doing. It was setting a register to its documented reset value. The documentation does not prominently flag that this reset value is invalid in a compartment-aware system configuration. There is no error log from the hardware. There is no fault or exception. The DMA simply writes to an address that is accepted by the interconnect and goes nowhere useful.

It is a configuration correctness bug. The driver was correct in isolation. It was incorrect in the context of a platform with specific memory compartment requirements.

This class of bug shares a common structure with the bugs described elsewhere on this blog. A Python-to-C++ port that compiles and runs but produces wrong answers because library defaults differ across language bindings. A factory bring-up flow that passes all tests but fails in the field because a register's reset value assumed a platform configuration that does not exist. A firmware feature that works in the lab but corrupts shared state at 3 AM because nobody designed a boundary.

In every case, the bug does not announce itself. No crash. No exception. No log line. The system runs, counts as healthy by every metric you think to check, and silently produces the wrong outcome.

How to Find This Class of Bug

The method that works is systematic state verification, not intuition-driven debugging.

For DMA-related failures where interrupts fire but userspace receives nothing, the first question should be: is the DMA writing to the address I think it is writing to? Not the address I configured. Not the address the driver calculated. The physical address the DMA controller actually used for the most recent transfer.

On the platform, this is readable. The DMA current address registers update after each transfer. If those addresses do not match the physical addresses of your V4L2 buffers, the DMA is writing somewhere else and the reason is upstream of the DMA itself.

From there, the question becomes: what controls where those writes go? On a platform with memory compartmentalization, the compartment ID is part of that answer. That register field should be in your checklist the moment you see a DMA that fires but produces no output.

Register Fields That Hide in Plain Sight

A reset value of 0 is interpreted by most engineers as a safe default. Zero usually means disabled, or unconfigured, or identity. In memory access control hardware, 0 often means the first compartment, which may or may not be the one your allocation lives in.

This is not unique to the platform. Any SoC with a bus-level access control mechanism (TrustZone, an SMMU, CID awareness on a Cortex-A system, or a proprietary interconnect with region-based protection) has register fields where the reset value is technically valid but practically incorrect in a real software environment.

The reference manual usually describes these fields accurately. The description is often brief. It may be in a chapter you read once during initial bring-up and did not revisit when debugging a failure that looked unrelated.

The fix is to treat these fields the same way the factory bring-up philosophy treats board identity: do not assume the reset value is correct. Verify explicitly. Set explicitly. Add a comment explaining why the value is what it is and what breaks if it is wrong.
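One way to make "verify explicitly" concrete is a small table of fields with their documented reset value and the value the platform needs, checked against what the hardware actually holds. This is a sketch under assumptions: the struct, the function name, and the table contents are hypothetical, and the register values are passed in as an array so the logic can run without hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One audited field. Names and values are supplied by platform code. */
struct reg_expect {
    const char *name;     /* label used in the log line */
    uint32_t    mask;     /* bits of the register that belong to the field */
    uint32_t    reset;    /* documented reset value of the field (masked) */
    uint32_t    expected; /* value this platform needs (masked) */
};

/*
 * Compare raw register values against the table.
 * current[i] is the value read for table[i].
 * Prints one line per mismatch and returns the number of mismatches.
 */
size_t audit_regs(const struct reg_expect *table, const uint32_t *current,
                  size_t n)
{
    size_t bad = 0;

    assert(n == 0 || (table != NULL && current != NULL));

    for (size_t i = 0; i < n; i++) {
        uint32_t field = current[i] & table[i].mask;

        if (field == table[i].expected)
            continue;

        bad++;
        printf("%s: got 0x%x, want 0x%x%s\n", table[i].name,
               (unsigned)field, (unsigned)table[i].expected,
               field == table[i].reset ? " (still at reset value)" : "");
    }
    return bad;
}
```

The "still at reset value" note is the useful part: it separates a field that nobody ever wrote from one that was written incorrectly.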

What We Changed in Our Driver Initialization

After resolving the immediate issue, we audited the DCMIPP initialization sequence for every platform configuration we support. Each pipe's CID register is now set explicitly to the correct value for the target platform's memory map, with a comment referencing the memory compartment assignment in our platform BSP.

We also added a diagnostic check during stream start. If the pipe's DMA current address register does not fall within the expected physical address range of the allocated buffers after the first few frames, the driver logs a warning and reports the configured CID value. This does not fix a wrong CID automatically, but it makes the failure visible immediately rather than after three days of debugging.

The check costs almost nothing at runtime. It fires once per stream start and then stays quiet. The diagnostic surface it provides is worth the few lines of code.
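The shape of that check, reduced to a hardware-free sketch, looks roughly like this. The struct layout, the function name, and the three-frame settle count are assumptions for illustration. A real driver would read the current address from the pipe's register and log through the kernel's logging helpers rather than stderr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define DIAG_SETTLE_FRAMES 3u  /* hypothetical: frames to wait before checking */

struct dma_diag {
    uint64_t buf_start;  /* lowest physical address of the allocated buffers */
    uint64_t buf_end;    /* one past the highest physical address */
    uint32_t cid;        /* CID value currently configured for the pipe */
    uint32_t frames;     /* frames seen since stream start */
    bool     done;       /* set once the check has run */
};

/*
 * Call once per frame-complete event with the DMA current address.
 * Returns true only on the single call that finds the address out of range.
 */
bool dma_diag_frame(struct dma_diag *d, uint64_t cur_addr)
{
    assert(d != NULL);

    if (d->done)
        return false;
    if (++d->frames < DIAG_SETTLE_FRAMES)
        return false;

    d->done = true;  /* run the comparison exactly once per stream start */
    if (cur_addr >= d->buf_start && cur_addr < d->buf_end)
        return false;

    fprintf(stderr,
            "DMA address 0x%llx outside buffers [0x%llx, 0x%llx), CID=%u\n",
            (unsigned long long)cur_addr,
            (unsigned long long)d->buf_start,
            (unsigned long long)d->buf_end,
            (unsigned)d->cid);
    return true;
}
```

Resetting `frames` and `done` to zero at each stream start re-arms the check. After it has run, the per-frame cost is a single branch.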

The Debugging Principle That Would Have Saved Three Days

Work backwards from the DMA address, not forward from the configuration.

When a DMA-driven capture pipeline fires interrupts but delivers no data, the standard path is to re-read every configuration register you touched and compare it against the reference manual. That path takes a long time and often produces nothing because you read the registers correctly the first time.

The faster path is to read the registers you did not touch. The ones that have reset values. The ones whose documentation is one paragraph long. The ones that control something orthogonal to what you think you are debugging.

On the platform DCMIPP, the compartment ID register is in that category. It is not in the main configuration sequence. It is not something most bring-up guides mention. It does not appear in the typical checklist for camera pipeline debugging. It defaults to zero, which is a valid value, so it does not look wrong.

But it was wrong. One field. Three days.

Hoomanely's View on Silent Hardware Failures

At Hoomanely, our hardware runs inference pipelines at the edge. Frame data feeds directly into models. A camera pipeline that delivers zero frames at full reported throughput is not a theoretical concern. It is a production risk.

We have learned to treat these silent failures as a category unto themselves, separate from the bugs that crash, the bugs that log errors, and the bugs that fail visible assertions. Silent failures require a different debugging posture. You cannot wait for the system to tell you something is wrong. You have to verify independently that the output is what you expect, even when all the intermediate signals look healthy.

That means checking physical buffer contents, not just buffer states. It means comparing DMA destination addresses against expected physical addresses. It means treating register reset values as candidates for review rather than safe assumptions.
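Checking contents rather than state can be as simple as poisoning a buffer before it is queued and looking for the poison afterwards. The sketch below assumes the buffer is already mapped into the process and that cache maintenance is handled by the buffer framework. The sentinel value and function names are arbitrary choices for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SENTINEL_BYTE 0xA5u  /* arbitrary pattern unlikely to fill a real frame */

/* Fill a mapped buffer with the sentinel before queueing it. */
void buf_poison(uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len > 0)
        memset(buf, SENTINEL_BYTE, len);
}

/* True if every byte still holds the sentinel, i.e. nothing wrote the buffer. */
bool buf_untouched(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        if (buf[i] != SENTINEL_BYTE)
            return false;
    }
    return true;
}
```

If the interrupt count keeps climbing while `buf_untouched` still returns true for every queued buffer, the data is landing somewhere other than the buffers you queued.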

The compartment ID taught us that. One field, never touched, defaulting to zero, silently consuming every frame the camera produced for three days.

The hardware was doing exactly what it was told. We just did not realize what we had told it.
