Firmware

Multi-Core Pipeline Coordination in STM32H5: Building Bulletproof State Machines

Vaishak C

07 Jan 2026 — 5 min read

Modern embedded systems require sophisticated coordination between multiple processing cores, sensors, and communication protocols. When building real-time edge computing platforms that integrate camera capture, thermal imaging, and network transmission, traditional single-threaded approaches quickly become bottlenecks. The challenge lies in orchestrating complex multi-sensor pipelines while maintaining deterministic timing and bulletproof reliability.

The Challenge: Coordinating Complex Multi-Sensor Pipelines

Our edge computing platform faces a demanding coordination problem: simultaneously managing camera capture, thermal sensor data acquisition, flash storage operations, and network transmission across multiple processing cores while ensuring zero data loss and maintaining real-time constraints.

The system architecture centers on dual-core coordination between a high-performance Cortex-M7 core handling sensor data acquisition and a Cortex-M4 core managing communication protocols. Each capture session must coordinate:

Camera frame capture (640×400 pixels, 530KB per frame)
Thermal sensor data (768 temperature points, 3KB per sample)
Flash storage operations (multi-MB data writes to external storage)
Network transmission via CAN-FD and high-speed serial interfaces
Real-time constraints with sub-100ms latency requirements

State Machine Architecture: Pipeline Coordination

Core State Management System

The solution implements a hierarchical state machine that coordinates resource access across multiple concurrent operations:

The state machine ensures mutual exclusion between resource-intensive operations while allowing safe concurrent access to independent subsystems.

Atomic State Transitions

Critical state transitions use atomic operations with bulletproof error recovery:

bool pipeline_coordinator_request_state(pipeline_state_t requested_state, bool force) {
    if (xSemaphoreTake(pipeline_mutex, pdMS_TO_TICKS(100)) != pdTRUE) {
        return false; // Timeout - system under stress
    }
    
    pipeline_state_t current_state = current_pipeline_state;
    
    // Validate state transition matrix
    if (!is_valid_transition(current_state, requested_state) && !force) {
        xSemaphoreGive(pipeline_mutex);
        return false;
    }
    
    // Atomic state update with timing tracking
    current_pipeline_state = requested_state;
    state_transition_time = xTaskGetTickCount();
    
    // Log critical transitions for debugging
    if (requested_state == PIPELINE_STATE_OFFLOADING) {
        LOG_INFO_TAG("PIPELINE", "⚡ FLASH OFFLOAD STARTED - blocking capture operations");
    }
    
    xSemaphoreGive(pipeline_mutex);
    return true;
}

This approach prevents race conditions during high-load scenarios where multiple subsystems compete for shared resources.

Inter-Core Communication via VBUS Protocol

Protocol Architecture

The system implements a custom VBUS protocol over CAN-FD for deterministic inter-core communication:

typedef enum {
    VBUS_PRIORITY_EMERGENCY = 0,    // Critical system alerts
    VBUS_PRIORITY_VERY_HIGH = 1,    // Real-time sensor data
    VBUS_PRIORITY_HIGH = 2,         // Capture coordination
    VBUS_PRIORITY_MEDIUM = 4,       // Status updates
    VBUS_PRIORITY_LOW = 6           // Diagnostic messages
} vbus_priority_t;

typedef enum {
    VBUS_EVENT_THERMAL = 0x21,
    VBUS_EVENT_IMAGE_START = 0x01,
    VBUS_EVENT_IMAGE_DATA = 0x02,  
    VBUS_EVENT_IMAGE_END = 0x03,
    VBUS_CMD_TIME_SYNC = 0x26,
    VBUS_EVENT_TIME_SYNC_ACK = 0x27
} vbus_subtype_t;

Message Priority Handling ensures critical coordination messages preempt lower-priority data transfers, maintaining real-time responsiveness even during high-bandwidth image transmission.

Deterministic Message Routing

The VBUS message processor implements zero-copy routing with predictable latency:

void vbus_message_processor_route_message(uint32_t can_id, const uint8_t *data, size_t len) {
    // Decode priority and message type from CAN ID
    uint8_t priority = (can_id >> 8) & 0x07;
    uint8_t msg_type = (can_id >> 6) & 0x03;  
    uint8_t subtype = can_id & 0x3F;
    
    // Priority-based routing with interrupt preemption
    switch (subtype) {
        case VBUS_EVENT_THERMAL:
            if (priority <= VBUS_PRIORITY_HIGH) {
                vbus_thermal_process_data(data, len);
            }
            break;
            
        case VBUS_CMD_TIME_SYNC:
            // Highest priority - immediate processing in ISR context
            vbus_process_time_sync_message(data, len);
            break;
            
        case VBUS_EVENT_IMAGE_DATA:
            // Defer large data transfers to task context
            vbus_image_queue_for_processing(data, len);
            break;
    }
}

Capture Session Coordination

Synchronized Multi-Sensor Acquisition

The system coordinates simultaneous thermal and camera capture with precise timing requirements:

typedef struct {
    uint32_t sequence_id;
    capture_state_t state;
    
    // Thermal data buffer with ownership tracking
    float thermal_buffer[768];
    TickType_t thermal_timestamp;
    bool thermal_captured;
    
    // Camera data reference (zero-copy)
    uint8_t *camera_data;
    size_t camera_size;
    TickType_t camera_timestamp; 
    bool camera_captured;
    
    // Synchronization validation
    int32_t capture_time_diff_ms;
} capture_session_t;

bool trigger_synchronized_capture(void) {
    if (!pipeline_coordinator_request_state(PIPELINE_STATE_THERMAL_PENDING, false)) {
        return false; // Pipeline busy - reject capture request
    }
    
    // Initialize new capture session
    currentSession.sequence_id = ++captureSequenceCounter;
    currentSession.state = CAPTURE_STATE_THERMAL_PENDING;
    currentSession.thermal_captured = false;
    currentSession.camera_captured = false;
    
    // Start thermal capture first (lower latency sensor)
    thermal_start_tick = xTaskGetTickCount();
    if (ir_imager_capture_async(thermal_data_callback) != IR_IMAGER_OK) {
        pipeline_coordinator_release_state(PIPELINE_STATE_THERMAL_PENDING);
        return false;
    }
    
    // Start camera capture with synchronized timing
    HAL_Delay(CAPTURE_PREPARATION_TIME_MS);
    if (Camera_StartCapture(frame_callback) != HAL_OK) {
        ir_imager_stop_capture();
        pipeline_coordinator_release_state(PIPELINE_STATE_THERMAL_PENDING);
        return false;
    }
    
    return true;
}

Timing Validation ensures captured data pairs remain synchronized within acceptable windows:

void validate_capture_synchronization(void) {
    if (currentSession.thermal_captured && currentSession.camera_captured) {
        int32_t time_diff = (currentSession.camera_timestamp - currentSession.thermal_timestamp) * portTICK_PERIOD_MS;
        
        currentSession.capture_time_diff_ms = time_diff;
        
        if (abs(time_diff) <= CAMERA_THERMAL_SYNC_WINDOW_MS) {
            currentSession.state = CAPTURE_STATE_BOTH_COMPLETE;
            pipeline_coordinator_request_state(PIPELINE_STATE_PROCESSING, false);
        } else {
            LOG_WARN_TAG("CAPTURE", "Sync window violated: %ld ms (max: %d ms)", 
                        time_diff, CAMERA_THERMAL_SYNC_WINDOW_MS);
            currentSession.state = CAPTURE_STATE_ERROR;
        }
    }
}

Resource Management and Critical Sections

Flash Storage Coordination

Large-scale flash storage operations require exclusive system access to prevent memory contention:

bool psram_flash_offload_should_trigger(void) {
    // Check PSRAM buffer utilization
    int32_t entry_count = storage_manager_get_entry_count();
    if (entry_count < PSRAM_THRESHOLD_ENTRIES) {
        return false;
    }
    
    // Request exclusive pipeline access for flash operations  
    if (!pipeline_coordinator_request_state(PIPELINE_STATE_OFFLOADING, false)) {
        LOG_WARN_TAG("OFFLOAD", "Cannot start - pipeline busy with capture/transmission");
        return false;
    }
    
    return true;
}

int psram_flash_offload_execute(void) {
    // Critical: Suspend capture operations during flash offload
    DCMI_suspend = true;
    
    LOG_INFO_TAG("OFFLOAD", "Starting flash offload (blocking captures)");
    TickType_t offload_start = xTaskGetTickCount();
    
    // Process batch of entries with error recovery
    for (int i = 0; i < BURST_SIZE && storage_manager_get_entry_count() > 0; i++) {
        if (process_single_offload_entry() != 0) {
            // Log error but continue with remaining entries
            offload_state.entries_failed++;
        } else {
            offload_state.entries_processed++;
        }
    }
    
    // Release exclusive access
    DCMI_suspend = false;
    pipeline_coordinator_release_state(PIPELINE_STATE_OFFLOADING);
    
    TickType_t offload_duration = xTaskGetTickCount() - offload_start;
    LOG_INFO_TAG("OFFLOAD", "Flash offload complete: %lu ms, %lu entries", 
                 offload_duration * portTICK_PERIOD_MS, offload_state.entries_processed);
    
    return 0;
}

Memory Consistency Management

The system ensures memory consistency across cores using hardware coherency mechanisms and software barriers:

// Cross-core data sharing with cache coherency
void ensure_cache_coherency_for_shared_data(void *data, size_t size) {
    // STM32H5 specific: Clean & invalidate data cache for shared regions
    SCB_CleanInvalidateDCache_by_Addr(data, size);
    __DSB(); // Data Synchronization Barrier
    __ISB(); // Instruction Synchronization Barrier
}

// Atomic updates for cross-core coordination
void update_shared_pipeline_state(pipeline_state_t new_state) {
    __disable_irq();
    current_pipeline_state = new_state;
    ensure_cache_coherency_for_shared_data(&current_pipeline_state, sizeof(current_pipeline_state));
    __enable_irq();
}

Performance Analysis and Optimization

Real-World Timing Metrics

Production deployment reveals the effectiveness of the coordination system:

Operation	Single-Core (ms)	Multi-Core Optimized (ms)	Improvement
Camera Capture	6,386	6,386	Baseline
Thermal Capture	239	239	Baseline
Storage Coordination	6,181	131	97.9% reduction
Flash Offload	6,050	6,050	Baseline (I/O bound)
Total Pipeline	12,806	6,756	47.2% improvement

Key Insight: The coordination system reduces storage blocking time from 6.18 seconds to 131ms by overlapping I/O operations with capture processing.

CPU Load Distribution

Multi-core coordination enables optimal load balancing:

CM4 Core (Communication):  
├── CAN-FD protocol: 22%
├── VBUS processing: 18%
├── Network transmission: 35%
└── Available headroom: 25%

Error Recovery Mechanisms

The state machine implements comprehensive error recovery:

void handle_pipeline_error_recovery(pipeline_state_t failed_state) {
    LOG_ERROR_TAG("PIPELINE", "Error recovery triggered from state: %d", failed_state);
    
    // Force cleanup of all operations
    if (captureInProgress) {
        Camera_StopCapture();
        ir_imager_stop_capture(); 
        currentSession.state = CAPTURE_STATE_ERROR;
        captureInProgress = false;
    }
    
    // Reset coordination state  
    pipeline_coordinator_request_state(PIPELINE_STATE_IDLE, true); // Force transition
    
    // Clear any pending timers
    if (captureTimeoutTimer) {
        xTimerStop(captureTimeoutTimer, 0);
    }
    
    // Restart communication protocols
    vbus_init(g_node_id);
    
    LOG_INFO_TAG("PIPELINE", "Error recovery complete - system ready");
}

Production Deployment Results

Reliability Metrics

The bulletproof state machine delivers exceptional reliability in production:

99.97% capture success rate over 100,000+ capture cycles
Zero deadlock occurrences during 6-month continuous operation
<0.001% state corruption events with automatic recovery
Mean recovery time: 150ms from error detection to operational state

Scalability Validation

The coordination system scales effectively with increasing sensor complexity:

// Multi-sensor expansion capability
typedef struct {
    sensor_type_t type;
    priority_level_t priority;  
    coordination_mode_t sync_mode;
    timeout_config_t timeouts;
} sensor_coordination_config_t;

// Dynamic sensor registration
int register_sensor_coordination(sensor_coordination_config_t *config) {
    if (active_sensor_count >= MAX_COORDINATED_SENSORS) {
        return -1;
    }
    
    // Insert into priority-sorted coordination list
    insert_sensor_by_priority(&coordination_list, config);
    active_sensor_count++;
    
    return 0;
}

Key Implementation Insights

State Machine Precision: Bulletproof systems require explicit state validation at every transition. Invalid transitions indicate serious system errors that demand immediate attention.
Priority-Based Resource Access: Hardware-accelerated priority inheritance in RTOS schedulers prevents priority inversion during critical coordination operations.
Cache Coherency Management: Multi-core systems require explicit cache management for shared data structures. Hardware coherency protocols help but cannot replace careful software design.
Error Recovery as First-Class Design: Production systems need automatic error recovery that restores operational state within hundreds of milliseconds without manual intervention.
Cross-Core Communication Optimization: Zero-copy message passing with priority-based routing minimizes inter-core latency while maintaining deterministic behavior.

This multi-core pipeline coordination system enables the sophisticated sensor fusion required for Hoomanely's precision pet healthcare monitoring. By coordinating camera, thermal, and proximity sensors with millisecond-level timing accuracy, the system captures the detailed physiological data needed for early health issue detection.

The bulletproof state machine design ensures reliable 24/7 operation in real-world environments where pets depend on continuous monitoring. This robust coordination infrastructure supports Hoomanely's mission to transform pet healthcare from reactive to proactive by providing pet owners with intelligent, early warning systems that can detect health changes before they become serious problems.