Reading a Pet's Temperature Through a Thermal Overlay

A thermal camera can see a fever before a dog acts sick — but only if you know which warm pixels belong to the dog. A low-resolution thermal sensor and a normal camera look at the same scene from slightly different places, through different lenses, at different resolutions. Point them at a pet and you get two pictures that don't line up: the camera knows exactly where the animal's body is, and the thermal frame knows how hot things are, but neither alone can say "the dog's flank is 38.6 °C." Our monitoring station fuses them — it finds the pet in the sharp camera image, warps the fuzzy thermal frame onto it, and reads temperature at the right spot. The twist is that how you line them up depends on how close the pet is. This post is about that distance-aware registration.

The Problem: Two Cameras That Don't Agree

Fusing a camera and a thermal sensor sounds like a one-time calibration: measure the fixed geometric relationship between them once, bake in a transform, done. It isn't, because of parallax.

The two sensors sit a few millimeters apart on the board. Like your two eyes, they see a near object from noticeably different angles but a far object from almost the same angle. So the pixel shift needed to align thermal onto camera changes with the subject's distance — a correction that's perfect for a dog sniffing the bowl is wrong for a dog standing back from it.
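To put numbers on that intuition, here is a minimal sketch under a simple pinhole model, where the pixel shift between two parallel sensors is focal length (in pixels) times baseline divided by depth. The focal length and baseline below are hypothetical values picked for illustration, not the station's calibration.

```python
def parallax_shift_px(focal_px: float, baseline_mm: float, depth_mm: float) -> float:
    """Pixel shift between two parallel pinhole sensors for a point at depth_mm."""
    if depth_mm <= 0:
        raise ValueError("depth_mm must be positive")
    return focal_px * baseline_mm / depth_mm


# Hypothetical numbers, for illustration only.
FOCAL_PX = 40.0     # assumed focal length expressed in thermal pixels
BASELINE_MM = 5.0   # assumed spacing between the two sensors

for depth_mm in (100.0, 250.0, 500.0, 1000.0):
    shift = parallax_shift_px(FOCAL_PX, BASELINE_MM, depth_mm)
    print(f"{depth_mm:6.0f} mm -> {shift:.2f} px")
```

Under these assumed values the shift at 100 mm is ten times the shift at 1000 mm, which is why a correction tuned for one distance cannot be reused at another.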

Two more complications stack on top. The camera's wide-angle lens bends straight lines (fisheye distortion), so its image must be undistorted first. And the thermal frame is tiny, a coarse grid of temperatures against a multi-megapixel image, so every camera pixel maps to a fractional thermal coordinate. Get any of this wrong and you don't just misplace the overlay; you read the temperature of the floor next to the dog and call it a fever.

The Approach: Let Proximity Drive the Registration

The station has a third sensor that makes this tractable: a proximity sensor that reports the subject's distance in millimeters. Instead of a single fixed transform, we treat distance as the knob that tunes the fusion at the moment of capture.

The pipeline runs in stages. First, undistort the camera image with its calibration so geometry is trustworthy. Second, run pose segmentation to locate the pet's body and compute a centroid — the point we actually want a temperature for. Third, map that point into thermal space using the measured camera-to-thermal geometry, corrected for the current distance. Finally, sample the thermal frame around that point and report a temperature.
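The second and third stages can be sketched as follows. This is a simplified illustration, not the station's code: it assumes a boolean segmentation mask is already available, and it stands in for the measured camera-to-thermal geometry with a plain resolution scale plus a shift. The names mask_centroid and camera_to_thermal, and the frame sizes, are hypothetical.

```python
from typing import Optional, Tuple

import numpy as np


def mask_centroid(mask: np.ndarray) -> Optional[Tuple[float, float]]:
    """Return the (x, y) centroid of a boolean mask in camera pixels, or None if empty."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    return float(xs.mean()), float(ys.mean())


def camera_to_thermal(pt_rgb, rgb_shape, th_shape, shift_th=(0.0, 0.0)):
    """Scale a camera-pixel point into fractional thermal coordinates, then add a shift.

    Assumption: after undistortion both sensors cover the same field of view, so a
    resolution ratio is enough. A real system would use its measured transform.
    """
    h_rgb, w_rgb = rgb_shape
    h_th, w_th = th_shape
    x_th = pt_rgb[0] * (w_th / w_rgb) + shift_th[0]
    y_th = pt_rgb[1] * (h_th / h_rgb) + shift_th[1]
    return x_th, y_th


# Hypothetical sizes: a 640x480 camera frame and a 32x24 thermal grid.
mask = np.zeros((480, 640), dtype=bool)
mask[200:280, 300:340] = True          # pretend this is the segmented pet
centroid = mask_centroid(mask)
point_th = camera_to_thermal(centroid, mask.shape, (24, 32))
print(centroid, point_th)
```

The thermal coordinate comes out fractional, which is exactly the situation the rounding and patch-averaging steps later in the pipeline have to deal with.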

The distance-dependence shows up in three concrete places, and each is a small, explicit model rather than a magic constant. Distance sets how far the warm spot is expected to sit from the visual centroid; it sets how large a patch we average; and it rescales the parallax shift itself. Everything below is those three rules.

The Process: Three Distance-Aware Rules

Rule 1 — Distance sets how far to look for the heat. The hottest part of a body isn't always exactly under the visual centroid, and that offset grows when the pet is very close (more of its body fills the frame). We model the expected offset, in thermal pixels, as a base term plus a closeness term plus a body-size term:

    prox_clamped = np.clip(prox_mm, PROX_MM_MIN, PROX_MM_MAX)

    span = max(PROX_MM_MAX - PROX_MM_MIN, 1e-6)
    proximity_norm = (prox_clamped - PROX_MM_MIN) / span  # 0 at PROX_MM_MIN, 1 at PROX_MM_MAX
    closeness = 1.0 - np.clip(proximity_norm, 0.0, 1.0)    # 1 when very close, 0 when far

    d_exp_th = (
        K0_OFFSET
        + K1_PROX_TERM * closeness
        + K2_AREA_TERM * np.sqrt(mask_area_rgb) * scale_rgb_to_th
    )

Note the unit discipline: the body-area term is converted from camera pixels into thermal pixels by scale_rgb_to_th, because all of this reasoning happens in the thermal frame's coordinate system.
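For readers who want to try the offset model, here is a self-contained version of the same formula. The function name expected_offset_th and every constant value are placeholders chosen so the example runs; the real tuning values are not given in this post.

```python
import numpy as np

# Hypothetical constants, for illustration only.
PROX_MM_MIN = 50.0
PROX_MM_MAX = 1000.0
K0_OFFSET = 0.5       # base offset, thermal pixels
K1_PROX_TERM = 2.0    # extra offset at minimum distance, thermal pixels
K2_AREA_TERM = 0.1    # weight on body size (dimensionless)


def expected_offset_th(prox_mm, mask_area_rgb, scale_rgb_to_th):
    """Expected hotspot offset from the visual centroid, in thermal pixels."""
    prox_clamped = np.clip(prox_mm, PROX_MM_MIN, PROX_MM_MAX)
    span = max(PROX_MM_MAX - PROX_MM_MIN, 1e-6)
    closeness = 1.0 - (prox_clamped - PROX_MM_MIN) / span  # 1 close, 0 far
    # sqrt(area) is a length in camera pixels; the scale converts it to thermal pixels.
    body_size_th = np.sqrt(mask_area_rgb) * scale_rgb_to_th
    return float(K0_OFFSET + K1_PROX_TERM * closeness + K2_AREA_TERM * body_size_th)


near = expected_offset_th(prox_mm=50.0, mask_area_rgb=10_000.0, scale_rgb_to_th=0.05)
far = expected_offset_th(prox_mm=1000.0, mask_area_rgb=10_000.0, scale_rgb_to_th=0.05)
print(near, far)
```

With these placeholder values, the same mask area yields a larger expected offset at minimum distance than at maximum distance, and the difference is exactly the closeness term.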

Rule 2 — Distance sets how big a patch to average. Sampling a single thermal pixel is noisy, but averaging too wide a patch bleeds in the background. So the sampling radius scales with distance: a wider patch when the pet is close and fills the frame, a tighter one when it's far and small. There's extra damping at very short range so we don't over-sample:

    # Normalize so PROX_MM_MIN -> 0 and PROX_MM_MAX -> 1
    norm = (prox_clamped - PROX_MM_MIN) / max(PROX_MM_MAX - PROX_MM_MIN, 1e-6)
    radius = RADIUS_NEAR_TH - norm * (RADIUS_NEAR_TH - RADIUS_FAR_TH)

    # Extra damping when we are closer than the threshold (very small working distance)
    if prox_clamped <= CLOSE_PROX_THRESHOLD:
        close_span = max(CLOSE_PROX_THRESHOLD - PROX_MM_MIN, 1e-6)
        close_norm = (CLOSE_PROX_THRESHOLD - prox_clamped) / close_span
        radius -= close_norm * CLOSE_PROX_SHRINK

    return float(np.clip(radius, RADIUS_FAR_TH, RADIUS_NEAR_TH))
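The excerpt above is the body of a function, so here is a runnable sketch of the whole thing for experimentation. The function name sampling_radius_th and all constant values are assumptions made for this example, not the production tuning.

```python
import numpy as np

# Hypothetical constants, for illustration only.
PROX_MM_MIN = 50.0
PROX_MM_MAX = 1000.0
RADIUS_NEAR_TH = 4.0          # radius at minimum distance, thermal pixels
RADIUS_FAR_TH = 1.5           # radius at maximum distance, thermal pixels
CLOSE_PROX_THRESHOLD = 150.0  # below this, apply extra damping
CLOSE_PROX_SHRINK = 1.0       # maximum extra shrink, thermal pixels


def sampling_radius_th(prox_mm: float) -> float:
    """Sampling radius in thermal pixels for a given proximity reading."""
    prox_clamped = float(np.clip(prox_mm, PROX_MM_MIN, PROX_MM_MAX))
    norm = (prox_clamped - PROX_MM_MIN) / max(PROX_MM_MAX - PROX_MM_MIN, 1e-6)
    radius = RADIUS_NEAR_TH - norm * (RADIUS_NEAR_TH - RADIUS_FAR_TH)

    # Extra shrink that ramps from 0 at the threshold to CLOSE_PROX_SHRINK at PROX_MM_MIN.
    if prox_clamped <= CLOSE_PROX_THRESHOLD:
        close_span = max(CLOSE_PROX_THRESHOLD - PROX_MM_MIN, 1e-6)
        close_norm = (CLOSE_PROX_THRESHOLD - prox_clamped) / close_span
        radius -= close_norm * CLOSE_PROX_SHRINK

    return float(np.clip(radius, RADIUS_FAR_TH, RADIUS_NEAR_TH))


for prox_mm in (50.0, 150.0, 500.0, 1000.0):
    print(prox_mm, round(sampling_radius_th(prox_mm), 3))
```

One consequence worth noticing with these placeholder values: the radius is not monotonic. It peaks near the close-range threshold, shrinks with distance beyond it, and is pulled back down again as the pet gets closer than the threshold.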

Rule 3 — Distance rescales the parallax shift. This is the heart of the registration. A residual alignment shift measured at one depth must be rescaled for the current depth, because parallax is inversely proportional to distance. The code does exactly that — scaling the stored shift by the ratio of the calibration depth to the live depth:

    new_depth_m = float(self.current_proximity_mm) / 1000.0
    if new_depth_m > 1e-6:
        scale = float(stored_depth) / new_depth_m
        scaled_shift = (
            float(stored_shift[0]) * scale,
            float(stored_shift[1]) * scale,
        )

A shift calibrated at half a meter is doubled when the pet is at a quarter meter, and halved at a full meter. That single ratio is what keeps the thermal overlay locked to the body across the whole working range, instead of drifting off the animal as it moves toward or away from the station.
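That arithmetic is easy to check with a standalone sketch of the same ratio. The function name rescale_shift and the example shift are hypothetical, and returning None when the distance is unusable is an assumption; the excerpt above does not show what the original does in that case.

```python
def rescale_shift(stored_shift, stored_depth_m, current_proximity_mm):
    """Rescale a shift calibrated at stored_depth_m to the current distance.

    Returns None if the current distance is not usable (assumed behavior).
    """
    new_depth_m = float(current_proximity_mm) / 1000.0
    if new_depth_m <= 1e-6:
        return None
    scale = float(stored_depth_m) / new_depth_m
    return (float(stored_shift[0]) * scale, float(stored_shift[1]) * scale)


# Hypothetical calibration: a (1.5, -0.5) thermal-pixel shift measured at 0.5 m.
calibrated = (1.5, -0.5)
print(rescale_shift(calibrated, 0.5, 250))   # quarter meter: shift doubles
print(rescale_shift(calibrated, 0.5, 1000))  # full meter: shift halves
print(rescale_shift(calibrated, 0.5, 0))     # no usable distance
```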

Reading the temperature. With the geometry resolved, the actual measurement is a lookup. A precomputed map turns each camera pixel in the region of interest into thermal coordinates; we round to integer thermal pixels, clip to bounds, gather the valid temperatures, and average them:

    tx = np.rint(region_map_x[valid]).astype(int)
    ty = np.rint(region_map_y[valid]).astype(int)
    tx = np.clip(tx, 0, w_th - 1)
    ty = np.clip(ty, 0, h_th - 1)

    temps = self.thermal_data[ty, tx]
    temps = temps[np.isfinite(temps)]
    if temps.size == 0:
        return None
    return float(np.mean(temps))

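The isfinite filtering matters: non-finite readings (NaN or infinity) are dropped rather than averaged in, so a single garbage pixel can't fake a fever or hide one. Note that isfinite does not catch values that are finite but implausible, so out-of-range readings are excluded only if an earlier stage marks them as NaN.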
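To see the filter-then-average behavior in isolation, here is a simplified sketch that samples a circular patch directly in thermal coordinates. It deliberately swaps the precomputed camera-to-thermal map for a plain distance test, so it illustrates the idea rather than reproducing the station's method. The function name sample_patch_mean and the synthetic frame are made up for this example.

```python
import numpy as np


def sample_patch_mean(thermal: np.ndarray, cx: float, cy: float, radius: float):
    """Mean of finite thermal pixels within `radius` of (cx, cy), or None if none exist."""
    h_th, w_th = thermal.shape
    ys, xs = np.mgrid[0:h_th, 0:w_th]
    inside = (xs - cx) ** 2 + (ys - cy) ** 2 <= radius ** 2
    temps = thermal[inside]
    temps = temps[np.isfinite(temps)]   # drop NaN and +/-inf before averaging
    if temps.size == 0:
        return None
    return float(np.mean(temps))


# Synthetic 32x24 frame: 22 C background, a 38.5 C body, and one dead pixel.
frame = np.full((24, 32), 22.0)
frame[10:14, 14:18] = 38.5
frame[12, 16] = np.nan

reading = sample_patch_mean(frame, cx=15.5, cy=11.5, radius=1.5)
no_reading = sample_patch_mean(np.full((24, 32), np.nan), cx=15.5, cy=11.5, radius=1.5)
print(reading, no_reading)
```

The dead pixel sits inside the patch but does not contaminate the mean, and a frame with no finite pixels yields None instead of a number.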

The Results

The payoff is a temperature that stays pinned to the animal, not the scene, as the pet approaches and retreats. Because the offset, the sampling radius, and the parallax shift all flex with the proximity reading, a dog at the bowl and a dog a half-meter back both yield a temperature sampled from their actual body, with a patch size appropriate to how much of the frame they occupy.

It's also honest about uncertainty. Non-finite thermal pixels are filtered, the sampling radius is clamped to a sensible band, and the whole thing degrades to "no reading" rather than a confidently wrong one when the geometry can't be resolved. That conservatism is exactly what a health metric needs: a missing temperature is recoverable, while a fabricated one erodes trust in every reading after it.

Why It Matters at Hoomanely

Hoomanely is reinventing healthcare for pets — replacing reactive, imprecise care with continuous, clinical-grade monitoring that catches problems early. Our devices form a Physical Intelligence ecosystem: sensors fused at the edge, feeding the Biosense AI Engine that turns raw signals into personalized, preventive insights.

Body temperature is one of the most clinically meaningful vitals we can capture passively, and capturing it without a rectal probe or a restrained, stressed animal is exactly the kind of problem our physical-intelligence approach exists to solve. But a thermal number is only useful if it provably came from the pet — which is why the fusion of camera, thermal, and proximity matters more than any one sensor. Registration is the measurement.

Our guiding principle is that every signal matters and every detail counts. Tuning a thermal overlay by the millimeter, against a moving animal, is what turns three modest sensors into a vital sign a clinician can act on.

Key Takeaways

  • Camera-thermal alignment is distance-dependent, not fixed. Parallax between two closely spaced sensors changes with subject distance, so a single static transform drifts off the target.
  • Make proximity a first-class input. Use the measured distance to drive the expected hotspot offset, the sampling radius, and the parallax-shift rescaling.
  • Work in one coordinate system. Convert body-size and offset terms into thermal-pixel units with an explicit scale factor so the math stays consistent.
  • Average a distance-appropriate patch, and filter it. Scale the sampling radius with distance and drop non-finite thermal pixels so noise can't fake or mask a reading.
  • Fail to "no reading," never to a wrong one. A clamped, validated pipeline that returns nothing when geometry is unresolved protects the integrity of the whole temperature trend.

Author's Note

This thermal-overlay registration runs on the compute module inside one of Hoomanely's physical-intelligence monitoring devices, fusing camera, thermal, and proximity sensing. The camera finds the pet, proximity tunes the geometry, and the thermal frame gives the number — together producing a body temperature measured passively, while the animal just goes about its day. It's a reminder that in multi-sensor health monitoring, the cleverest part often isn't any single sensor — it's making them agree.