Surya - Tech@Hoomanely (Page 2)

AI

From Noisy Clips to Trustworthy Labels: How We Fixed Audio Classification at the Source

Most public audio datasets look clean on paper. Five seconds. Ten seconds. One label per clip. That assumption is exactly where our problems started. The Hidden Problem With Scraped Audio Datasets: We started with the usual approach: scrape audio clips from public sources, bucket them by class (bark, howl, cough,

AI

Why Adding More Classes Broke Our MelCNN - and How We Fixed It

When we added new audio classes to our MelCNN pipeline (sneezing, coughing, anomalies), something unintuitive happened: accuracy on existing, well-performing classes like eating and drinking dropped - even though their data hadn’t changed. At first glance, this looked like a training issue. It wasn’t. What Actually Broke: 1.

AI

When MelCNN Hallucinates: Why Clean Spectrograms Still Produce Wrong Sounds

How MelCNN learned “ghost sounds” from mislabeled silence. Introduction: Hallucination is usually discussed in the context of large language models: confident answers to questions that were never asked, or facts that never existed. But hallucination isn’t exclusive to text models. In audio systems, especially CNN‑based classifiers like MelCNN,

AI

CLAP Audio Transformer as a Validator, Not a Classifier

Using large models to sanity-check edge detections. Introduction: Audio datasets are fragile. Unlike images or text, you cannot visually skim through thousands of audio samples and immediately know whether they belong to the right class. A barking dog, a metal clank, or background human speech can sound deceptively similar in

AI

Engineering Non-Contact Dog Eye Temperature Measurement

Problem: Modern pet‑health systems increasingly rely on thermal cameras to extract physiological signals such as eye temperature. The challenge is that thermal sensors are low‑resolution and almost never share the same optical center, field of view, or distortion model as the RGB camera doing the visual perception. In

AI

Why Transfer Learning Failed For Our Audio Noise Cancellation Pipeline

Introduction: Transfer learning usually feels like a shortcut that just works. Pretrain on a large dataset, fine-tune on your task, and let scale do the heavy lifting. That approach worked well for us in vision - but audio broke that assumption in a very specific, painful way. At Hoomanely, we

AI

Augmenting Dog Eating Audio with Autoencoders

Introduction: Audio is one of the hardest signals to work with in real-world ML systems. Unlike images, it is continuous, noisy, and deeply tied to the environment it is captured in. At Hoomanely, we build smart pet products that live inside homes - kitchens, living rooms, and feeding areas -

AI

How Do You Build Edge ML When You Don’t Have Labels Yet?

Introduction: Building machine learning systems for the edge sounds exciting, until you realize that most real-world edge problems begin with a simple, uncomfortable truth: you don’t have labeled data. Not enough of it. Not the right kind. And certainly not data that reflects the messy environments your devices will

AI

Learning Meals with Image Similarity

Intro: One of the hardest problems in smart pet devices isn’t sensing - it’s understanding context. At Hoomanely, our EverBowl feeding system can reliably detect when a dog is eating using weight sensors, proximity sensing, and on‑device vision. But knowing what the dog is eating is a

AI

How We Reduced False Detections by Training on “Bad” Indoor Data

Introduction: Landmark detection models often look impressive during offline evaluation - clean datasets, sharp images, balanced lighting, and tidy annotations. But the real world is rarely that polite. At Hoomanely, we deploy dog‑face landmark detection models (eyes, nose, facial keypoints) on always‑on indoor devices mounted on feeding bowls.