From Unstructured Documents to Actionable Records: Engineering Paw Vault

Pet healthcare records are inherently unstructured. Vaccination certificates arrive as photos taken at a clinic. Blood reports are shared as PDFs. Prescription records are forwarded through messaging apps. Over time, these documents become scattered across phones, inboxes, and folders, making them difficult to search, track, or act upon when needed.

At Hoomanely, we wanted to solve a simple problem: how do we transform these disconnected documents into structured, searchable records while ensuring important milestones—such as vaccination renewals—are never missed?

This led to the development of Paw Vault, a document processing workflow that converts uploaded pet healthcare records into organized data and actionable reminders.

The challenge was not document storage. The challenge was building a workflow that could reliably handle inconsistent inputs, extract meaningful information, and turn that information into future actions.

At first glance, the solution appears straightforward: upload a document and ask the user to fill out a form.

In practice, that approach creates friction.

Users typically upload documents while multitasking: standing in a clinic, traveling, or managing a pet appointment. Asking them to manually transcribe clinic names, vaccination dates, and expiry information introduces unnecessary effort and increases abandonment.

The problem can be broken into three distinct engineering challenges:

1. Efficiently handling document uploads from mobile devices.

2. Extracting structured information from highly variable document formats.

3. Converting extracted information into reliable reminders.

Each stage introduces its own reliability, performance, and user-experience considerations.

High-level journey from document upload to reminder creation.

Capture → Preprocess → Upload → Extract → Review → Remind

Several principles guided the design of Paw Vault:

  • Reduce manual data entry whenever possible.
  • Keep users in control of extracted information.
  • Optimize for mobile-first constraints.
  • Prefer recoverable failures over silent failures.
  • Treat reminders as part of the document lifecycle rather than a separate feature.

These principles shaped every stage of the workflow.

Processing Documents Efficiently

One of the first challenges was dealing with document size and format variability.

Images captured from modern smartphones are often significantly larger than necessary for downstream processing. Uploading these files directly increases bandwidth consumption, processing time, and perceived latency.

To address this, image-based uploads undergo preprocessing before being transmitted. Documents such as PDFs are handled differently because aggressive transformation may reduce quality or affect document fidelity.

An equally important consideration is responsiveness.
Image transformations are computationally expensive enough to impact user experience if performed directly on the application's main execution thread. As a result, preprocessing is performed asynchronously, allowing upload progress and UI interactions to remain smooth and responsive.
This separation between user interaction and processing work significantly improves perceived performance without changing functionality.
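
To make the idea concrete, here is a minimal sketch of that split, written in Python for brevity rather than in the language of the actual app. The 2000-pixel budget, the UploadPlan type, and the function names are all assumptions made for illustration; the article does not describe Paw Vault's real preprocessing parameters. Images get a downscale target that preserves aspect ratio, PDFs pass through untouched, and the work runs on a worker thread so the caller's event loop stays free.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional, Tuple

MAX_LONG_EDGE = 2000  # assumed pixel budget, not a value from the article


@dataclass(frozen=True)
class UploadPlan:
    kind: str                               # "image" or "passthrough"
    target_size: Optional[Tuple[int, int]]  # None when the file is sent as-is


def target_size(width: int, height: int,
                max_long_edge: int = MAX_LONG_EDGE) -> Tuple[int, int]:
    """Fit the longer edge into the budget, keeping aspect ratio. Never upscale."""
    long_edge = max(width, height)
    if long_edge <= max_long_edge:
        return width, height
    scale = max_long_edge / long_edge
    return max(1, round(width * scale)), max(1, round(height * scale))


def preprocess(mime_type: str, width: int = 0, height: int = 0) -> UploadPlan:
    """Blocking step. A real implementation would also resize and re-encode here."""
    if mime_type.startswith("image/"):
        return UploadPlan("image", target_size(width, height))
    # PDFs and other documents are uploaded unchanged to preserve fidelity.
    return UploadPlan("passthrough", None)


async def preprocess_in_background(mime_type: str, width: int = 0,
                                   height: int = 0) -> UploadPlan:
    # asyncio.to_thread runs the blocking call in a worker thread,
    # so the event loop driving the UI is not stalled.
    return await asyncio.to_thread(preprocess, mime_type, width, height)
```

A 4000x3000 photo would be planned for 2000x1500, while a PDF yields a passthrough plan with no target size.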

Extracting Structure from Unstructured Data

The core challenge of Paw Vault is transforming unstructured healthcare documents into structured records.
The extraction layer receives a document and produces a compact set of fields that can drive search, filtering, categorization, and reminders.
Rather than treating extraction as a transcription problem, the system treats it as a data normalization problem. The goal is not to reproduce every word in the document.

The goal is to identify the information that users actually care about:

📂 Document Category
📄 Document Title
🏥 Issuing Clinic
📅 Issue Date
⏳ Expiry Date
📝 Summary Information

This distinction is important because users rarely search for entire documents. They search for specific information contained within those documents.

Converting document content into structured fields.


Vaccination Certificate

├── Document Type
├── Clinic
├── Issue Date
├── Expiry Date
└── Summary
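
As a rough illustration of extraction as normalization, the sketch below maps loosely formatted extractor output onto the fields listed above. The DocumentRecord type, the raw dictionary keys, and the accepted date formats are assumptions, not Paw Vault's actual schema. In particular, slash-separated dates are assumed to be day-first, and anything unparseable becomes None instead of a guess.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional

# Assumed input formats; slash dates are treated as day-first.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %b %Y", "%d %B %Y")


def parse_date(raw: Optional[str]) -> Optional[date]:
    """Return a date for recognized formats, or None so the user can fill it in."""
    if not raw:
        return None
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


@dataclass
class DocumentRecord:
    category: str
    title: str
    clinic: Optional[str]
    issue_date: Optional[date]
    expiry_date: Optional[date]
    summary: str


def normalize(raw: dict) -> DocumentRecord:
    """Map raw extractor output (hypothetical keys) onto the structured record."""
    return DocumentRecord(
        category=(raw.get("category") or "other").strip().lower(),
        title=(raw.get("title") or "Untitled document").strip(),
        clinic=(raw.get("clinic") or "").strip() or None,
        issue_date=parse_date(raw.get("issue_date")),
        expiry_date=parse_date(raw.get("expiry_date")),
        summary=(raw.get("summary") or "").strip(),
    )
```

Leaving unknown values empty rather than inferring them keeps the record honest: a missing expiry date is something the user can supply, while a wrong one could silently schedule the wrong reminder.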

Designing for Imperfect Extraction

Document-processing systems rarely operate in ideal conditions.

Real-world uploads often involve:
• Poor Lighting
• Skewed Images
• Handwritten Notes
• Multiple Languages
• Missing Sections
• Low-Quality Scans

Because of this, extraction accuracy should never be treated as absolute. One of the most important design decisions in Paw Vault was treating extracted data as a draft rather than a final result. After extraction, users are presented with a review screen where all fields remain editable.

This approach provides two advantages:
First, it significantly reduces manual effort when extraction succeeds.
Second, it provides a graceful recovery path when extraction is incomplete or partially incorrect.

Instead of forcing users to restart the process, the system allows quick corrections before the document is finalized. This balance between automation and user control has proven more effective than fully automated or fully manual workflows.
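
The draft-then-confirm step can be sketched roughly as follows. The field names mirror the list earlier in the article, but the choice of required fields and the function names are hypothetical. The draft is never mutated: user edits are layered on top, and only the merged result is saved.

```python
from typing import Any, Dict, List

REVIEW_FIELDS = ("category", "title", "clinic",
                 "issue_date", "expiry_date", "summary")
REQUIRED_FIELDS = ("category", "title")  # assumed minimum for saving a record


def is_empty(value: Any) -> bool:
    return value is None or value == ""


def fields_needing_review(draft: Dict[str, Any]) -> List[str]:
    """Fields the extractor left empty, to be highlighted on the review screen."""
    return [name for name in REVIEW_FIELDS if is_empty(draft.get(name))]


def finalize(draft: Dict[str, Any], edits: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new record where user edits override extracted values."""
    merged = {name: draft.get(name) for name in REVIEW_FIELDS}
    for name, value in edits.items():
        if name not in REVIEW_FIELDS:
            raise KeyError(f"unknown field: {name}")
        merged[name] = value
    missing = [name for name in REQUIRED_FIELDS if is_empty(merged[name])]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return merged
```

When extraction succeeds, the user confirms with no edits. When it is partial, only the highlighted fields need attention.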

Building Reliable Reminder Automation

A stored document provides value only when it leads to action.
For vaccination records, expiry dates are often more important than the document itself. Missing a booster appointment can create administrative and healthcare complications for pet owners.
To address this, Paw Vault automatically creates reminder events once users confirm extracted expiry information.
While creating reminders sounds simple, production environments introduce several edge cases:

• Multiple Calendars on a Device
• Expired Documents
• Updated Expiry Dates
• Changed Account Configurations
• Permission Restrictions

The reminder workflow therefore focuses on consistency rather than simple event creation. When document information changes, reminders are updated rather than duplicated. Expired reminders are avoided entirely, and reminder schedules adapt to the remaining validity window of the document.
The result is a reminder system that behaves predictably across real-world usage patterns.

Reminder updates follow document updates.

No Reminder
↓
Create Reminder
↓
Document Updated
↓
Update Existing Reminder
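
One way to get this behavior is to key reminders by document and recompute the schedule on every change. The sketch below is an in-memory stand-in for a device calendar. The 30, 7, and 1 day lead times, the ReminderStore class, and the returned status strings are assumptions for illustration only.

```python
from datetime import date, timedelta
from typing import Dict, List

LEAD_TIMES_DAYS = (30, 7, 1)  # assumed lead times before expiry


def reminder_dates(expiry: date, today: date) -> List[date]:
    """Reminder dates that still lie ahead; empty if the document has expired."""
    if expiry <= today:
        return []
    candidates = [expiry - timedelta(days=days) for days in LEAD_TIMES_DAYS]
    # Drop lead times that are already in the past for short validity windows.
    return [d for d in candidates if d >= today]


class ReminderStore:
    """In-memory stand-in for a calendar, holding one schedule per document."""

    def __init__(self) -> None:
        self._by_document: Dict[str, List[date]] = {}

    def sync(self, document_id: str, expiry: date, today: date) -> str:
        schedule = reminder_dates(expiry, today)
        existed = document_id in self._by_document
        if not schedule:
            # Nothing left to remind about: remove any stale entry.
            self._by_document.pop(document_id, None)
            return "removed" if existed else "skipped"
        # Overwriting by key means an update replaces, never duplicates.
        self._by_document[document_id] = schedule
        return "updated" if existed else "created"

    def schedule_for(self, document_id: str) -> List[date]:
        return list(self._by_document.get(document_id, []))

    def __len__(self) -> int:
        return len(self._by_document)
```

Calling sync twice for the same document leaves a single schedule in the store, and a document that has already expired never produces one.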

Reliability Considerations

A recurring theme throughout Paw Vault is graceful degradation.

Most failures are partial rather than complete.

Examples include:

  • Upload succeeds but extraction is incomplete.
  • Extraction succeeds but dates require verification.
  • Reminder creation succeeds but user permissions change later.

Instead of treating these situations as failures, the workflow is designed to recover from them.

Every stage produces an outcome that can be reviewed, corrected, retried, or updated without forcing users to repeat the entire process.
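
A simple way to model this is to have each stage report an explicit outcome instead of raising on anything short of success. The sketch below is illustrative only: the stage names follow the workflow described earlier, while the Status values and recovery action names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OK = "ok"
    PARTIAL = "partial"
    FAILED = "failed"


@dataclass(frozen=True)
class StageResult:
    stage: str       # e.g. "upload", "extract", "remind"
    status: Status
    detail: str = ""


# Hypothetical recovery actions for the partial failures described above.
RECOVERY = {
    ("upload", Status.FAILED): "retry_upload",
    ("extract", Status.PARTIAL): "review_fields",
    ("extract", Status.FAILED): "manual_entry",
    ("remind", Status.FAILED): "request_permission",
}


def next_action(result: StageResult) -> str:
    """Pick a recovery step for this stage only; earlier stages are not redone."""
    if result.status is Status.OK:
        return "continue"
    return RECOVERY.get((result.stage, result.status), "retry_stage")
```

Because recovery is chosen per stage, an incomplete extraction sends the user to the review screen rather than back to the upload step.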

This approach improves reliability while reducing user frustration.


Results

The final workflow delivers three outcomes from a single uploaded document:

  • A searchable digital record.
  • Structured metadata for filtering and categorization.
  • Automated reminders tied to document validity.

More importantly, the workflow minimizes manual input while preserving user control over the final record.

The result is a system that feels simple from the user's perspective despite handling multiple stages of processing behind the scenes.


Connection to Hoomanely

Hoomanely's broader mission is to reduce the cognitive load associated with pet care.

Many pet-care challenges are fundamentally information-management problems: remembering vaccinations, tracking reports, managing prescriptions, and planning renewals.

Paw Vault contributes to that vision by transforming static documents into structured information that can drive future actions.

Rather than asking pet parents to remember more, the system is designed to remember on their behalf.


Key Takeaways

  • Unstructured documents become significantly more useful when converted into structured records.
  • Mobile-first document workflows require careful attention to bandwidth, responsiveness, and reliability.
  • Extraction systems should assist users rather than replace user verification.
  • Reminder automation is most effective when integrated directly into the document lifecycle.
  • Graceful recovery paths are often more valuable than perfect automation.
  • Engineering successful document workflows requires balancing accuracy, performance, and user experience.