AI

Corp Disease Identifier

AI-powered disease identification for corporate health screening.

Overview

Corp Disease Identifier began as a research project on occupational health and grew into a production backend that processes anonymised health screening data. It exposes a small FastAPI surface of six endpoints, backed by trained scikit-learn pipelines, with strict input validation and audit-friendly logging.

  • p95 latency: 118ms
  • Model accuracy: 94.2%
  • Endpoints: 6

Highlights

  • Trained scikit-learn pipelines with versioned model artefacts
  • Pydantic v2 schemas for every request and response
  • PII-safe logging with structured JSON and request IDs
  • Dockerised inference service deployed behind a reverse proxy
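
To make the logging highlight concrete, here is a minimal sketch of structured JSON logging with request IDs and field redaction, using only the Python standard library. The logger name, the `PII_FIELDS` set and the `log_request` helper are assumptions made for illustration; the project's actual logging code is not shown on this page.

```python
import io
import json
import logging
import uuid

# Hypothetical set of field names treated as identifying (an assumption).
PII_FIELDS = {"name", "email", "employee_id", "date_of_birth"}


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, masking PII-like fields."""

    def format(self, record: logging.LogRecord) -> str:
        fields = getattr(record, "fields", {})
        safe = {
            key: ("[redacted]" if key in PII_FIELDS else value)
            for key, value in fields.items()
        }
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            "request_id": getattr(record, "request_id", None),
            **safe,
        }
        return json.dumps(payload, sort_keys=True)


def build_logger(stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("screening")  # hypothetical logger name
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def log_request(logger: logging.Logger, message: str, request_id: str, **fields) -> None:
    """Attach the request ID and extra fields to a single log line."""
    logger.info(message, extra={"request_id": request_id, "fields": fields})


# Usage: write one log line to an in-memory buffer and parse it back.
buffer = io.StringIO()
logger = build_logger(buffer)
request_id = str(uuid.uuid4())
log_request(logger, "screening scored", request_id,
            email="a@example.com", risk_score=0.42)
entry = json.loads(buffer.getvalue())
```

Redacting in the formatter means a caller who passes an identifying field by mistake still cannot leak it into the log output.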

Why I built it

Corporate wellness programmes generate huge amounts of screening data that nobody has time to read. I wanted a backend that could quietly flag risk patterns so clinicians could focus on the people who actually need attention, not on sifting through spreadsheets.

How it works

Screening payloads hit a single FastAPI endpoint, get validated by Pydantic, then flow into a scikit-learn pipeline loaded once at startup. The model returns a risk score and a small set of contributing features, which the API wraps with an explanation block before responding.
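
The flow above can be sketched without any framework. This is a dependency-free illustration, not the service's code: a dataclass and a `validate` function stand in for the Pydantic schema, `StandInModel` stands in for the trained scikit-learn pipeline, and the field names, value ranges, weights and version tag are all invented for the example.

```python
import math
from dataclasses import dataclass
from typing import Optional

MODEL_VERSION = "example-0"  # hypothetical version tag


@dataclass(frozen=True)
class ScreeningInput:
    age: int
    systolic_bp: float
    bmi: Optional[float] = None  # optional: may be missing from a payload


def validate(payload: dict) -> ScreeningInput:
    """Reject malformed payloads before they reach the model."""
    try:
        age = int(payload["age"])
        systolic_bp = float(payload["systolic_bp"])
        bmi = payload.get("bmi")
        bmi = None if bmi is None else float(bmi)
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError(f"invalid screening payload: {exc!r}") from exc
    # Illustrative bounds only, not clinical guidance.
    if not 16 <= age <= 100:
        raise ValueError("age out of range")
    if not 50 <= systolic_bp <= 300:
        raise ValueError("systolic_bp out of range")
    return ScreeningInput(age, systolic_bp, bmi)


class StandInModel:
    """Placeholder for the trained pipeline: a fixed logistic scorer."""

    WEIGHTS = {"age": 0.03, "systolic_bp": 0.02, "bmi": 0.05}  # made-up weights
    BIAS = -5.0
    DEFAULT_BMI = 25.0  # illustrative fill value for a missing input

    def score(self, x: ScreeningInput) -> tuple[float, list[str]]:
        values = {
            "age": x.age,
            "systolic_bp": x.systolic_bp,
            "bmi": x.bmi if x.bmi is not None else self.DEFAULT_BMI,
        }
        contributions = {k: self.WEIGHTS[k] * v for k, v in values.items()}
        z = self.BIAS + sum(contributions.values())
        risk = 1.0 / (1.0 + math.exp(-z))
        # The two features with the largest contribution to the score.
        top = sorted(contributions, key=contributions.get, reverse=True)[:2]
        return risk, top


MODEL = StandInModel()  # created once at import, mirroring load-at-startup


def handle(payload: dict) -> dict:
    """Validate, score, then wrap the result in an explanation block."""
    x = validate(payload)
    risk, top = MODEL.score(x)
    return {
        "risk_score": round(risk, 3),
        "explanation": {
            "model_version": MODEL_VERSION,
            "contributing_features": top,
        },
    }


response = handle({"age": 50, "systolic_bp": 140})
```

Keeping `MODEL` at module level is the point of the sketch: each request pays only for validation and scoring, never for loading the model.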

What I learned

Shipping ML behind a real API forces you to take input validation, versioning and observability seriously. The model is the easy part — the boring infrastructure around it is what makes the system trustworthy.

Challenges

  • Designing a feature pipeline robust to missing or noisy screening inputs
  • Keeping inference latency under 150ms while loading large models
  • Building an audit trail without storing identifying patient data
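
One common way to approach the audit-trail challenge is to store a keyed hash of the subject identifier instead of the identifier itself. The sketch below assumes that approach; the page does not say how the project actually solved it, and the function names, record fields and sample values are hypothetical. Note that a keyed hash is pseudonymous rather than anonymous, so the secret key has to be stored separately from the audit log.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone


def pseudonymise(subject_id: str, secret: bytes) -> str:
    """Map an identifier to a stable reference that cannot be reversed without the key."""
    return hmac.new(secret, subject_id.encode("utf-8"), hashlib.sha256).hexdigest()


def audit_record(subject_id: str, secret: bytes, model_version: str,
                 risk_score: float, request_id: str) -> str:
    """Build one JSON audit line that never contains the raw identifier."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "request_id": request_id,
        "subject_ref": pseudonymise(subject_id, secret),
        "model_version": model_version,
        "risk_score": risk_score,
    }
    return json.dumps(record, sort_keys=True)


# Usage with made-up values; a real key would come from a secrets store.
secret = b"example-key-not-for-production"
line = audit_record("EMP-0042", secret, "example-0", 0.63, "req-123")
parsed = json.loads(line)
```

Because the same identifier always maps to the same reference under a given key, auditors can trace every decision made about one person without the log ever recording who that person is.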

A peek at the API