Corp Disease Identifier

AI-powered disease identification for corporate health screening.

Overview

Corp Disease Identifier began as a research project around occupational health and grew into a production backend that processes anonymised health screening data. It exposes a small, sharp FastAPI surface backed by trained scikit-learn pipelines, with strict input validation and audit-friendly logging.

118ms p95 latency

94.2% model accuracy

endpoints

Highlights

→ Trained scikit-learn pipelines with versioned model artefacts

→ Pydantic v2 schemas for every request and response

→ PII-safe logging with structured JSON and request IDs

→ Dockerised inference service deployed behind a reverse proxy
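
To make the logging highlight concrete, here is a minimal sketch using only the standard library. It is an illustration, not the project's actual logger: the `PII_FIELDS` set, the `context` key and the helper names are all assumptions. Each record is emitted as one JSON object with a request ID, and any field on the deny-list is dropped before it is written.

```python
import io
import json
import logging
import uuid

# Hypothetical deny-list; the real service's list of identifying fields is not given.
PII_FIELDS = {"name", "email", "employee_id", "date_of_birth"}


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, dropping deny-listed fields."""

    def format(self, record: logging.LogRecord) -> str:
        context = getattr(record, "context", {})
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            "request_id": getattr(record, "request_id", None),
            # Keep only fields that are not on the PII deny-list.
            "context": {k: v for k, v in context.items() if k not in PII_FIELDS},
        }
        return json.dumps(payload, sort_keys=True)


def build_logger(stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("screening")
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def log_screening(logger: logging.Logger, context: dict) -> str:
    """Log one screening event under a fresh request ID and return that ID."""
    request_id = str(uuid.uuid4())
    logger.info(
        "screening received",
        extra={"request_id": request_id, "context": context},
    )
    return request_id


buffer = io.StringIO()
logger = build_logger(buffer)
request_id = log_screening(logger, {"email": "a@example.com", "bmi": 27.4})
print(buffer.getvalue().strip())
```

A deny-list is the simplest option to show here; an allow-list of known-safe fields is stricter, because a newly added identifying field is then excluded by default.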

Why I built it

Corporate wellness programmes generate huge amounts of screening data that nobody has time to read. I wanted a backend that could quietly flag risk patterns so clinicians could focus on the people who actually need attention, not on sifting through spreadsheets.

How it works

Screening payloads hit a single FastAPI endpoint, get validated by Pydantic, then flow into a scikit-learn pipeline loaded once at startup. The model returns a risk score and a small set of contributing features, which the API wraps with an explanation block before responding.
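
The flow above can be sketched roughly as follows. This is a hedged illustration, not the service's code. The feature names, value ranges, response fields and the toy model fitted on synthetic data are all assumptions, and the FastAPI route is left out so the example stays framework-agnostic. The "contributing features" here are ranked by coefficient times scaled value, which only makes sense for a linear model and is one of several possible explanation methods.

```python
import numpy as np
from pydantic import BaseModel, Field, ValidationError
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = ["age", "bmi", "systolic_bp"]  # hypothetical feature names


class ScreeningRequest(BaseModel):
    """Incoming payload; the bounds are illustrative, not clinical guidance."""
    age: int = Field(ge=18, le=100)
    bmi: float = Field(gt=10, lt=80)
    systolic_bp: float = Field(gt=50, lt=300)


class ScreeningResponse(BaseModel):
    risk_score: float
    contributing_features: list[str]
    explanation: str


def load_pipeline() -> Pipeline:
    """Stand-in for loading a versioned artefact: fits a toy model on synthetic data."""
    rng = np.random.default_rng(0)
    X = rng.normal(loc=[45, 26, 125], scale=[12, 4, 15], size=(200, 3))
    y = (X[:, 1] + X[:, 2] / 5 > 51).astype(int)
    pipeline = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])
    return pipeline.fit(X, y)


PIPELINE = load_pipeline()  # built once at import time, not per request


def score(payload: dict) -> ScreeningResponse:
    """Validate the payload, score it, and wrap the result with an explanation."""
    request = ScreeningRequest.model_validate(payload)  # raises ValidationError
    row = np.array([[getattr(request, name) for name in FEATURES]], dtype=float)
    risk = float(PIPELINE.predict_proba(row)[0, 1])

    # Per-feature contribution to the logit: coefficient * standardised value.
    scaled = PIPELINE.named_steps["scale"].transform(row)[0]
    contributions = PIPELINE.named_steps["clf"].coef_[0] * scaled
    top = [FEATURES[i] for i in np.argsort(-np.abs(contributions))[:2]]

    return ScreeningResponse(
        risk_score=risk,
        contributing_features=top,
        explanation=f"Estimated risk {risk:.2f}; largest contributions from {', '.join(top)}.",
    )


result = score({"age": 52, "bmi": 31.5, "systolic_bp": 148})
print(result.model_dump())

try:
    score({"age": 5, "bmi": 31.5, "systolic_bp": 148})  # age below the allowed range
except ValidationError as exc:
    print(f"rejected: {exc.error_count()} validation error(s)")
```

In a FastAPI app, declaring `ScreeningRequest` as the route's body parameter would make the framework run the same validation and return a 422 response on failure, so the explicit `model_validate` call would not be needed there.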

What I learned

Shipping ML behind a real API forces you to take input validation, versioning and observability seriously. The model is the easy part — the boring infrastructure around it is what makes the system trustworthy.

Challenges

× Designing a feature pipeline robust to missing or noisy screening inputs

× Keeping inference latency under 150ms while loading large models

× Building an audit trail without storing identifying patient data
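
One common way to approach the last challenge is to store a keyed hash of the subject identifier instead of the identifier itself. The sketch below assumes that approach; the source does not say which technique the project uses. The key, field names and record shape are hypothetical.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone

# Hypothetical key; in practice this would come from a secret store, not source code.
AUDIT_KEY = b"replace-with-a-managed-secret"


def pseudonymise(subject_id: str, key: bytes = AUDIT_KEY) -> str:
    """Return a stable HMAC-SHA256 digest so records can be linked without the raw ID."""
    return hmac.new(key, subject_id.encode("utf-8"), hashlib.sha256).hexdigest()


def audit_record(subject_id: str, request_id: str, model_version: str, risk_score: float) -> dict:
    """Build an audit entry that contains no raw identifier."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "request_id": request_id,
        "subject": pseudonymise(subject_id),
        "model_version": model_version,
        "risk_score": round(risk_score, 3),
    }


record = audit_record("EMP-00123", "req-1", "2024.1", 0.4137)
print(json.dumps(record, indent=2))
```

Note that this is pseudonymisation rather than anonymisation: anyone holding the key can recompute the digest for a known ID, so the key needs the same protection as the data it replaces.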

A peek at the API