Case studies

Shipped, monitored, guaranteed.

Every project listed here is live, handling real users, monitored by Preflight.

Healthcare

Health coach for a weight management startup

A RAG-grounded coaching agent aligned with the client's nutritionists — answering from approved protocols only, matching each nutritionist's communication style, handling 2,000+ conversations daily.

  0        Hallucinations to users
  2,041    Daily conversations
  97.3%    Output adherence
Problem

Coaching doesn't scale.

The client is a health-coaching company with 60+ nutritionists managing weight-loss programs across India and the Middle East. Each patient gets a personalised diet plan, WhatsApp check-ins, and ongoing adjustments. The model works — retention is high, outcomes are strong.

The problem: each nutritionist can handle about 40 active patients. At that ratio, scaling means hiring proportionally. A team of 60 covers 2,400 patients. To reach 10,000, they'd need 250 nutritionists — and the hiring pipeline doesn't move that fast.

They'd tried a basic chatbot before. It lasted two weeks. The bot gave generic advice ("eat more vegetables"), ignored patient history, and on one occasion suggested a meal plan that conflicted with a patient's medication. The nutritionists pulled the plug.

"We don't need a chatbot. We need something that answers exactly the way Dr. Mehra would answer — and never, ever goes off-script on medical advice."
— Head of Product, client company
Approach

RAG with a structured context window.

We designed a retrieval layer that ensures the bot only answers from approved content. No general knowledge, no improvisation. Every response traces back to a source document that the client's medical team has signed off on.

The context window for each patient conversation is explicitly structured:

  • Past conversations — to match the assigned nutritionist's tone
  • Medical history and key risks
  • Current prescriptions
  • Weight and body metrics (weekly updates)
  • Goals and time horizon
  • Previous recommendations and adherence notes

Knowledge sources include: the client's proprietary diet protocols, a regional diet library covering South Asian, Middle Eastern, and Mediterranean cuisines, approved FAQ responses, and escalation SOPs for medical situations the bot should never handle.
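The structured context window above can be sketched as a small assembly step: patient state first, then only the retrieved, approved passages. All field and function names here are illustrative assumptions, not the client's actual schema.

```python
# Sketch of the structured context window described above.
# Field and function names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class PatientContext:
    past_conversations: list[str]       # used to match the nutritionist's tone
    medical_history: list[str]          # key risks
    prescriptions: list[str]
    body_metrics: dict[str, float]      # weekly weight updates, etc.
    goals: str
    adherence_notes: list[str] = field(default_factory=list)

def build_context_window(ctx: PatientContext, retrieved_protocols: list[str]) -> str:
    """Assemble the prompt in a fixed order: patient state sections first,
    then ONLY the approved protocol passages the retriever returned."""
    sections = [
        "## Medical history\n" + "\n".join(ctx.medical_history),
        "## Current prescriptions\n" + "\n".join(ctx.prescriptions),
        "## Metrics\n" + "\n".join(f"{k}: {v}" for k, v in ctx.body_metrics.items()),
        "## Goals\n" + ctx.goals,
        "## Approved sources (answer ONLY from these)\n" + "\n".join(retrieved_protocols),
    ]
    return "\n\n".join(sections)
```

Keeping the approved-source block last and explicitly labelled makes it easy for a downstream grounding check to verify that every claim traces back to that section.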

Stack: GPT-4 Turbo · Pinecone · text-embedding-3-large · WhatsApp Business API · Preflight
Eval strategy

What Preflight monitors on every response.

The core risk isn't hallucination in the traditional sense — it's scope creep. The bot knows a lot about nutrition and will confidently answer questions about medication, exercise physiology, or medical conditions if you let it. That's the failure mode Preflight is designed to catch.

Preflight — Health coach · Live

  97.3%   Adherence
  47      Blocked (30d)
  0.91    Quality judge
  1.2s    P95 latency

  Source grounding (NLI)             142/146 claims       Pass
  Scope boundary enforcement         Diet-only            Pass
  Tone match (persona: Dr. Mehra)    Consistent           Pass
  Medication mention detection       3 caught / 30d       Active
  Escalation SOP compliance          All paths covered    Pass

In the first 30 days, Preflight blocked 47 responses — mostly medication-related scope violations where the LLM tried to be helpful about drug interactions. None of those responses reached patients. The model is not perfect. The system is.

Results

The numbers after 90 days.

  0         Hallucinations to users
  2,041     Daily conversations
  40 → 120  Patients per nutritionist
  97.3%     Output adherence

The bot handles the first 2-3 turns of most conversations — answering diet questions, logging meals, adjusting portions based on weekly weigh-ins. When a patient asks something outside scope (medication, exercise injury, emotional distress), the bot escalates to the human nutritionist with full context.

Each nutritionist now manages 120 patients instead of 40. The company is scaling to 10,000 patients without proportional hiring. Nutritionists spend their time on complex cases, not on answering "can I eat rice at dinner?"

Life sciences

Drug research assistant for a US pharma startup

A multi-agent research tool that searches scientific literature, extracts reagent tables, and drafts study designs — cross-checking every claim against multiple sources.

  85%     Researcher time saved
  40m     vs 3-day review
  100%    Claims source-traced
Problem

Literature review is the bottleneck.

The client is a pre-clinical stage pharma startup with a team of 8 researchers. Before any experiment, someone has to review the existing literature — find relevant papers, extract methods and reagent details, identify conflicting findings, and draft the study design rationale. This process takes 2-3 days per research question. The team runs 3-4 questions per week. That means 40% of their research capacity goes to reading papers, not running experiments.

They'd tried ChatGPT directly. The outputs were fluent but unreliable — fabricated citations, hallucinated reagent catalogue numbers, confidently wrong dosing information. In drug research, a single wrong data point doesn't just waste time. It can derail a $200K experiment.

Approach

Three specialised agents working in sequence.

We built a multi-agent pipeline with three role-specialised agents, coordinated through our orchestration platform:

Agent 1 · Literature scanner

Takes a research question, searches PubMed and the client's internal knowledge base, retrieves relevant papers, and produces a ranked shortlist with relevance scores. Cross-references across databases to catch retracted or superseded studies.

Agent 2 · Data extractor

Reads the shortlisted papers and extracts structured data — reagent names, catalogue IDs, concentrations, conditions, vendor options, and approximate pricing. Outputs a normalised table aligned with the client's preferred vendors and procurement rules.

  Reagent      Catalogue ID   Est. price
  DMEM/F-12    11320033       $42/500ml
  FBS          A3160801       $380/500ml
  Matrigel     354234         $290/10ml
  Y-27632      SCM075         $185/5mg
Agent 3 · Synthesis writer

Takes the literature overview and extracted data, then produces a structured output: concise topic overview, key findings with limitations flagged, open questions, and a candidate study design with controls, arms, and sample size rationale. Everything is explicitly marked as requiring human PI validation.
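The three-stage pipeline can be reduced to a simple sequential sketch. Each "agent" here is a plain stub function; the real system wires LLM calls, PubMed search, and vendor lookups behind these interfaces, and every name below is illustrative.

```python
# Toy version of the three-agent pipeline (stubs stand in for LLM/API calls).

def scan_literature(question: str) -> list[dict]:
    # Agent 1: would search PubMed + the internal KB and rank results.
    return [{"title": f"Paper on {question}", "doi": "10.0000/example", "score": 0.92}]

def extract_data(papers: list[dict]) -> list[dict]:
    # Agent 2: would read the shortlist and normalise reagent tables.
    return [{"reagent": "DMEM/F-12", "catalogue_id": "11320033",
             "source_doi": papers[0]["doi"]}]

def synthesise(question: str, papers: list[dict], table: list[dict]) -> dict:
    # Agent 3: drafts the study design; output is always flagged for PI review.
    return {"question": question, "n_sources": len(papers),
            "reagents": table, "requires_pi_validation": True}

def run_pipeline(question: str) -> dict:
    papers = scan_literature(question)
    table = extract_data(papers)
    return synthesise(question, papers, table)
```

The sequential hand-off matters: Agent 2 only ever sees papers that Agent 1 shortlisted, so every extracted value carries a source DOI that downstream checks can verify.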

Stack: Multi-agent · PubMed API · GPT-4 Turbo · Pinecone · Preflight
Eval strategy

Every claim grounded. Every citation verified.

In pharma, the cost of a wrong answer is an order of magnitude higher than in most domains. A hallucinated reagent catalogue number means ordering the wrong chemical. A fabricated citation means building a study design on research that doesn't exist.

Preflight — Drug research assistant · Live

  100%    Citations verified
  12      Blocked (30d)
  0.93    Quality judge
  8.4s    Avg pipeline latency

  Citation existence (PubMed DOI)    All verified            Pass
  Catalogue ID validation            Cross-ref vendor API    Pass
  Claim-to-source grounding (NLI)    All traced              Pass
  Retraction check                   0 retracted cited       Pass
  Human validation flags             All marked              Pass

In the first month, Preflight caught 12 issues — 8 were catalogue IDs that had been discontinued or superseded by the vendor, 3 were citations where the DOI resolved to a different paper than described, and 1 was a retracted study. All were blocked before reaching the researcher's output.
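The DOI-mismatch failure mode caught above (a DOI that resolves, but to a different paper than described) suggests a check like the one below: resolve the DOI and fuzzy-match the resolved title against the claimed one. The use of the public Crossref API is an assumption; the case study only says DOIs are verified against PubMed.

```python
# Sketch of a citation-existence + title-match check.
# Crossref endpoint usage is an assumption, not the system's actual source.
import difflib
import json
import urllib.request

def title_matches(claimed: str, resolved: str, threshold: float = 0.8) -> bool:
    """Fuzzy-match so punctuation/casing differences don't cause false blocks."""
    ratio = difflib.SequenceMatcher(None, claimed.lower(), resolved.lower()).ratio()
    return ratio >= threshold

def verify_doi(doi: str, claimed_title: str) -> bool:
    """Block if the DOI is unresolvable or resolves to a different paper."""
    try:
        with urllib.request.urlopen(f"https://api.crossref.org/works/{doi}",
                                    timeout=10) as resp:
            meta = json.load(resp)["message"]
    except Exception:
        return False  # unresolvable DOI -> block the citation
    resolved = (meta.get("title") or [""])[0]
    return title_matches(claimed_title, resolved)
```

Treating "DOI exists" and "DOI matches the described paper" as separate checks is what distinguishes this from a naive existence lookup.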

Results

The numbers after 60 days.

  85%     Researcher time saved
  40m     Per research question
  100%    Citations verified
  0       Wrong data reached researchers

What used to take 2-3 days now takes 40 minutes. Researchers still review every output — the agent generates, it doesn't decide — but the review is now "check the structured summary" rather than "read 30 papers and build the table from scratch."

The team went from 3-4 research questions per week to 12-15. Their experimental throughput hasn't tripled because of AI — it's tripled because their researchers spend time designing experiments instead of reading papers.

"The real value isn't speed. It's that I trust the output. Every catalogue number checks out. Every citation is real. That's what the last tool couldn't do."

Got an agent that needs to pass?

Run your agent through our eval suite. See what breaks, what passes, and what it takes to fix it.

Probe your agent — free →