harness-health-engineering
Enables querying a local health knowledge base with hybrid RAG (vector + FTS5) and exploring backlinks between notes.
<div align="center">
🧬 Harness Health Engineering
A local AI lab that helps you discover what actually improves your life.
Physiology streams in from Whoop · lived experience goes in by text · voice · selfie · an on-device agent runs n-of-1 experiments and tells you what actually makes life better, all as plain Markdown you own.
<br/>
</div>
🧭 Explore
| The one idea | The engine | vs Whoop journal |
| How it works | Three ways in | Photo diaries |
| Architecture | Technology | Deploy your own |
The one idea
Every health gadget you've owned optimised a number. Recovery. HRV. Steps. And somewhere along the way the number became the point, and your actual life (whether you felt good, did meaningful work, saw people you love) fell out of frame. That's Goodhart's law wearing a fitness band: when a measure becomes the target, it stops measuring anything that matters.
This project flips it. The top-level metric here is not a body score. It is one honest question:
⭐ North-star: "Is my life actually better?"
Everything else (recovery, sleep, HRV, supplements, training load) is demoted to what it really is: an instrument in service of that question. The system will happily tell you that your body looks great this week and your life doesn't, and then help you fix the right thing. No wearable can say that, because no wearable knows what your good life looks like.
Built for people who have tried enough supplements, trackers, routines, and protocols to know that the hard question is not "what is optimal?" but "what is optimal for me, in the life I actually live?". Not for people who want motivation, streaks, badges, or a prettier sleep chart.
The engine: n-of-1 experiments 🔬
Correlations are guesses. "You sleep worse when you drink" can't tell you if the drink did it or if a hard day caused both. So instead of guessing, the harness runs n-of-1 trials: single-subject experiments, the real methodology personalised medicine uses to decide if something works for one specific person. Every experiment is pre-registered:
| Step | Rule |
|---|---|
| Hypothesis | a specific causal claim |
| One variable | change exactly one thing (the rule everyone breaks) |
| Baseline | a measured "before" |
| Duration + criterion | written before the data, so you can't fool yourself |
| Verdict | merge → it becomes a standing rule, or revert → drop it and log why |
This is the difference between "I tried a thing once" and knowledge that compounds. The payoff is a sentence no tracker will ever give you:
"Creatine moved nothing for you in three weeks: stop paying for it."
"Caffeine before 14:00 bought you +35 min of deep sleep. It's a rule now."
One variable at a time. A clock. A criterion. A verdict that becomes a rule. That's the whole game, and it's why this gets smarter every week instead of just logging more.
stateDiagram-v2
direction LR
[*] --> proposed: /exp new · or agent proposes
proposed --> active: baseline + criterion set
active --> active: /exp extend
active --> merged: criterion met → becomes a rule
active --> reverted: missed / stopped → logged why
merged --> [*]
reverted --> [*]
You drive it from anywhere: /exp new to start, /exp extend to give it more time, /exp stop to call
it, or just ask the agent to design a tighter one. Either way it ends in a verdict, not a vibe.
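The lifecycle above is small enough to write down as a transition table. The following is a minimal sketch, not the project's actual code: the `TRANSITIONS` map, the event names (`start`, `extend`, `merge`, `revert`), and the field names checked by `isPreRegistered` are all hypothetical, chosen only to mirror the diagram and the pre-registration table.

```javascript
// Hypothetical sketch of the experiment lifecycle shown in the state diagram.
// State names follow the diagram; the event names are assumptions.
const TRANSITIONS = {
  proposed: { start: "active" },   // baseline + criterion set
  active: {
    extend: "active",              // /exp extend
    merge: "merged",               // criterion met, becomes a rule
    revert: "reverted",            // missed or stopped, logged why
  },
  merged: {},                      // terminal
  reverted: {},                    // terminal
};

// Return the next state, or throw if the diagram has no such edge.
function step(state, event) {
  if (!Object.hasOwn(TRANSITIONS, state) ||
      !Object.hasOwn(TRANSITIONS[state], event)) {
    throw new Error(`illegal transition: ${event} from ${state}`);
  }
  return TRANSITIONS[state][event];
}

// Pre-registration check: every field must be written down before starting.
// The field names are illustrative, not the project's frontmatter schema.
function isPreRegistered(exp) {
  return ["hypothesis", "variable", "baseline", "durationDays", "criterion"]
    .every((field) => exp[field] !== undefined && exp[field] !== "");
}
```

Keeping terminal states as empty maps means a merged or reverted experiment cannot be quietly reopened; a new question needs a new experiment.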
Why it's different from Whoop's journal
Whoop's journaling is genuinely good, and it has a ceiling. Here's where this goes that it structurally can't:
| | Wearable journal | Harness Health Engineering |
|---|---|---|
| Top metric | a body score | your life quality |
| Evidence | correlation | n-of-1 causation → rules |
| Scope | body only | body × work × people × supplements × bloodwork |
| Output | dashboards | decisions and experiments |
| Reasoning | a black box | on-device, cited, interrogable in plain language |
| Memory | a feed | a versioned record you can ask "why was March hard?" |
| Guardrail | none | flags metric-tyranny: proxy up, life flat → that's a fail |
The discipline is the product. The data is just raw material. The full scientific rationale
(n-of-1 design, surrogate-endpoint failure, evidence labelling, confounding) is in
METHODOLOGY.md.
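The guardrail row in the table ("proxy up, life flat") can be pictured as a before/after comparison of two series. This is only a sketch under stated assumptions: the function name, the input shape, and both thresholds are hypothetical, and the real check may weigh the evidence quite differently.

```javascript
// Hypothetical metric-tyranny check: a proxy metric improved while the
// weekly "is life better?" score did not. Thresholds are assumptions.
const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;

function flagsMetricTyranny(proxy, life, { proxyMinGain, lifeMinGain }) {
  const proxyGain = mean(proxy.after) - mean(proxy.before);
  const lifeGain = mean(life.after) - mean(life.before);
  // Fail when the instrument moved but the north-star stayed flat.
  return proxyGain >= proxyMinGain && lifeGain < lifeMinGain;
}

// Example: recovery score up 10 points, life score unchanged.
const flagged = flagsMetricTyranny(
  { before: [50, 52, 48], after: [60, 62, 58] },
  { before: [3, 3], after: [3, 3] },
  { proxyMinGain: 5, lifeMinGain: 1 },
);
```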
How it works
flowchart TD
W["Whoop API v2"] -->|"OAuth2 · auto 9:00 · serialized refresh"| SYNC["sync.mjs"]
H(["You · text · voice · selfie"]) -->|"events · mood · energy · social · work · body"| BOT["Telegram bot<br/>capture layer · on-device"]
SYNC -->|"recovery · HRV · sleep · strain"| RAW["01_raw · daily record"]
BOT --> RAW
RAW --> SRC["02_sources · weekly notes<br/>FACT / INFERENCE"]
SRC --> SYN["04_synthesis · patterns + life-quality"]
SYN --> EXP["05_decisions · n-of-1 experiments<br/>one variable · criterion · verdict"]
EXP --> RULE["CLAUDE.md · rules that stuck"]
EXP -.->|"merge / revert"| RAW
NS["⭐ north-star:<br/>is life better?"] --- SYN
AGENT["on-device agent<br/>RAG Β· cited synthesis Β· MCP"] --- RAW
AGENT --- SYN
AGENT --- EXP
The loop: Signal → Ingest → Source note → Synthesis → Experiment → Verdict → Rule → repeat. Objective body data arrives on its own; you add a 40-second diary; on Sundays the agent scores your week against the north-star and finds what's actually moving it.
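The "serialized refresh" label on the Whoop edge of the diagram suggests that concurrent callers should never refresh the OAuth token at the same time. One common way to get that inside a single process is a single-flight wrapper, sketched below. `singleFlight` is a hypothetical helper, not a function from sync.mjs, and it says nothing about coordination across separate processes.

```javascript
// Hypothetical single-flight wrapper: concurrent callers share one
// in-flight refresh instead of each starting their own.
function singleFlight(refreshFn) {
  let inFlight = null;
  return function refresh() {
    if (inFlight === null) {
      // The executor runs synchronously, so refreshFn starts right away and
      // a synchronous throw becomes a rejection of the shared promise.
      inFlight = new Promise((resolve) => resolve(refreshFn()))
        .finally(() => { inFlight = null; }); // allow the next refresh
    }
    return inFlight;
  };
}
```

Wrapping the token call once and handing the wrapped function to every caller means a burst of requests that all find the token expired still triggers exactly one refresh.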
What it tracks
- Daily (auto): recovery, HRV, resting HR, sleep, strain, all from Whoop.
- Daily (you, ~40s, by text, voice, or selfie): events (your impressions journal), mood, energy, social, work, movement, supplements.
- Weekly: one integral "is life better?" score, 1–5.
- Quarterly: six life dimensions: emotion, connection, body, meaning, autonomy, growth.
Three ways in 📥
Lived experience is messy, and you shouldn't have to sit at a keyboard to capture it. The bot takes three modalities; mix them freely, several per day:
| Mode | How | Processing | Privacy |
|---|---|---|---|
| ✍️ Text | type a line | appended to today's record | local |
| 🎙️ Voice | send a voice note | transcribed on-device (Whisper ONNX) → text | audio never leaves the machine |
| 🤳 Selfie | send a photo | archived to a local visual diary, reviewed on demand by the agent | biometric; never committed to git |
A photo is auto-classified on-device (CLIP, zero-shot; the image never leaves the machine) into one of three local diaries, each read against its own canon. All readings are descriptive, never diagnostic. No caption is needed; a caption simply overrides the guess:
| Type (auto) | Diary | Canon the agent uses |
|---|---|---|
| selfie (default) | face | skin / fluid / affect signals over time |
| food | meals | plate composition · protein · fibre · processing · timing |
| stool | gut | Bristol Stool Scale: type, colour, frequency |
Every message lands as a timestamped line - 14:30 … in 01_raw/health/YYYY-MM-DD.md, so the shape of
the day is preserved, not flattened into one average. Commands set the rest: /north for your
north-star, /exp to run an n-of-1, /week · /month · /year for horizons.
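As an illustration of that record shape, here is a minimal sketch of how a message could be turned into a daily file path and a timestamped line. `dailyPath` and `logLine` are hypothetical helpers, and using local time and collapsing whitespace are assumptions; the bot's own code may differ.

```javascript
// Hypothetical helpers for the daily record format described above.
const pad = (n) => String(n).padStart(2, "0");

// 01_raw/health/YYYY-MM-DD.md, using the local calendar date.
function dailyPath(date) {
  const ymd = [date.getFullYear(), pad(date.getMonth() + 1), pad(date.getDate())].join("-");
  return `01_raw/health/${ymd}.md`;
}

// "- HH:MM text", one line per message so the day keeps its shape.
function logLine(date, text) {
  const oneLine = text.trim().replace(/\s+/g, " ");
  return `- ${pad(date.getHours())}:${pad(date.getMinutes())} ${oneLine}`;
}

const when = new Date(2026, 5, 13, 14, 30); // 13 June 2026, 14:30 local time
const path = dailyPath(when);
const line = logLine(when, "  long walk,\nfelt clear-headed ");
```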
Three photo diaries, one on-device eye 🤳
Your face, your plate, and your gut all leave visible traces of how you live, and a wearable sees none of them. The harness turns photos into longitudinal signals while staying strictly on the safe side of the line: it describes and compares over time, it never diagnoses.
The on-device eye: how routing works 👁️
Send any photo: no caption, no menu. The bot identifies what it is with CLIP zero-shot image classification running locally (ONNX, the same on-device stack as the voice transcriber). The image never leaves the machine; nothing is uploaded.
flowchart TD
P["📷 photo<br/>(Telegram)"] --> DL["bot: download"]
DL --> CLIP{"on-device CLIP<br/>zero-shot · ONNX"}
CLIP -->|face| S1["photos/ · face"]
CLIP -->|meal| S2["photos/food/ · food"]
CLIP -->|toilet| S3["photos/stool/ · gut"]
S1 --> STORE["local · gitignored<br/>only a filename ref in the daily log"]
S2 --> STORE
S3 --> STORE
STORE -.->|"on request: 'review my …'"| AGENT["agent vision<br/>(strong model)"]
AGENT --> OBS["dated observations Β· local"]
OBS --> HYP["hypothesis"] --> EXP["n-of-1 experiment"]
- Selfie is the safe default: the classifier must clear a confidence margin to file a photo as food or stool; otherwise it stays in the neutral face diary.
- A caption always wins: write "food" / "stool" / "selfie" to override the guess.
- Storage is the only automatic step. Content is never auto-analysed; the deep read happens only when you ask the agent ("review my selfies / food / stool"), keeping a strong model's quality without sending anything anywhere by default. (Warm classification ≈ 150 ms.)
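The first two rules above can be sketched as a small routing function. This is an illustration under assumptions, not the bot's code: `routePhoto`, the shape of `scores` (one zero-shot probability per label), and the 0.2 margin are all hypothetical.

```javascript
// Hypothetical routing rule for the three diaries.
const DIARIES = ["selfie", "food", "stool"];

function routePhoto(scores, caption = "", margin = 0.2) {
  // A caption always wins.
  const word = caption.trim().toLowerCase();
  if (DIARIES.includes(word)) return word;

  // Rank labels by score, highest first.
  const ranked = Object.entries(scores).sort((a, b) => b[1] - a[1]);
  if (ranked.length === 0) return "selfie";
  const [label, top] = ranked[0];
  const runnerUp = ranked.length > 1 ? ranked[1][1] : 0;

  // Leave the safe default only when the winner is clearly ahead.
  return label !== "selfie" && top - runnerUp >= margin ? label : "selfie";
}
```

Measuring the margin against the runner-up rather than against a fixed score keeps ambiguous photos in the neutral diary even when every score is low.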
1 · Face: the visual diary
What the agent reads from a face, over time and not in one shot:
| Group | Signals | May reflect | Cross-checked against |
|---|---|---|---|
| 💧 Fluid / puffiness | under-eye bags, facial fullness, lid heaviness | water retention, fatigue | sleep, salt/alcohol at night, cycle phase, stress |
| 🎨 Skin tone | redness/flush, sallowness, pallor, blotchiness | vascular reaction, tiredness | alcohol, heat/exertion, recovery, hydration |
| 🧴 Texture / breakouts | spot count & location, shine vs dryness | hormonal pattern, hydration | cycle phase, sugar/dairy (hypothesis), stress, sleep |
| 👁️ Eyes | sclera redness, dark circles, clarity of gaze | tiredness, irritation | sleep, alcohol, screens, allergy |
| 😳 Affect / vitality | jaw/brow tension, downturned vs lit-up | mood, energy | mood/energy 1–5, "lived as wanted?" |
2 · Food: read against the plate canon
Method: the plate canon (~½ vegetables · ~¼ protein · ~¼ complex carbs · + healthy fat).
| Signal | May reflect | Cross-checked against |
|---|---|---|
| protein present? | satiety, stable glucose | energy, afternoon sugar cravings |
| fibre / veg share | gut transit, fullness | stool, energy |
| processing level (whole vs ultra-processed) | inflammation (hypothesis) | mood, energy |
| refined sugar / fast carbs | glucose swings | energy, sleep, skin |
| meal timing (late eating) | overnight recovery | sleep, next-day recovery |
No calorie counting: it is unreliable from a photo, so the agent reads composition and timing, not numbers. Meals are described neutrally, never moralised; an eating-disorder guardrail keeps the focus on food → how you feel, not control or guilt.
3 · Gut: the Bristol Stool Scale
Method: the clinical Bristol Stool Scale (type 1–7, with 3–4 as the healthy middle) plus colour.
| Read | May reflect | Cross-checked against |
|---|---|---|
| type 1–2 (hard lumps) | slow transit, constipation-leaning | water, fibre, magnesium, travel |
| type 3–4 | normal | n/a |
| type 6–7 (loose / watery) | fast transit | trigger foods, stress, FODMAPs, caffeine |
| colour (pale-clay · black-tarry · red) | bile flow · possible bleed | red-flag → doctor |
Black/tarry stool or visible blood → see a doctor, not a diary entry. No interpretation is attempted.
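The table reads naturally as a lookup in which red-flag colours short-circuit everything else. The sketch below is hypothetical (the function name, input shape, and return shape are invented for illustration) and stays descriptive, like the diary itself. Type 5 has no row in the table, so the sketch returns no reading for it rather than inventing one.

```javascript
// Hypothetical lookup mirroring the table above; descriptive only.
const RED_FLAG_COLOURS = new Set(["pale-clay", "black-tarry", "red"]);

function readStool({ type, colour }) {
  // Red-flag colours end the read: no interpretation, only a referral.
  if (RED_FLAG_COLOURS.has(colour)) {
    return { redFlag: true, reading: "see a doctor" };
  }
  if (!Number.isInteger(type) || type < 1 || type > 7) {
    throw new RangeError("Bristol type must be an integer from 1 to 7");
  }
  if (type <= 2) return { redFlag: false, reading: "slow transit, constipation-leaning" };
  if (type <= 4) return { redFlag: false, reading: "normal" };
  if (type >= 6) return { redFlag: false, reading: "fast transit" };
  return { redFlag: false, reading: null }; // type 5: the table gives no reading
}
```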
Hard limits, across all three diaries
- Not a diagnosis: any worrying sign resolves to "see a doctor," never an interpretation.
- Constitution ≠ trend: innate features (dark circles, face shape) are constants, not changes.
- Shooting noise: light, angle, makeup, and time of day distort more than physiology; mismatched shots → low confidence.
- Biology lags: skin breakouts (and gut shifts) surface days after a trigger; never pinned to "yesterday."
- One shot ≠ a pattern: value is the trend; strongest evidence = paired shots, "morning after X vs morning without X", already almost an n-of-1.
Architecture: the knowledge pyramid 🧱
Everything is plain Markdown in a layered pyramid: raw signal at the base, decisions at the top. Each layer only consumes the one below it, so evidence flows upward and nothing is asserted without a trail back to its source.
| Layer | Holds | Example |
|---|---|---|
| 00_context | north-star, metrics, the domain frame | "is life better?", leading vs lagging metrics |
| 01_raw | daily records (you + Whoop) | 2026-06-13.md, photos, voice notes |
| 02_sources | weekly notes, evidence-labelled | FACT / INFERENCE per week |
| 03_wiki | what you've learned, baselines | personal HRV, supplement stack |
| 04_synthesis | patterns + life-quality | the cross-layer story |
| 05_decisions | n-of-1 experiments | one variable · criterion · verdict |
| 06_outputs | finished artefacts | specs, talks |
flowchart TB
CAP["✍️ 🎙️ 🤳 capture · Whoop"] --> R["📥 01 · raw"]
R --> S["02 · sources<br/>FACT / INFERENCE"]
S --> Y["🧩 04 · synthesis<br/>patterns · life-quality"]
Y --> D["🔬 05 · decisions<br/>n-of-1 experiments"]
D --> RU["✅ rules that stuck"]
D -. "merge / revert" .-> R
NS["⭐ 00 · north-star"] -. "is life better?" .-> Y
The retrieval, hooks, and agent all read this pyramid, never raw guesses. The full scientific rationale
lives in METHODOLOGY.md; the layer contracts in AGENTS.md.
Technology
| Layer | Stack |
|---|---|
| Ingest | Whoop API v2, OAuth2 (serialized, race-safe token refresh), Node 22, zero-dep |
| Capture | free Telegram bot (local, long-polling): text · voice · selfie, plus Markdown |
| Voice | whisper (ONNX, on-device) + prebuilt ffmpeg-static → transcription; audio never leaves the device |
| Vision | on-device CLIP (ONNX) auto-routes photos → selfie / food / stool diaries; deep review by agent on demand; non-diagnostic |
| Knowledge base | layered Markdown pyramid 00–06 with frontmatter contracts + evidence labels |
| Retrieval | hybrid RAG: multilingual-e5-small (ONNX) + sqlite-vec + FTS5 BM25, fused via RRF |
| Agent | MCP server (kb_search / kb_think / kb_backlinks) for any MCP client |
| Control plane | Claude Code hooks (evidence + frontmatter linters), permissions, working-memory invariant |
| Automation | macOS launchd: morning sync + brief, hands-off |
| Quality | kb-doctor health-check, weekly dream-cycle audit |
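The Retrieval row says the vector and BM25 result lists are "fused via RRF". Reciprocal Rank Fusion scores each document by summing 1 / (k + rank) across the lists it appears in. The sketch below shows the general technique only: the function name and the note ids are hypothetical, and k = 60 is the commonly used constant, not a value confirmed by this project.

```javascript
// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)).
// k = 60 is a common default; the project's actual value is not stated.
function rrf(rankings, k = 60) {
  const scores = new Map();
  for (const ranking of rankings) {
    ranking.forEach((id, i) => {
      // i is 0-based, ranks are 1-based.
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Hypothetical note ids: one list from vector search, one from FTS5 BM25.
const fused = rrf([
  ["sleep-week-23", "caffeine-exp", "hrv-baseline"],
  ["caffeine-exp", "magnesium-exp", "sleep-week-23"],
]);
```

Because RRF uses only ranks, it needs no calibration between cosine similarity and BM25 scores, which live on different scales.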
Quickstart
Standing up your own copy? Follow the full step-by-step in
SETUP.md: Telegram bot, Whoop app, .env, auto-start, and an honest privacy breakdown.
corepack enable
pnpm run setup
cp .env.example .env # WHOOP + Telegram bot tokens
pnpm whoop:auth # one-time OAuth → .whoop/ (gitignored)
pnpm whoop:sync # pull physiology + push a morning brief
pnpm kb:index
pnpm kb:think "what actually moves my life quality?"
pnpm kb:doctor
Daily rhythm: morning sync + brief run themselves · evening text/voice/photo your diary to the bot · Sunday ask the agent to review the week.
Privacy & safety
Local-first, but honest about the edges. Your record lives on your disk; pre-processing
(voice→text, photo-type, search) runs on-device; photos and sensitive diaries are git-ignored and
never pushed; secrets (.env, .whoop/) are git-ignored and deny-read by the agent. What does
leave the machine, by design: messages transit Telegram (the Bot API is not end-to-end encrypted),
Claude/Anthropic processes what you send the agent to analyse, and Whoop returns your
physiology. Nothing is sold or posted publicly. Full breakdown in SETUP.md.
Not a medical device. The agent never diagnoses; any worrying signal resolves to one recommendation: see a specialist.
Contributing
Built as one person's body-as-codebase, designed to generalise to any self-quantifier. PRs especially
welcome on: wearable adapters (Oura, Garmin, Apple Health), a live Whoop MCP server, richer experiment
designs, and synthesis evals. See CONTRIBUTING.md and the open
issues. (Yes, Bryan, you too.)
License
MIT © 2026 Christina Vinter. See LICENSE.