knowledge-forge-mcp
Enables AI coding tools to turn a list of links into a verified, browsable knowledge base with grounded summaries via MCP tools for update, validation, and enrichment.
knowledge-forge
Turn a plain list of links (videos, posts, repos, articles) into a verified, browsable knowledge base — and let your AI coding tool enrich it with grounded, hallucination-checked summaries.
It ships three ways to drive the same pipeline so it works with any AI tool:
- a CLI (knowledge-forge ...)
- an MCP server (knowledge-forge-mcp) for Cursor, Claude Code, Claude Desktop, etc.
- portable skills + editor hooks
What it does
links.txt ──▶ extract (yt-dlp / whisper / youtube-transcript)
──▶ classify into categories + pull verified learnings
──▶ videos.json ──▶ generate a category/detail canvas (.canvas.tsx)
──▶ [optional] rule-based enrichment + an agent queue
──▶ agent writes 300-1000 word grounded summaries
──▶ validate (anti-hallucination gate) ──▶ apply ──▶ refresh
Key idea: summaries are grounded. A validator enforces word count, per-paragraph token overlap with the source material, source-URL provenance, and a no-engagement-bait rule — so the agent can't invent facts, repos, or URLs.
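To make the four checks concrete, here is a minimal sketch of what such a gate could look like. It is not the project's actual validator: the function name `validate_summary`, the 0.5 overlap threshold, the tokenisation, and the bait phrase list are all assumptions chosen for illustration. Only the four rule types and the 300-1000 word range come from this README.

```python
import re
from dataclasses import dataclass, field

URL_RE = re.compile(r"https?://[^\s<>)\]]+")
WORD_RE = re.compile(r"[a-z0-9]+")
# Hypothetical bait phrases; the real rule set is not documented here.
BAIT_PHRASES = ("smash that like", "don't forget to subscribe", "you won't believe")


def tokens(text: str) -> set:
    """Lowercased word tokens longer than three characters."""
    return {t for t in WORD_RE.findall(text.lower()) if len(t) > 3}


def urls_in(text: str) -> set:
    """URLs found in text, with trailing sentence punctuation stripped."""
    return {u.rstrip(".,;:") for u in URL_RE.findall(text)}


@dataclass
class Report:
    errors: list = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


def validate_summary(summary, source_text, source_urls,
                     min_words=300, max_words=1000, min_overlap=0.5) -> Report:
    report = Report()

    # 1. Word count gate.
    n_words = len(summary.split())
    if not min_words <= n_words <= max_words:
        report.errors.append(f"word count {n_words} outside {min_words}-{max_words}")

    # 2. Per-paragraph token overlap with the source material.
    source_tokens = tokens(source_text)
    paragraphs = [p for p in summary.split("\n\n") if p.strip()]
    for i, paragraph in enumerate(paragraphs, 1):
        para_tokens = tokens(paragraph)
        overlap = len(para_tokens & source_tokens) / len(para_tokens) if para_tokens else 0.0
        if overlap < min_overlap:
            report.errors.append(f"paragraph {i} overlap {overlap:.2f} below {min_overlap}")

    # 3. URL provenance: every URL must appear in the source or its link list.
    allowed = set(source_urls) | urls_in(source_text)
    for url in sorted(urls_in(summary) - allowed):
        report.errors.append(f"URL not in source: {url}")

    # 4. No engagement bait.
    lowered = summary.lower()
    for phrase in BAIT_PHRASES:
        if phrase in lowered:
            report.errors.append(f"engagement bait: {phrase!r}")

    return report
```

Because every rule is deterministic, the same summary always passes or fails for the same reasons, which is what lets an agent retry against the error list.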
Capture: send a link, get a knowledge entry
The hardest part of any knowledge base is capturing the link the instant you see
its value. knowledge-forge ingest <url> is the one-shot front door: it appends
the URL, scrapes + classifies it, and queues it for a grounded summary.
knowledge-forge ingest "https://www.youtube.com/watch?v=..." # one link, fully processed
Wire it to Telegram so you can capture from your phone while scrolling — share
a reel to your bot and it lands in the base automatically. Use the standalone bot
in integrations/telegram/, or, if you already run Hermes or any
Telegram-connected agent, just give that agent the knowledge-forge ingest <url>
tool and let it handle the chat. See integrations/telegram/README.md.
share a link to Telegram ──▶ bot / Hermes extracts the URL
──▶ knowledge-forge ingest <url>
──▶ scrape + classify + queue ──▶ searchable entry
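The flow above can be sketched in a few lines if you are wiring your own agent instead of using the bundled bot. This is an illustration under assumptions, not the code in integrations/telegram/: `extract_urls` and `forward_message` are hypothetical helper names, and the only thing taken from this README is the `knowledge-forge ingest <url>` command itself.

```python
import re
import subprocess

URL_RE = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(message_text: str) -> list:
    """Return the unique URLs in a chat message, in order of appearance."""
    urls = []
    for raw in URL_RE.findall(message_text):
        url = raw.rstrip(".,;:!?)")  # drop punctuation that trails a pasted link
        if url not in urls:
            urls.append(url)
    return urls


def ingest_command(url: str) -> list:
    """Argument list for the CLI; passing a list avoids shell quoting issues."""
    return ["knowledge-forge", "ingest", url]


def forward_message(message_text: str, run=subprocess.run) -> list:
    """Ingest every URL found in a message and return the URLs handled.

    `run` is injectable so the function can be exercised without the CLI installed.
    """
    urls = extract_urls(message_text)
    for url in urls:
        run(ingest_command(url), check=True)
    return urls
```

A bot handler would call `forward_message(update_text)` for each incoming message; how you obtain the message text depends on the Telegram library you use.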
Discover & install (not just enrich)
Once the base is built, search it for a use case and get the recommended best practice plus the tools/skills to install — all grounded in the source videos:
knowledge-forge search "cut Claude Code token cost" --limit 5
Each hit returns a grounded snippet, the source links, any installCommands
and recommendedTools extracted verbatim from that video (e.g. a pip install
line or a GitHub repo), and a needsScrape flag when the entry is too thin and the
source should be transcribed + enriched for a fuller answer. This turns the base
into a recommender: "how do I improve X?" → best practice + the exact skill/tool to
install, cited to the video. Commands are only ever surfaced when present in the
source — never fabricated.
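The "only when present in the source" rule can be illustrated with a small sketch. The field names `installCommands`, `recommendedTools`, and `needsScrape` come from this README; everything else is assumed for illustration, including the entry shape (`url`, `transcript`), the regular expressions, and the 50-word threshold for a thin entry.

```python
import re

# Hypothetical patterns; the project's real extraction rules are not shown here.
INSTALL_RE = re.compile(r"\b(?:pip|npm|brew|cargo) install [\w.\-\[\],@/]+")
REPO_RE = re.compile(r"https?://github\.com/[\w.\-]+/[\w.\-]+")
THIN_ENTRY_WORDS = 50  # assumed cut-off below which an entry counts as "too thin"


def build_hit(entry: dict) -> dict:
    """Build a search hit using only text that literally occurs in the source."""
    source = entry.get("transcript", "")
    commands = {m.rstrip(".") for m in INSTALL_RE.findall(source)}
    repos = {m.rstrip(".") for m in REPO_RE.findall(source)}
    return {
        "snippet": source[:200],
        "sources": [entry["url"]],
        "installCommands": sorted(commands),   # empty when the source has none
        "recommendedTools": sorted(repos),
        "needsScrape": len(source.split()) < THIN_ENTRY_WORDS,
    }
```

Since both lists are produced by matching against the transcript, a hit for a source that never mentions a command simply carries empty lists rather than a guessed one.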
Localhost web UI (browse + search)
pip install "knowledge-forge[all,web]"
knowledge-forge serve # http://localhost:8077
knowledge-forge serve --no-bridge # web only
The browser UI provides a category catalogue, a detail view, and Mode B search. To run it under Docker, use docker compose up -d (see integrations/docker/README.md). Enable start-on-login in Docker Desktop so the stack resumes automatically after a reboot and drains any Telegram backlog.
Enrich with a local model (no IDE/agent required)
The agent-written summary step doesn't have to run inside an AI IDE. Point it at a local model served by Ollama and a background worker drains the queue on its own — the same deterministic validator (word count, per-paragraph grounding, no bait, no invented URLs) gates every summary, so only grounded text is ever saved.
ollama pull gemma3:4b
knowledge-forge enrich-worker # loop: find queued entries → local model → validate → apply
knowledge-forge enrich-worker --once # single drain pass
Combine with the web UI + Telegram bridge (and Docker) for a fully local loop:
send a link → scrape/classify → local model writes a grounded summary → live in
the UI, with no cloud calls and nothing tied to a specific editor. Tune with
KB_ENRICH_MODEL, KB_ENRICH_INTERVAL, KB_ENRICH_BATCH, KB_ENRICH_ATTEMPTS,
and OLLAMA_HOST.
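The worker loop described above (find queued entries, call the local model, validate, apply) might be structured roughly as follows. This is a sketch, not the shipped worker: the environment variable names come from this README, but every default value is an assumption except `http://localhost:11434`, which is Ollama's standard local address, and the `generate`, `validate`, and `apply` callables are hypothetical stand-ins for the real model call and pipeline steps.

```python
import os
from typing import Callable, Iterable


def load_settings(env=os.environ) -> dict:
    """Read tuning knobs from the environment. Defaults here are assumed."""
    return {
        "model": env.get("KB_ENRICH_MODEL", "gemma3:4b"),
        "interval": float(env.get("KB_ENRICH_INTERVAL", "60")),  # seconds between passes
        "batch": int(env.get("KB_ENRICH_BATCH", "5")),           # entries per pass
        "attempts": int(env.get("KB_ENRICH_ATTEMPTS", "3")),     # retries per entry
        "host": env.get("OLLAMA_HOST", "http://localhost:11434"),
    }


def drain_once(queue: Iterable,
               generate: Callable,
               validate: Callable,
               apply: Callable,
               settings: dict) -> int:
    """One drain pass: only summaries that pass validation are applied."""
    applied = 0
    for entry in list(queue)[: settings["batch"]]:
        for _ in range(settings["attempts"]):
            summary = generate(entry, settings["model"])
            if validate(entry, summary):
                apply(entry, summary)
                applied += 1
                break  # stop retrying once a grounded summary is saved
    return applied
```

Running `drain_once` a single time corresponds to `--once`; the default mode would repeat it, sleeping for the configured interval between passes. An entry that fails every attempt is left unapplied rather than saved with an ungrounded summary.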
Install
pip install "knowledge-forge[all]" # CLI + MCP + extraction + transcription
# or pick extras: .[extract] .[transcribe] .[mcp]
Core has no third-party dependencies; the extras add yt-dlp,
youtube-transcript-api, faster-whisper, and mcp.
Quick start
knowledge-forge init # scaffold config + folders in the current dir
# ... add URLs to links.txt (see samples/links.sample.txt) ...
knowledge-forge update # extract + classify + build the canvas
knowledge-forge run # rule-based enrich + build the agent queue
knowledge-forge queue # see what needs an agent-written summary
# agent writes payload.json, then:
knowledge-forge validate --file payload.json
knowledge-forge apply --file payload.json
Generate a one-command launcher for your OS:
knowledge-forge bootstrap --write # writes kf.ps1+kf.bat (Windows) or kf.sh (macOS/Linux)
Configuration
Everything is path-driven via knowledge-forge.json (see config.example.json).
All relative paths resolve against the config file's directory — no machine- or
OS-specific paths are baked in. The config is auto-discovered by walking up from
the CWD, or set KNOWLEDGE_FORGE_PROJECT / KNOWLEDGE_FORGE_CONFIG.
| Key | Default | Meaning |
|---|---|---|
| linksFile / linksDir | links.txt / knowledge-base/links | inputs |
| dataFile | knowledge-base/videos.json | the knowledge store |
| transcriptsDir | knowledge-base/transcripts | cached transcripts |
| queueFile | knowledge-base/.agent-enrich-queue.json | agent work queue |
| canvasOutput | knowledge-base/canvas/knowledge-base.canvas.tsx | generated UI |
| enrichMinSummaryWords / MaxSummaryWords | 300 / 1000 | summary gate |
| agentEnrichBatchSize | 5 | entries per agent batch |
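The discovery and path-resolution behaviour described in this section can be sketched as below. It is an approximation under stated assumptions: the function names are hypothetical, the precedence between KNOWLEDGE_FORGE_CONFIG and KNOWLEDGE_FORGE_PROJECT is a guess, and treating KNOWLEDGE_FORGE_PROJECT as a directory containing knowledge-forge.json is also assumed. The path defaults are copied from the table.

```python
import json
import os
from pathlib import Path
from typing import Optional

CONFIG_NAME = "knowledge-forge.json"

# Path-valued keys and their defaults, as listed in the table.
PATH_DEFAULTS = {
    "linksFile": "links.txt",
    "linksDir": "knowledge-base/links",
    "dataFile": "knowledge-base/videos.json",
    "transcriptsDir": "knowledge-base/transcripts",
    "queueFile": "knowledge-base/.agent-enrich-queue.json",
    "canvasOutput": "knowledge-base/canvas/knowledge-base.canvas.tsx",
}


def find_config(start: Path, env=os.environ) -> Optional[Path]:
    """Locate the config: environment overrides first, then walk up from start."""
    explicit = env.get("KNOWLEDGE_FORGE_CONFIG")
    if explicit:
        return Path(explicit)
    project = env.get("KNOWLEDGE_FORGE_PROJECT")
    if project:
        return Path(project) / CONFIG_NAME
    for directory in [start, *start.parents]:
        candidate = directory / CONFIG_NAME
        if candidate.is_file():
            return candidate
    return None


def load_paths(config_path: Path) -> dict:
    """Resolve every path key against the directory that holds the config file."""
    raw = json.loads(config_path.read_text(encoding="utf-8"))
    base = config_path.resolve().parent
    # Joining with an absolute value keeps that value, so absolute overrides still work.
    return {key: (base / raw.get(key, default)).resolve()
            for key, default in PATH_DEFAULTS.items()}
```

Resolving against the config file's directory rather than the current working directory is what lets the same project folder be moved between machines or mounted in a container without edits.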
Use it from your AI tool
| Tool | MCP | Hooks | Skills |
|---|---|---|---|
| Cursor | integrations/cursor/mcp.json | integrations/cursor/hooks* | skills/* → .cursor/skills/ |
| Claude Code | integrations/claude-code/.mcp.json | - | skills/* → .claude/skills/ |
| Telegram / Hermes | kf_ingest tool | integrations/telegram/bridge.py | - |
| Docker (localhost UI) | - | docker compose up | integrations/docker/README.md |
| any MCP client | run knowledge-forge-mcp | - | - |
MCP tools: kf_ingest, kf_search, kf_update, kf_refresh, kf_run, kf_queue, kf_list, kf_validate, kf_apply.
The CLI also ships knowledge-forge serve / knowledge-forge-serve for the web UI at localhost:8077, and knowledge-forge enrich-worker / knowledge-forge-enrich for local-model enrichment.
See integrations/<tool>/README.md for copy-paste setup.
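For clients that speak MCP directly, a tool invocation is a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such a request for the tools listed above. The envelope follows the MCP specification, but the argument names `query` and `limit` in the usage comment are assumptions that mirror the CLI flags; check the server's advertised tool schema for the real parameters. A real client must also complete the MCP initialize handshake before sending this.

```python
import json
from itertools import count

# Tool names as listed in this README.
KNOWN_TOOLS = {
    "kf_ingest", "kf_search", "kf_update", "kf_refresh", "kf_run",
    "kf_queue", "kf_list", "kf_validate", "kf_apply",
}

_request_ids = count(1)


def tool_call(name: str, arguments: dict) -> str:
    """Serialise an MCP tools/call request as one line of JSON."""
    if name not in KNOWN_TOOLS:
        raise ValueError(f"unknown tool: {name}")
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


# Hypothetical usage; the argument names are assumed, not taken from the server:
# tool_call("kf_search", {"query": "cut Claude Code token cost", "limit": 5})
```

In practice an MCP client library handles this framing for you; the sketch only shows what crosses the wire.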
Project layout
knowledge_forge/ core package (config, pipeline, classify, verify, enrich, canvas, cli)
mcp_server/ FastMCP stdio server
skills/ tool-agnostic SKILL.md files
integrations/ per-tool hooks + MCP config + setup docs
samples/ anonymized example links + data
config.example.json starter config
License
MIT — see LICENSE.