knowledge-forge-mcp

Enables AI coding tools to turn a list of links into a verified, browsable knowledge base with grounded summaries via MCP tools for update, validation, and enrichment.

knowledge-forge

Turn a plain list of links (videos, posts, repos, articles) into a verified, browsable knowledge base — and let your AI coding tool enrich it with grounded, hallucination-checked summaries.

It ships three ways to drive the same pipeline so it works with any AI tool:

  • a CLI (knowledge-forge ...)
  • an MCP server (knowledge-forge-mcp) for Cursor, Claude Code, Claude Desktop, etc.
  • portable skills + editor hooks

What it does

links.txt ──▶ extract (yt-dlp / whisper / youtube-transcript)
          ──▶ classify into categories + pull verified learnings
          ──▶ videos.json ──▶ generate a category/detail canvas (.canvas.tsx)
          ──▶ [optional] rule-based enrichment + an agent queue
          ──▶ agent writes 300-1000 word grounded summaries
          ──▶ validate (anti-hallucination gate) ──▶ apply ──▶ refresh

Key idea: summaries are grounded. A validator enforces word count, per-paragraph token overlap with the source material, source-URL provenance, and a no-engagement-bait rule — so the agent can't invent facts, repos, or URLs.
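The validator's source isn't shown in this README. The following is a minimal sketch of the four checks described above. It assumes three things that are not knowledge-forge's actual values or API: a simple lowercase word-token overlap, a hypothetical 0.5 per-paragraph threshold, and an illustrative bait-phrase list.

```python
import re

# Illustrative thresholds only; the word bounds mirror the README's 300-1000 gate
MIN_WORDS, MAX_WORDS = 300, 1000
MIN_PARAGRAPH_OVERLAP = 0.5  # hypothetical share of a paragraph's tokens that must appear in the source
BAIT_PHRASES = ("like and subscribe", "smash that like", "link in bio")  # hypothetical list
URL_RE = re.compile(r"https?://[^\s)\]]+")
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokens(text: str) -> set:
    """Lowercase alphanumeric tokens, as a set."""
    return set(TOKEN_RE.findall(text.lower()))


def validate_summary(summary: str, source_text: str, source_urls: set) -> list:
    """Return the problems found; an empty list means the summary passes the gate."""
    problems = []

    # 1. Word-count gate
    word_count = len(summary.split())
    if not MIN_WORDS <= word_count <= MAX_WORDS:
        problems.append(f"word count {word_count} outside {MIN_WORDS}-{MAX_WORDS}")

    # 2. Per-paragraph grounding: each paragraph must share enough tokens with the source
    source_tokens = tokens(source_text)
    paragraphs = [p for p in summary.split("\n\n") if p.strip()]
    for index, paragraph in enumerate(paragraphs, start=1):
        para_tokens = tokens(paragraph)
        if not para_tokens:
            continue
        overlap = len(para_tokens & source_tokens) / len(para_tokens)
        if overlap < MIN_PARAGRAPH_OVERLAP:
            problems.append(f"paragraph {index} overlap {overlap:.2f} below {MIN_PARAGRAPH_OVERLAP}")

    # 3. Provenance: every URL in the summary must come from the source material
    for url in URL_RE.findall(summary):
        if url.rstrip(".,;:") not in source_urls:
            problems.append(f"URL not in source: {url}")

    # 4. No engagement bait
    lowered = summary.lower()
    for phrase in BAIT_PHRASES:
        if phrase in lowered:
            problems.append(f"engagement bait: {phrase!r}")

    return problems
```

Returning a list of reasons rather than a boolean lets the caller feed them back to the agent for a retry.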

Capture: send a link, get a knowledge entry

The hardest part of any knowledge base is capturing the link the instant you see its value. knowledge-forge ingest <url> is the one-shot front door: it appends the URL, scrapes + classifies it, and queues it for a grounded summary.

knowledge-forge ingest "https://www.youtube.com/watch?v=..."   # one link, fully processed

Wire it to Telegram so you can capture from your phone while scrolling — share a reel to your bot and it lands in the base automatically. Use the standalone bot in integrations/telegram/, or, if you already run Hermes or any Telegram-connected agent, just give that agent the knowledge-forge ingest <url> tool and let it handle the chat. See integrations/telegram/README.md.

share a link to Telegram ──▶ bot / Hermes extracts the URL
                         ──▶ knowledge-forge ingest <url>
                         ──▶ scrape + classify + queue ──▶ searchable entry
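The real integrations/telegram/bridge.py may work differently. Here is a hypothetical sketch of the bridge's core step: pull URLs out of a shared message and hand each one to knowledge-forge ingest. The helper names and the trailing-punctuation rule are assumptions.

```python
import re
import subprocess

URL_RE = re.compile(r"https?://\S+")


def extract_urls(message_text: str) -> list:
    """Pull http(s) URLs out of a shared message, dropping trailing punctuation."""
    return [u.rstrip(".,;:!?)\"'") for u in URL_RE.findall(message_text)]


def ingest_command(url: str) -> list:
    # Equivalent to running `knowledge-forge ingest <url>` in a shell
    return ["knowledge-forge", "ingest", url]


def handle_message(message_text: str, dry_run: bool = False) -> list:
    """Build (and unless dry_run, execute) one ingest command per URL in the message."""
    commands = [ingest_command(u) for u in extract_urls(message_text)]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```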

Discover & install (not just enrich)

Once the base is built, search it for a use case and get the recommended best practice plus the tools/skills to install — all grounded in the source videos:

knowledge-forge search "cut Claude Code token cost" --limit 5

Each hit returns a grounded snippet, the source links, any installCommands and recommendedTools extracted verbatim from that video (e.g. a pip install line or a GitHub repo), and a needsScrape flag when the entry is too thin and the source should be transcribed + enriched for a fuller answer. This turns the base into a recommender: "how do I improve X?" → best practice + the exact skill/tool to install, cited to the video. Commands are only ever surfaced when present in the source — never fabricated.
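This README doesn't describe the search implementation. The sketch below only illustrates how a hit carrying the installCommands, recommendedTools, and needsScrape fields could be assembled. It assumes a naive term-count ranking, a hypothetical Entry type, and a made-up 120-word threshold for a "thin" entry.

```python
from dataclasses import dataclass, field

THIN_ENTRY_WORDS = 120  # hypothetical cut-off below which the source should be re-scraped


@dataclass
class Entry:  # hypothetical shape of a stored entry
    title: str
    text: str
    url: str
    install_commands: list = field(default_factory=list)   # only what the source contained
    recommended_tools: list = field(default_factory=list)


def search(entries: list, query: str, limit: int = 5) -> list:
    """Rank entries by how many query terms they contain and build hit dicts."""
    terms = set(query.lower().split())
    scored = []
    for entry in entries:
        score = sum(1 for word in entry.text.lower().split() if word in terms)
        if score:
            scored.append((score, entry))
    scored.sort(key=lambda pair: pair[0], reverse=True)

    hits = []
    for _, entry in scored[:limit]:
        words = entry.text.split()
        hits.append({
            "title": entry.title,
            "snippet": " ".join(words[:40]),
            "links": [entry.url],
            "installCommands": entry.install_commands,   # never generated, only copied
            "recommendedTools": entry.recommended_tools,
            "needsScrape": len(words) < THIN_ENTRY_WORDS,
        })
    return hits
```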

Localhost web UI (browse + search)

pip install "knowledge-forge[all,web]"
knowledge-forge serve                    # http://localhost:8077
knowledge-forge serve --no-bridge        # web only

The web UI offers the category catalogue, the detail view, and Mode B search in the browser. With Docker, run docker compose up -d (see integrations/docker/README.md), and enable Docker Desktop's start-on-login setting so the stack resumes automatically and drains the Telegram backlog.

Enrich with a local model (no IDE/agent required)

The agent-written summary step doesn't have to run inside an AI IDE. Point it at a local model served by Ollama and a background worker drains the queue on its own — the same deterministic validator (word count, per-paragraph grounding, no bait, no invented URLs) gates every summary, so only grounded text is ever saved.

ollama pull gemma3:4b
knowledge-forge enrich-worker            # loop: find queued entries → local model → validate → apply
knowledge-forge enrich-worker --once     # single drain pass

Combine with the web UI + Telegram bridge (and Docker) for a fully local loop: send a link → scrape/classify → local model writes a grounded summary → live in the UI, with no cloud calls and nothing tied to a specific editor. Tune with KB_ENRICH_MODEL, KB_ENRICH_INTERVAL, KB_ENRICH_BATCH, KB_ENRICH_ATTEMPTS, and OLLAMA_HOST.
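Here is a sketch of the worker loop. It assumes hypothetical defaults for the KB_ENRICH_* variables (interval in seconds), a hypothetical queue-entry shape with a prompt field, and caller-supplied validate/apply callbacks standing in for knowledge-forge's own. The model call uses Ollama's /api/generate endpoint with streaming disabled.

```python
import json
import os
import time
import urllib.request
from typing import Callable


def settings() -> dict:
    """Read worker settings from the environment; all defaults here are assumptions."""
    host = os.environ.get("OLLAMA_HOST", "http://localhost:11434")
    if not host.startswith(("http://", "https://")):
        host = "http://" + host  # OLLAMA_HOST is often given as host:port
    return {
        "model": os.environ.get("KB_ENRICH_MODEL", "gemma3:4b"),
        "interval": float(os.environ.get("KB_ENRICH_INTERVAL", "60")),
        "batch": int(os.environ.get("KB_ENRICH_BATCH", "5")),
        "attempts": int(os.environ.get("KB_ENRICH_ATTEMPTS", "3")),
        "host": host.rstrip("/"),
    }


def ollama_generate(prompt: str, model: str, host: str) -> str:
    """Non-streaming call to Ollama's /api/generate; returns the generated text."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(f"{host}/api/generate", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.loads(resp.read())["response"]


def drain_once(queue: list, generate: Callable, validate: Callable,
               apply: Callable, batch: int, attempts: int) -> int:
    """Process up to `batch` entries; only summaries with no validation problems are applied."""
    applied = 0
    for entry in queue[:batch]:
        for _ in range(attempts):
            summary = generate(entry["prompt"])
            if not validate(summary, entry):  # empty problem list means it passed
                apply(entry, summary)
                applied += 1
                break
    return applied


def run_forever(load_queue: Callable, validate: Callable, apply: Callable, once: bool = False) -> None:
    cfg = settings()
    gen = lambda prompt: ollama_generate(prompt, cfg["model"], cfg["host"])
    while True:
        drain_once(load_queue(), gen, validate, apply, cfg["batch"], cfg["attempts"])
        if once:  # like --once: a single drain pass
            return
        time.sleep(cfg["interval"])
```

Keeping generate, validate, and apply injectable means the retry logic can be exercised without a running model.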

Install

pip install "knowledge-forge[all]"     # CLI + MCP + extraction + transcription
# or pick extras: .[extract] .[transcribe] .[mcp]

Core has no third-party dependencies; the extras add yt-dlp, youtube-transcript-api, faster-whisper, and mcp.

Quick start

knowledge-forge init                 # scaffold config + folders in the current dir
#  ... add URLs to links.txt (see samples/links.sample.txt) ...
knowledge-forge update               # extract + classify + build the canvas
knowledge-forge run                  # rule-based enrich + build the agent queue
knowledge-forge queue                # see what needs an agent-written summary
# agent writes payload.json, then:
knowledge-forge validate --file payload.json
knowledge-forge apply    --file payload.json
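If you want to script the quick start, here is a minimal Python sketch that runs the documented commands in order. It assumes validate exits non-zero when the payload fails the gate, so that check=True stops before apply. The step grouping and helper names are illustrative.

```python
import subprocess
from pathlib import Path

KF = "knowledge-forge"

# Steps before the agent writes its payload (init is a one-time scaffold, so it is left out)
PREPARE = [[KF, "update"], [KF, "run"], [KF, "queue"]]


def finish_steps(payload: Path) -> list:
    """Steps after the agent has written the payload: validate, then apply."""
    return [
        [KF, "validate", "--file", str(payload)],
        [KF, "apply", "--file", str(payload)],
    ]


def run_steps(steps: list, dry_run: bool = False) -> list:
    """Echo and run each command in order; return the commands that were issued."""
    issued = []
    for argv in steps:
        print("$", " ".join(argv))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit, so a failed
            # validate (assumed to exit non-zero) stops the run before apply
            subprocess.run(argv, check=True)
        issued.append(argv)
    return issued
```

Call run_steps(PREPARE), let the agent write payload.json, then call run_steps(finish_steps(Path("payload.json"))).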

Generate a one-command launcher for your OS:

knowledge-forge bootstrap --write    # writes kf.ps1+kf.bat (Windows) or kf.sh (macOS/Linux)

Configuration

Everything is path-driven via knowledge-forge.json (see config.example.json). All relative paths resolve against the config file's directory — no machine- or OS-specific paths are baked in. The config is auto-discovered by walking up from the CWD, or set KNOWLEDGE_FORGE_PROJECT / KNOWLEDGE_FORGE_CONFIG.
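Here is a sketch of the discovery order described above. It assumes that KNOWLEDGE_FORGE_CONFIG names the config file directly, that KNOWLEDGE_FORGE_PROJECT names the project directory, and that the explicit file wins when both are set. The precedence and the find_config helper are assumptions, not knowledge-forge's code.

```python
import os
from pathlib import Path
from typing import Optional

CONFIG_NAME = "knowledge-forge.json"


def find_config(start: Optional[Path] = None) -> Optional[Path]:
    """Locate knowledge-forge.json via env overrides, else by walking up from start/CWD."""
    explicit = os.environ.get("KNOWLEDGE_FORGE_CONFIG")
    if explicit:
        return Path(explicit).resolve()
    project = os.environ.get("KNOWLEDGE_FORGE_PROJECT")
    if project:
        candidate = Path(project) / CONFIG_NAME
        return candidate.resolve() if candidate.is_file() else None
    here = (start or Path.cwd()).resolve()
    # Check the starting directory first, then each parent up to the filesystem root
    for directory in (here, *here.parents):
        candidate = directory / CONFIG_NAME
        if candidate.is_file():
            return candidate
    return None
```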

Key                                      Default                                          Meaning
linksFile / linksDir                     links.txt / knowledge-base/links                 inputs
dataFile                                 knowledge-base/videos.json                       the knowledge store
transcriptsDir                           knowledge-base/transcripts                       cached transcripts
queueFile                                knowledge-base/.agent-enrich-queue.json          agent work queue
canvasOutput                             knowledge-base/canvas/knowledge-base.canvas.tsx  generated UI
enrichMinSummaryWords / MaxSummaryWords  300 / 1000                                       summary gate
agentEnrichBatchSize                     5                                                entries per agent batch
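Here is a sketch of how these defaults and the relative-path rule could combine, assuming user keys simply override the defaults. The full key name enrichMaxSummaryWords is inferred from the table's shorthand, and load_config and batches are hypothetical helpers.

```python
import json
from pathlib import Path

# Defaults from the configuration table; enrichMaxSummaryWords is an assumed full key name
DEFAULTS = {
    "linksFile": "links.txt",
    "linksDir": "knowledge-base/links",
    "dataFile": "knowledge-base/videos.json",
    "transcriptsDir": "knowledge-base/transcripts",
    "queueFile": "knowledge-base/.agent-enrich-queue.json",
    "canvasOutput": "knowledge-base/canvas/knowledge-base.canvas.tsx",
    "enrichMinSummaryWords": 300,
    "enrichMaxSummaryWords": 1000,
    "agentEnrichBatchSize": 5,
}
PATH_KEYS = ("linksFile", "linksDir", "dataFile", "transcriptsDir", "queueFile", "canvasOutput")


def load_config(config_path: Path) -> dict:
    """Merge user settings over the defaults and anchor relative paths at the config's directory."""
    user = json.loads(config_path.read_text(encoding="utf-8"))
    cfg = {**DEFAULTS, **user}
    base = config_path.resolve().parent
    for key in PATH_KEYS:
        path = Path(cfg[key])
        # Absolute paths are kept; relative ones resolve against the config file, not the CWD
        cfg[key] = path if path.is_absolute() else base / path
    return cfg


def batches(items: list, size: int) -> list:
    """Split the agent queue into chunks of at most `size` entries."""
    return [items[i:i + size] for i in range(0, len(items), size)]
```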

Use it from your AI tool

Tool                   MCP                                 Hooks                            Skills
Cursor                 integrations/cursor/mcp.json        integrations/cursor/hooks*       skills/* → .cursor/skills/
Claude Code            integrations/claude-code/.mcp.json  -                                skills/* → .claude/skills/
Telegram / Hermes      kf_ingest tool                      integrations/telegram/bridge.py  -
Docker (localhost UI)  docker compose up                   integrations/docker/README.md    -
any MCP client         run knowledge-forge-mcp             -                                -

MCP tools: kf_ingest, kf_search, kf_update, kf_refresh, kf_run, kf_queue, kf_list, kf_validate, kf_apply.
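Any MCP client can call these tools. Here is a sketch using the official mcp Python SDK's stdio client. It assumes kf_search accepts query and limit arguments that mirror the CLI's search command; check the tool's inputSchema from list_tools before relying on that.

```python
import asyncio


async def search_via_mcp(query: str, limit: int = 5) -> list:
    """Start knowledge-forge-mcp over stdio, call kf_search, and return the text parts."""
    # Imported inside the coroutine so this file loads even without the mcp extra
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    server = StdioServerParameters(command="knowledge-forge-mcp", args=[])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            listed = await session.list_tools()
            names = {tool.name for tool in listed.tools}
            if "kf_search" not in names:
                raise RuntimeError(f"kf_search not exposed; server offers {sorted(names)}")
            # Argument names are an assumption mirroring `knowledge-forge search ... --limit N`
            result = await session.call_tool("kf_search", {"query": query, "limit": limit})
            return [part.text for part in result.content if getattr(part, "type", None) == "text"]


def main() -> None:
    for text in asyncio.run(search_via_mcp("cut Claude Code token cost")):
        print(text)
```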

The CLI also provides knowledge-forge serve / knowledge-forge-serve for the web UI at localhost:8077, and knowledge-forge enrich-worker / knowledge-forge-enrich for local-model enrichment.

See integrations/<tool>/README.md for copy-paste setup.

Project layout

knowledge_forge/      core package (config, pipeline, classify, verify, enrich, canvas, cli)
mcp_server/           FastMCP stdio server
skills/               tool-agnostic SKILL.md files
integrations/         per-tool hooks + MCP config + setup docs
samples/              anonymized example links + data
config.example.json   starter config

License

MIT — see LICENSE.
