Mind Forge
Ingest, query, and generate study materials from documents (PDF, DOCX, Markdown, images, web pages) using vector search, knowledge graph, and study tools through OpenCode chat.
README
Mind Forge
Ingest, query, and generate study materials from documents — all through your OpenCode chat.
Mind Forge is an OpenCode plugin that turns documents (PDFs, DOCX files, Markdown, images, web pages) into a searchable knowledge base with vector search, a knowledge graph, and study tools. You describe what you want in chat, and the LLM calls the right MCP tool automatically.
Status: MVP implemented. The full pipe — ingest → embed → graph → study — is functional.
<p align="center"> <a href="https://github.com/goncalompontes/mind-forge/actions/workflows/ci.yml"> <img src="https://github.com/goncalompontes/mind-forge/actions/workflows/ci.yml/badge.svg" alt="CI"> </a> <a href="https://github.com/goncalompontes/mind-forge/blob/master/LICENSE"> <img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="License"> </a> <img src="https://img.shields.io/badge/version-0.1.0-blue.svg" alt="Version"> <img src="https://img.shields.io/badge/coverage-90%25-brightgreen.svg" alt="Coverage"> </p>
Quick Start
# 1. Clone and install
git clone https://github.com/goncalompontes/mind-forge.git
cd mind-forge
npm install
npm run build
# 2. Register in your OpenCode config
Add to your opencode.json:
{
"mcpServers": {
"mind-forge": {
"command": "node",
"args": ["/path/to/mind-forge/dist/index.js"]
}
}
}
Then use it in chat:
You: Ingest the PDF at ~/papers/transformer-attention.pdf
Mind Forge: Ingested "Attention Is All You Need" (PDF, 15 chunks, 42 entities, 18 relationships)
You: Query: "how does multi-head attention work?"
Mind Forge: [3 results, scores 84–92%] Found in "Attention Is All You Need" chunk 4: "Multi-head attention allows the model to jointly attend to information from different representation subspaces..."
Architecture
Mind Forge registers three MCP tools that the LLM calls automatically:
| Tool | Purpose | Pipeline |
|---|---|---|
ingest |
Import a document | extract → embed → store → graph |
query |
Search your knowledge base | vector search + graph enrichment + FTS5 |
generate |
Create study materials | cards, quiz, exam, or review |
Data Flow
Document → extract() → chunks → embed() → store (SQLite + sqlite-vec)
↘ extractEntitiesAndRelationships() → graph store
↓
User query → embed() → vector search (ANN) → merge with FTS5 + graph enrichment → results
↓
User request → createCards() / generateQuiz() / createExam() → study materials
Storage
- SQLite via
better-sqlite3with WAL mode - Vector index via
sqlite-vec(768-dimension FLOAT embeddings) - Full-text search via FTS5 virtual table
- Knowledge graph in SQLite (entities + relationships tables)
- Single file at
~/.mind-forge/store.db(configurable viaMIND_FORGE_DB_PATH)
Source Format Support
| Format | Extractor | Library | Notes |
|---|---|---|---|
src/extract/pdf.ts |
pdftotext CLI + pdf-parse fallback |
Metadata via pdfinfo |
|
| DOCX | src/extract/docx.ts |
mammoth |
Metadata from docProps/core.xml |
| Markdown | src/extract/markdown.ts |
gray-matter |
Frontmatter parsing (title, author, custom fields) |
| Image | src/extract/image.ts |
tesseract.js |
PNG, JPG, JPEG, WebP; configurable OCR language |
| URL | src/extract/url.ts |
@mozilla/readability |
SSRF protection, size-limited streaming |
Configuration
Mind Forge auto-detects the best embedding provider. You can configure via environment variables:
| Env Variable | Purpose | Default |
|---|---|---|
MIND_FORGE_DB_PATH |
Database file path | ~/.mind-forge/store.db |
OLLAMA_HOST |
Ollama server URL | http://127.0.0.1:11434 |
Embedding provider selection (via EmbeddingConfig):
auto(default) — tries Ollama first, falls back to API provider if configuredollama— local Ollama (nomic-embed-textdefault, falls back toall-minilm,mxbai-embed-large)llm— OpenAI-compatible API (requiresapiKey)
Default chunk size: 1000 tokens (~4000 characters), paragraph-aware splitting.
Project Structure
src/
├── index.ts # Plugin entry point — registers MCP server
├── types.ts # All shared domain types (12 interfaces, 5 type aliases)
├── embed/ # Embedding providers
│ ├── provider.ts # Factory — auto/Ollama/LLM selection
│ ├── ollama.ts # Ollama adapter (ollama npm package)
│ └── llm-provider.ts # OpenAI-compatible API adapter
├── extract/ # Document extraction
│ ├── index.ts # Orchestrator + paragraph-aware chunking
│ ├── pdf.ts # PDF via pdftotext + pdf-parse
│ ├── docx.ts # DOCX via mammoth
│ ├── markdown.ts # Markdown via gray-matter
│ ├── image.ts # Image OCR via tesseract.js
│ └── url.ts # Web pages via @mozilla/readability
├── store/ # SQLite persistence
│ ├── database.ts # Singleton, schema, sqlite-vec init
│ ├── documents.ts # Document + chunk CRUD
│ └── vectors.ts # Vector insert + ANN search
├── graph/ # Knowledge graph
│ ├── extractor.ts # Pattern-based entity/relationship extraction
│ ├── index.ts # Graph storage (atomic transactions)
│ └── query.ts # BFS traversal, neighbors, pathfinding
├── study/ # Study tools
│ ├── cards.ts # SM-2 spaced repetition cards
│ ├── quiz.ts # Quiz generation + grading (MCQ, T/F, fill-blank)
│ └── exam.ts # Timed exam mode
└── mcp/ # MCP server
├── server.ts # Server registration + 3 tool handlers
├── ingest.tool.ts # IngestTool class (extract → embed → store → graph)
└── query.tool.ts # QueryTool class (hybrid search)
Dependencies
| Package | Purpose |
|---|---|
@modelcontextprotocol/sdk |
MCP server framework |
@opencode-ai/plugin |
OpenCode plugin registration |
better-sqlite3 |
SQLite database |
sqlite-vec |
Vector search extension |
ollama |
Local embedding via Ollama |
tesseract.js |
Image OCR |
@mozilla/readability |
Web page content extraction |
mammoth |
DOCX text extraction |
gray-matter |
Markdown frontmatter parsing |
pdf-parse |
PDF text extraction (fallback) |
jsdom |
DOM parsing for Readability |
Scripts
| Script | Command |
|---|---|
build |
tsc |
test |
vitest run |
typecheck |
tsc --noEmit |
Key Design Decisions
- Conversational interface: All interaction through OpenCode chat via MCP tools. No slash commands, no custom UI.
- Graceful degradation: Embedding or graph failures don't block ingestion. Document + chunks are always stored.
- Hybrid search: Vector similarity (0.7 weight) + FTS5 BM25 (0.3 weight) merged with dedup by chunk ID.
- Pattern-based extraction: Regex patterns for entities and relationships at MVP (LLM callback extension point available).
- SSRF protection: URL extraction resolves hostnames to IPs and rejects private/reserved ranges before connecting.
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.