Mind Forge

Ingest, query, and generate study materials from documents (PDF, DOCX, Markdown, images, web pages) using vector search, knowledge graph, and study tools through OpenCode chat.


Ingest, query, and generate study materials from documents — all through your OpenCode chat.

Mind Forge is an OpenCode plugin that turns documents (PDFs, DOCX files, Markdown, images, web pages) into a searchable knowledge base with vector search, a knowledge graph, and study tools. You describe what you want in chat, and the LLM calls the right MCP tool automatically.

Status: MVP implemented. The full pipeline (ingest → embed → graph → study) is functional.

<p align="center"> <a href="https://github.com/goncalompontes/mind-forge/actions/workflows/ci.yml"> <img src="https://github.com/goncalompontes/mind-forge/actions/workflows/ci.yml/badge.svg" alt="CI"> </a> <a href="https://github.com/goncalompontes/mind-forge/blob/master/LICENSE"> <img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="License"> </a> <img src="https://img.shields.io/badge/version-0.1.0-blue.svg" alt="Version"> <img src="https://img.shields.io/badge/coverage-90%25-brightgreen.svg" alt="Coverage"> </p>


Quick Start

# 1. Clone and install
git clone https://github.com/goncalompontes/mind-forge.git
cd mind-forge
npm install
npm run build

# 2. Register in your OpenCode config

Add to your opencode.json:

{
  "mcpServers": {
    "mind-forge": {
      "command": "node",
      "args": ["/path/to/mind-forge/dist/index.js"]
    }
  }
}

Then use it in chat:

You: Ingest the PDF at ~/papers/transformer-attention.pdf

Mind Forge: Ingested "Attention Is All You Need" (PDF, 15 chunks, 42 entities, 18 relationships)

You: Query: "how does multi-head attention work?"

Mind Forge: [3 results, scores 84–92%] Found in "Attention Is All You Need" chunk 4: "Multi-head attention allows the model to jointly attend to information from different representation subspaces..."


Architecture

Mind Forge registers three MCP tools that the LLM calls automatically:

| Tool | Purpose | Pipeline |
| --- | --- | --- |
| ingest | Import a document | extract → embed → store → graph |
| query | Search your knowledge base | vector search + graph enrichment + FTS5 |
| generate | Create study materials | cards, quiz, exam, or review |

Data Flow

Document → extract() → chunks → embed() → store (SQLite + sqlite-vec)
                                   ↘ extractEntitiesAndRelationships() → graph store
                                                   ↓
User query → embed() → vector search (ANN) → merge with FTS5 + graph enrichment → results
                                                   ↓
User request → createCards() / generateQuiz() / createExam() → study materials

Storage

  • SQLite via better-sqlite3 with WAL mode
  • Vector index via sqlite-vec (768-dimension FLOAT embeddings)
  • Full-text search via FTS5 virtual table
  • Knowledge graph in SQLite (entities + relationships tables)
  • Single file at ~/.mind-forge/store.db (configurable via MIND_FORGE_DB_PATH)
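The entities and relationships tables form a graph, and the project structure attributes BFS traversal to src/graph/query.ts. The following is a minimal in-memory sketch of that idea, not the project's code: the `RelationshipRow` shape and the `neighborsWithin` name are assumptions made for illustration, and edges are treated as undirected.

```typescript
// Hypothetical row shape; the real relationships table may differ.
interface RelationshipRow {
  sourceId: string;
  targetId: string;
  type: string;
}

// Breadth-first search from `start`, returning every entity reachable
// within `maxDepth` hops together with its hop distance.
function neighborsWithin(
  start: string,
  relationships: RelationshipRow[],
  maxDepth: number
): Map<string, number> {
  // Build an undirected adjacency list from the rows.
  const adjacency = new Map<string, string[]>();
  const link = (from: string, to: string): void => {
    const list = adjacency.get(from);
    if (list) list.push(to);
    else adjacency.set(from, [to]);
  };
  for (const rel of relationships) {
    link(rel.sourceId, rel.targetId);
    link(rel.targetId, rel.sourceId);
  }

  const depthOf = new Map<string, number>([[start, 0]]);
  const queue: string[] = [start];
  for (let head = 0; head < queue.length; head++) {
    const current = queue[head];
    if (current === undefined) break;
    const depth = depthOf.get(current) ?? 0;
    if (depth >= maxDepth) continue; // do not expand past the hop limit
    for (const next of adjacency.get(current) ?? []) {
      if (!depthOf.has(next)) {
        depthOf.set(next, depth + 1);
        queue.push(next);
      }
    }
  }
  depthOf.delete(start); // report neighbors only, not the start node
  return depthOf;
}
```

In a SQLite-backed store the adjacency list would more likely come from a query against the relationships table than from an in-memory array.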

Source Format Support

| Format | Extractor | Library | Notes |
| --- | --- | --- | --- |
| PDF | src/extract/pdf.ts | pdftotext CLI + pdf-parse fallback | Metadata via pdfinfo |
| DOCX | src/extract/docx.ts | mammoth | Metadata from docProps/core.xml |
| Markdown | src/extract/markdown.ts | gray-matter | Frontmatter parsing (title, author, custom fields) |
| Image | src/extract/image.ts | tesseract.js | PNG, JPG, JPEG, WebP; configurable OCR language |
| URL | src/extract/url.ts | @mozilla/readability | SSRF protection, size-limited streaming |
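The SSRF protection noted for the URL extractor depends on rejecting private and reserved addresses once a hostname has been resolved. The check below is a rough sketch of the IPv4 part only. The function name `isBlockedIPv4` and the exact range list are assumptions, and a real implementation would also need to cover IPv6 and perform the DNS resolution step.

```typescript
// Hypothetical helper: returns true when an IPv4 address is malformed or
// falls in a private, loopback, link-local, or otherwise reserved range.
function isBlockedIPv4(ip: string): boolean {
  const parts = ip.split(".");
  if (parts.length !== 4) return true; // not a dotted quad: reject
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  if (octets.some((o) => Number.isNaN(o) || o > 255)) return true;

  const a = octets[0] ?? 0;
  const b = octets[1] ?? 0;
  return (
    a === 0 || // 0.0.0.0/8, "this network"
    a === 10 || // 10.0.0.0/8, private
    a === 127 || // 127.0.0.0/8, loopback
    (a === 100 && b >= 64 && b <= 127) || // 100.64.0.0/10, carrier-grade NAT
    (a === 169 && b === 254) || // 169.254.0.0/16, link-local
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12, private
    (a === 192 && b === 168) || // 192.168.0.0/16, private
    a >= 224 // multicast and reserved space
  );
}
```

Failing closed on malformed input is a deliberate choice here: an address that cannot be parsed is treated as unsafe instead of being passed through.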

Configuration

By default, Mind Forge selects an embedding provider automatically. The database path and Ollama host are configurable through environment variables:

| Env Variable | Purpose | Default |
| --- | --- | --- |
| MIND_FORGE_DB_PATH | Database file path | ~/.mind-forge/store.db |
| OLLAMA_HOST | Ollama server URL | http://127.0.0.1:11434 |
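As a sketch of how these two variables and their defaults might be resolved, the helper below takes the environment and home directory as arguments so it stays free of side effects. `resolveConfig` and `ResolvedConfig` are hypothetical names, and the path joining assumes POSIX-style separators.

```typescript
// Hypothetical result shape for the two documented settings.
interface ResolvedConfig {
  dbPath: string;
  ollamaHost: string;
}

// An explicit environment variable wins; otherwise the documented
// default is used. Blank values are treated as unset.
function resolveConfig(
  env: Record<string, string | undefined>,
  homeDir: string
): ResolvedConfig {
  const dbOverride = env["MIND_FORGE_DB_PATH"]?.trim();
  const hostOverride = env["OLLAMA_HOST"]?.trim();
  const home = homeDir.replace(/\/+$/, ""); // drop trailing slashes
  return {
    dbPath: dbOverride ? dbOverride : `${home}/.mind-forge/store.db`,
    ollamaHost: hostOverride ? hostOverride : "http://127.0.0.1:11434",
  };
}
```

In a Node process this could be called as `resolveConfig(process.env, os.homedir())`.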

Embedding provider selection (via EmbeddingConfig):

  • auto (default) — tries Ollama first, falls back to API provider if configured
  • ollama — local Ollama (nomic-embed-text default, falls back to all-minilm, mxbai-embed-large)
  • llm — OpenAI-compatible API (requires apiKey)
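The selection rules above can be sketched as two small pure functions. This is an illustration under stated assumptions, not the actual EmbeddingConfig logic: `chooseProvider`, `pickOllamaModel`, and `ProviderProbe` are hypothetical names, and returning `null` when nothing is available is a guess that fits the graceful-degradation behavior described elsewhere in this README.

```typescript
type ProviderMode = "auto" | "ollama" | "llm";
type ProviderChoice = "ollama" | "llm";

// Hypothetical snapshot of what is available at startup.
interface ProviderProbe {
  ollamaReachable: boolean;
  apiKey?: string;
}

// auto: prefer Ollama, fall back to the API provider if a key is set.
// Explicit modes return null when their requirement is not met.
function chooseProvider(mode: ProviderMode, probe: ProviderProbe): ProviderChoice | null {
  const hasApiKey = Boolean(probe.apiKey);
  if (mode === "ollama") return probe.ollamaReachable ? "ollama" : null;
  if (mode === "llm") return hasApiKey ? "llm" : null;
  if (probe.ollamaReachable) return "ollama";
  return hasApiKey ? "llm" : null;
}

// Model preference order taken from the list above.
const OLLAMA_MODELS = ["nomic-embed-text", "all-minilm", "mxbai-embed-large"] as const;

// Pick the first preferred model that is installed. Installed names may
// carry a tag suffix such as ":latest".
function pickOllamaModel(installed: string[]): string | null {
  const match = OLLAMA_MODELS.find((model) =>
    installed.some((name) => name === model || name.startsWith(`${model}:`))
  );
  return match ?? null;
}
```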

Default chunk size: 1000 tokens (~4000 characters), paragraph-aware splitting.
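One way to read "paragraph-aware splitting" is greedy packing: whole paragraphs are appended to a chunk until the character budget would be exceeded. The sketch below assumes that interpretation and uses the approximate 4000-character budget in place of a real token count. `chunkByParagraphs` is a hypothetical name, and the hard split for oversized paragraphs is an added assumption.

```typescript
// Greedy paragraph packing under a character budget.
function chunkByParagraphs(text: string, maxChars = 4000): string[] {
  // Paragraphs are separated by one or more blank lines.
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let current = "";
  for (const para of paragraphs) {
    if (para.length > maxChars) {
      // Oversized paragraph: flush what we have, then hard-split it.
      if (current) {
        chunks.push(current);
        current = "";
      }
      for (let i = 0; i < para.length; i += maxChars) {
        chunks.push(para.slice(i, i + maxChars));
      }
      continue;
    }
    const candidate = current ? `${current}\n\n${para}` : para;
    if (candidate.length > maxChars) {
      chunks.push(current); // current is non-empty here
      current = para;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Keeping paragraphs intact where possible means a retrieved chunk is less likely to begin or end mid-thought, at the cost of uneven chunk sizes.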


Project Structure

src/
├── index.ts              # Plugin entry point — registers MCP server
├── types.ts              # All shared domain types (12 interfaces, 5 type aliases)
├── embed/                # Embedding providers
│   ├── provider.ts       # Factory — auto/Ollama/LLM selection
│   ├── ollama.ts         # Ollama adapter (ollama npm package)
│   └── llm-provider.ts   # OpenAI-compatible API adapter
├── extract/              # Document extraction
│   ├── index.ts          # Orchestrator + paragraph-aware chunking
│   ├── pdf.ts            # PDF via pdftotext + pdf-parse
│   ├── docx.ts           # DOCX via mammoth
│   ├── markdown.ts       # Markdown via gray-matter
│   ├── image.ts          # Image OCR via tesseract.js
│   └── url.ts            # Web pages via @mozilla/readability
├── store/                # SQLite persistence
│   ├── database.ts       # Singleton, schema, sqlite-vec init
│   ├── documents.ts      # Document + chunk CRUD
│   └── vectors.ts        # Vector insert + ANN search
├── graph/                # Knowledge graph
│   ├── extractor.ts      # Pattern-based entity/relationship extraction
│   ├── index.ts          # Graph storage (atomic transactions)
│   └── query.ts          # BFS traversal, neighbors, pathfinding
├── study/                # Study tools
│   ├── cards.ts          # SM-2 spaced repetition cards
│   ├── quiz.ts           # Quiz generation + grading (MCQ, T/F, fill-blank)
│   └── exam.ts           # Timed exam mode
└── mcp/                  # MCP server
    ├── server.ts         # Server registration + 3 tool handlers
    ├── ingest.tool.ts    # IngestTool class (extract → embed → store → graph)
    └── query.tool.ts     # QueryTool class (hybrid search)
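The cards module is described as using SM-2 spaced repetition. For readers unfamiliar with it, here is a sketch of a common formulation of the SM-2 update step. The `CardState` shape and the `reviewCard` name are assumptions and may not match what src/study/cards.ts actually does.

```typescript
// Hypothetical card scheduling state.
interface CardState {
  repetitions: number; // consecutive successful reviews
  intervalDays: number; // days until the next review
  easeFactor: number; // SM-2 starts this at 2.5
}

// Apply one review with quality 0 (blackout) to 5 (perfect recall).
function reviewCard(card: CardState, quality: number): CardState {
  if (!Number.isInteger(quality) || quality < 0 || quality > 5) {
    throw new RangeError("quality must be an integer from 0 to 5");
  }
  let { repetitions, intervalDays } = card;
  if (quality >= 3) {
    // Successful recall: 1 day, then 6 days, then scale by the ease factor.
    if (repetitions === 0) intervalDays = 1;
    else if (repetitions === 1) intervalDays = 6;
    else intervalDays = Math.round(intervalDays * card.easeFactor);
    repetitions += 1;
  } else {
    // Failed recall: restart the sequence.
    repetitions = 0;
    intervalDays = 1;
  }
  // Ease factor update, floored at 1.3.
  const miss = 5 - quality;
  const easeFactor = Math.max(1.3, card.easeFactor + 0.1 - miss * (0.08 + miss * 0.02));
  return { repetitions, intervalDays, easeFactor };
}
```

A new card would start as `{ repetitions: 0, intervalDays: 0, easeFactor: 2.5 }` and be passed through `reviewCard` after each study session.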

Dependencies

| Package | Purpose |
| --- | --- |
| @modelcontextprotocol/sdk | MCP server framework |
| @opencode-ai/plugin | OpenCode plugin registration |
| better-sqlite3 | SQLite database |
| sqlite-vec | Vector search extension |
| ollama | Local embedding via Ollama |
| tesseract.js | Image OCR |
| @mozilla/readability | Web page content extraction |
| mammoth | DOCX text extraction |
| gray-matter | Markdown frontmatter parsing |
| pdf-parse | PDF text extraction (fallback) |
| jsdom | DOM parsing for Readability |

Scripts

| Script | Command |
| --- | --- |
| build | tsc |
| test | vitest run |
| typecheck | tsc --noEmit |

Key Design Decisions

  • Conversational interface: All interaction through OpenCode chat via MCP tools. No slash commands, no custom UI.
  • Graceful degradation: Embedding or graph failures don't block ingestion. Document + chunks are always stored.
  • Hybrid search: Vector similarity (0.7 weight) and FTS5 BM25 (0.3 weight) scores are merged, with duplicates removed by chunk ID.
  • Pattern-based extraction: The MVP uses regex patterns for entities and relationships; an LLM callback extension point is available.
  • SSRF protection: URL extraction resolves hostnames to IPs and rejects private/reserved ranges before connecting.
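The hybrid search weighting can be illustrated with a small merge function. This sketch assumes both result lists carry scores already normalized to the range 0 to 1 with higher meaning better, which is not how raw FTS5 BM25 values look, and that each list contains a given chunk at most once. `ScoredChunk` and `mergeHybrid` are hypothetical names.

```typescript
// Hypothetical search hit; score is assumed normalized to [0, 1].
interface ScoredChunk {
  chunkId: string;
  score: number;
}

// Weighted sum of vector and full-text scores, keyed by chunk ID so a
// chunk found by both searches appears once with a combined score.
function mergeHybrid(
  vectorHits: ScoredChunk[],
  ftsHits: ScoredChunk[],
  vectorWeight = 0.7,
  ftsWeight = 0.3
): ScoredChunk[] {
  const combined = new Map<string, number>();
  for (const hit of vectorHits) {
    combined.set(hit.chunkId, (combined.get(hit.chunkId) ?? 0) + vectorWeight * hit.score);
  }
  for (const hit of ftsHits) {
    combined.set(hit.chunkId, (combined.get(hit.chunkId) ?? 0) + ftsWeight * hit.score);
  }
  // Highest combined score first.
  return Array.from(combined, ([chunkId, score]) => ({ chunkId, score })).sort(
    (x, y) => y.score - x.score
  );
}
```

With this weighting, a chunk that matches both semantically and by keyword can outrank one with a stronger vector score alone.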
