LearnMCP Server
A standalone MCP server that enhances Forest with learning content extraction and summarization capabilities.
Overview
LearnMCP extracts and summarizes learning content from various sources (YouTube videos, PDFs, web articles) and makes those summaries available to Forest's HTA builder for more informed task generation.
Features
- Content Extraction: YouTube videos (with transcripts), PDF documents, web articles
- Background Processing: Async content processing with queue management
- Smart Summarization: Content chunking and summarization with relevance scoring
- Forest Integration: Optional integration with Forest's HTA tree builder
- Standalone Operation: Can be enabled/disabled independently of Forest
Architecture
User → LearnMCP Tools → LearnService → BackgroundProcessor ⇄ Extractors ⇄ Summarizer → DataPersistence
                                                                                             ↓
                                                                               <DATA_DIR>/learn-content/
                                                                                             ↓
                                                                              Forest HTA Builder (optional)
Installation
- Install Dependencies:
  cd learn-mcp-server
  npm install
- Configure MCP: Add to your mcp-config.json:
  {
    "mcpServers": {
      "learn-mcp": {
        "command": "node",
        "args": ["server.js"],
        "cwd": "learn-mcp-server",
        "env": {
          "FOREST_DATA_DIR": "<same as Forest>"
        }
      }
    }
  }
- Start Server: The server starts automatically when Claude Desktop loads the MCP config.
Available Tools
add_learning_sources
Add learning sources (URLs) to a project for content extraction.
Parameters:
- project_id (string): Project ID to add sources to
- urls (array): Array of URLs (YouTube, PDF, articles)
Example:
{
"project_id": "my_project",
"urls": [
"https://youtube.com/watch?v=example",
"https://example.com/document.pdf",
"https://blog.example.com/article"
]
}
process_learning_sources
Start background processing of pending learning sources.
Parameters:
- project_id (string): Project ID to process sources for
list_learning_sources
List learning sources for a project, optionally filtered by status.
Parameters:
- project_id (string): Project ID
- status (string, optional): Filter by status (pending, processing, completed, failed)
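Example (placeholder project ID; lists only sources that have finished processing):
{
  "project_id": "my_project",
  "status": "completed"
}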
get_learning_summary
Get learning content summary for a project or specific source.
Parameters:
- project_id (string): Project ID
- source_id (string, optional): Specific source ID (if not provided, returns aggregated summary)
- token_limit (number, optional): Maximum tokens for aggregated summary (default: 2000)
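Example (placeholder project ID; requests the aggregated summary capped at 1000 tokens):
{
  "project_id": "my_project",
  "token_limit": 1000
}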
delete_learning_sources
Delete learning sources and their summaries.
Parameters:
- project_id (string): Project ID
- source_ids (array): Array of source IDs to delete
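Example (placeholder IDs; removes two sources and their stored summaries):
{
  "project_id": "my_project",
  "source_ids": ["source_abc", "source_def"]
}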
get_processing_status
Get current processing status for learning sources.
Parameters:
- project_id (string): Project ID
Supported Content Types
YouTube Videos
- Extracts video metadata (title, author, duration, etc.)
- Downloads transcripts when available
- Falls back to description if no transcript
PDF Documents
- Extracts text content from remote PDF URLs
- Preserves document metadata
- Handles various PDF formats
Web Articles
- Uses Mozilla Readability for clean content extraction
- Extracts metadata (title, author, publish date, etc.)
- Estimates reading time
Data Storage
LearnMCP stores data in <FOREST_DATA_DIR>/learn-content/:
learn-content/
├── <project_id>/
│ ├── sources.json # Source registry
│ └── summaries/
│ ├── <source_id>.json # Individual summaries
│ └── ...
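For illustration only, an entry in sources.json might look roughly like the snippet below; the field names are assumptions (only the status values pending, processing, completed, and failed are documented above), and the comments are annotations rather than valid JSON:
{
  "id": "source_abc",                        // hypothetical source ID
  "url": "https://example.com/document.pdf",
  "type": "pdf",                             // assumed content-type field
  "status": "completed"                      // pending | processing | completed | failed
}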
Forest Integration
When both LearnMCP and Forest are active, Forest's HTA builder can optionally include learning content summaries in its task generation prompts. This happens automatically when:
- LearnMCP has processed learning sources for a project
- Forest builds an HTA tree for the same project
When both conditions hold, the learning content summaries are injected into the HTA generation prompt.
Workflow Examples
Basic Learning Content Workflow
- Add Sources:
  add_learning_sources(project_id="learn_python", urls=["https://youtube.com/watch?v=python_tutorial"])
- Process Content:
  process_learning_sources(project_id="learn_python")
- Check Status:
  get_processing_status(project_id="learn_python")
- Get Summary:
  get_learning_summary(project_id="learn_python")
Integrated with Forest
- Add and process learning sources in LearnMCP
- Build HTA tree in Forest - it will automatically include learning content context
- Generated tasks will be informed by the processed learning materials
Configuration
Environment Variables
- FOREST_DATA_DIR: Shared data directory with Forest (required)
- LOG_LEVEL: Logging level (debug, info, warn, error)
- NODE_ENV: Environment (development, production)
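These variables can be set in the env block of the mcp-config.json entry shown under Installation, for example (illustrative values):
"env": {
  "FOREST_DATA_DIR": "<same as Forest>",
  "LOG_LEVEL": "debug",
  "NODE_ENV": "development"
}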
Background Processor Settings
- Max Queue Size: 50 tasks
- Max Concurrent: 2 simultaneous extractions
- Processing Interval: 3 seconds
- Retry Attempts: 3 per source
- Timeout: 5 minutes per extraction
Error Handling
- Graceful Degradation: Failed extractions don't block other sources
- Retry Logic: Automatic retries with exponential backoff
- Comprehensive Logging: Detailed logs for debugging
- Status Tracking: Clear status indicators for each source
Development
Running Tests
npm test
Linting
npm run lint
npm run lint:fix
Debugging
Set LOG_LEVEL=debug for detailed logging.
Troubleshooting
Common Issues
- YouTube extraction fails: Check if video has transcripts enabled
- PDF extraction fails: Ensure PDF is publicly accessible
- Article extraction fails: Some sites block automated access
Logs
Check logs in <FOREST_DATA_DIR>/logs/:
- learn-mcp.log: General operations
- learn-mcp-errors.log: Error details
License
MIT License - Same as Forest MCP Server