HPR Knowledge Base MCP Server

Provides search and retrieval of Hacker Public Radio episodes, transcripts, hosts, and series via MCP tools and resources, with fuzzy matching and flexible transcript search.

Hacker Public Radio Knowledge Base MCP Server

An MCP (Model Context Protocol) server providing access to the Hacker Public Radio (HPR) knowledge base, including episodes, transcripts, hosts, series, and community comments.

About HPR

Hacker Public Radio is a community-driven podcast where hosts contribute content on topics of interest to hackers. All content is released under Creative Commons licenses, making it freely available for learning and sharing.

Features

This MCP server provides:

  • Episode Search: Search through thousands of HPR episodes by title, summary, tags, or host notes
    • Fuzzy Matching: Automatically handles typos and misspellings (e.g., "linx" finds "linux", "pythoon" finds "python")
  • Transcript Search: Full-text search across all episode transcripts with flexible matching modes
  • Episode Details: Get complete information about any episode including transcript and comments
  • Host Information: Look up hosts and see all their contributions
    • Fuzzy Matching: Handles name variations and typos (e.g., "klattu" finds "Klaatu")
  • Series Browsing: Explore mini-series of related episodes
  • Statistics: View overall HPR statistics and recent episodes

Installation

Prerequisites

  • Node.js 18 or higher
  • The HPR data files:
    • hpr_metadata/ directory containing JSON files
    • hpr_transcripts/ directory containing transcript files

Setup

  1. Install dependencies:
npm install
  2. Make the server executable:
chmod +x index.js

Usage

Running Locally (Stdio Mode)

You can test the stdio server directly (for local MCP clients like Claude Desktop):

npm start

Running as HTTP Server (Network Access)

For network access and public deployment, use the HTTP/SSE server:

npm run start:http

This starts an HTTP server on port 3000 (configurable via PORT environment variable) with:

  • SSE endpoint: http://localhost:3000/sse
  • Health check: http://localhost:3000/health
  • Built-in rate limiting, compression, and graceful degradation

Using with AI Tools

Claude Desktop (and other MCP-compatible clients):

See CONFIGURATION.md for detailed setup instructions for:

  • Claude Desktop (stdio - fully supported)
  • ⚠️ Other MCP Clients (varies by client)
  • ChatGPT (not supported - workarounds included)
  • GitHub Copilot (not supported - alternatives included)
  • Google Gemini (not supported - integration options)
  • 🔧 Custom Integration (Python/Node.js examples)

Quick Start (Claude Desktop):

Add this to your Claude Desktop configuration file:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%/Claude/claude_desktop_config.json
  • Linux: ~/.config/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "hpr-knowledge-base": {
      "command": "node",
      "args": ["/absolute/path/to/knowledge_base/index.js"]
    }
  }
}

Replace /absolute/path/to/knowledge_base/ with the actual path to this directory.

Note: Claude Desktop currently supports only local (stdio) connections. Remote HTTP/SSE support is coming in future versions.

Using with Other MCP Clients

Any MCP-compatible client can connect to this server via stdio. The server will load all HPR data on startup and make it available through tools and resources.

Available Tools

1. search_episodes

Search for episodes by keywords in title, summary, tags, or notes.

Parameters:

  • query (string): Search query
  • limit (number, optional): Maximum results (default: 20)
  • hostId (number, optional): Filter by specific host
  • seriesId (number, optional): Filter by specific series
  • tag (string, optional): Filter by tag
  • fromDate (string, optional): Filter from date (YYYY-MM-DD)
  • toDate (string, optional): Filter to date (YYYY-MM-DD)

Example:

Search for episodes about "linux kernel" from 2020 onwards
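
For clients that talk to the server programmatically rather than through a chat prompt, the request above could be sent as an MCP tools/call message. The following is a minimal sketch, assuming the argument names listed under Parameters. The helper name buildToolCall and the sample values are illustrative and not part of this server's code.

```javascript
// Builds a JSON-RPC 2.0 "tools/call" request in the shape the MCP
// specification defines. buildToolCall is a hypothetical helper.
function buildToolCall(id, name, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// "Search for episodes about 'linux kernel' from 2020 onwards"
const request = buildToolCall(1, "search_episodes", {
  query: "linux kernel",
  fromDate: "2020-01-01", // YYYY-MM-DD, as documented for fromDate
  limit: 10,
});

console.log(JSON.stringify(request, null, 2));
```

An MCP client library normally builds this envelope for you; the sketch only shows how the documented parameters map onto the arguments object.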

2. get_episode

Get detailed information about a specific episode.

Parameters:

  • episodeId (number, required): Episode ID
  • includeTranscript (boolean, optional): Include transcript (default: true)
  • includeComments (boolean, optional): Include comments (default: true)

Example:

Get details for episode 16 including transcript and comments

3. search_transcripts

Search through episode transcripts for phrases or multiple terms with flexible matching.

Parameters:

  • query (string, optional): Phrase to search for. Useful for exact-phrase lookups.
  • terms (string[], optional): Explicit list of terms to search for; combine with matchMode for logical AND/OR searches.
  • matchMode ('phrase' | 'any' | 'all', optional): How to combine query/terms. Defaults to 'phrase'. Use 'any' to match if any term is present, 'all' to require every term somewhere in the transcript.
  • limit (number, optional): Maximum episodes to return (default: 20).
  • contextLines (number, optional): Lines of context to include around each match (default: 3).
  • hostId (number, optional): Only return matches for this host ID.
  • hostName (string, optional): Only return matches for hosts whose name includes this value.
  • caseSensitive (boolean, optional): Treat terms as case-sensitive (default: false).
  • wholeWord (boolean, optional): Match whole words only (default: false).
  • maxMatchesPerEpisode (number, optional): Maximum number of excerpts per episode (default: 5).

Example queries:

Find transcripts mentioning "virtual machine"
Find transcripts where klaatu talks about bash or python
List episodes where host ID 123 mentions "encryption" and "privacy" (require all terms)
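
To make the interaction between query, terms, matchMode, caseSensitive, and wholeWord more concrete, here is one way such matching could be written. This is a sketch under assumptions, not the server's actual implementation: the function names are hypothetical, and excerpt extraction (contextLines, maxMatchesPerEpisode) is left out.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build one RegExp per term, honouring caseSensitive and wholeWord.
function buildMatchers(terms, { caseSensitive = false, wholeWord = false } = {}) {
  const flags = caseSensitive ? "" : "i";
  return terms.map((term) => {
    const body = escapeRegExp(term);
    return new RegExp(wholeWord ? `\\b${body}\\b` : body, flags);
  });
}

// Decide whether a transcript matches under the given matchMode.
function transcriptMatches(text, { query, terms = [], matchMode = "phrase", ...opts }) {
  if (matchMode === "phrase") {
    // Assumption: without a query, the terms are joined into one phrase.
    const phrase = query ?? terms.join(" ");
    if (!phrase) return false;
    return buildMatchers([phrase], opts)[0].test(text);
  }
  const wanted = query ? [query, ...terms] : terms;
  if (wanted.length === 0) return false;
  const matchers = buildMatchers(wanted, opts);
  return matchMode === "all"
    ? matchers.every((re) => re.test(text)) // every term somewhere in the text
    : matchers.some((re) => re.test(text)); // "any": at least one term
}

const transcript = "Today I compare Bash scripts with Python scripts.";
console.log(transcriptMatches(transcript, { terms: ["bash", "perl"], matchMode: "any" })); // true
console.log(transcriptMatches(transcript, { terms: ["bash", "perl"], matchMode: "all" })); // false
```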

4. get_host_info

Get information about a host and their episodes.

Parameters:

  • hostId (number, optional): Host ID
  • hostName (string, optional): Host name to search for
  • includeEpisodes (boolean, optional): Include episode list (default: true)

Example:

Get information about host "klaatu" including all their episodes

5. get_series_info

Get information about a series and all its episodes.

Parameters:

  • seriesId (number, required): Series ID

Example:

Get information about series 4 (Databases series)

Fuzzy Matching

The server includes intelligent fuzzy matching for episode and host searches to handle typos and misspellings.

How It Works

  1. Exact Match First: The server always tries exact substring matching first for speed
  2. Fuzzy Fallback: If no exact matches are found, it falls back to fuzzy matching using Levenshtein distance
  3. Match Indicators: Results include indicators showing whether they're exact or fuzzy matches
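
The three steps above can be sketched roughly as follows. This is an illustration only: searchHosts, the sample host IDs, and the choice to compare the query against the whole lowercased name are assumptions, and the real server may tokenize or score differently. The default maxDistance of 2 mirrors the host threshold given under Distance Thresholds.

```javascript
// Levenshtein distance, computed with two rolling rows.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Exact substring match first; fuzzy fallback only when nothing matched.
function searchHosts(hosts, query, maxDistance = 2) {
  const q = query.toLowerCase();
  const exact = hosts
    .filter((h) => h.name.toLowerCase().includes(q))
    .map((h) => ({ ...h, matchType: "exact" }));
  if (exact.length > 0) return exact;

  return hosts
    .map((h) => ({
      ...h,
      matchType: "fuzzy",
      matchDistance: levenshtein(q, h.name.toLowerCase()),
    }))
    .filter((h) => h.matchDistance <= maxDistance)
    .sort((x, y) => x.matchDistance - y.matchDistance);
}

// Sample data with made-up IDs.
const hosts = [{ id: 1, name: "Klaatu" }, { id: 2, name: "Ken Fallon" }];
console.log(searchHosts(hosts, "klattu")); // one fuzzy match, matchDistance 1
console.log(searchHosts(hosts, "ken"));    // one exact match
```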

Examples

Host Search:

  • Query: "klattu" → Finds: Klaatu (fuzzy match, distance: 1)
  • Query: "ken" → Finds: Ken Fallon (exact match)

Episode Search:

  • Query: "pythoon" → Finds episodes with python in the title (fuzzy match, distance: 1)
  • Query: "linx" → Finds episodes with linux (may match exactly in summary/tags, or fuzzy in title)

Distance Thresholds

  • Hosts: Maximum edit distance of 2 (tolerates one or two typos)
  • Episodes: Maximum edit distance of 3 (more lenient, since titles are longer)

What the AI Agent Sees

When fuzzy matching is used, results include:

  • matchType: 'exact' or matchType: 'fuzzy'
  • matchDistance: N (for fuzzy matches, indicating how many character edits were needed)

This allows AI agents to provide context to users, such as: "I found results for 'klaatu' (you typed 'klattu')"
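
As a small illustration of how a client might use these indicators, the sketch below turns a result into a user-facing note. The function name describeMatch and the name field on the result are assumptions; only matchType and matchDistance are documented here.

```javascript
// Turn a result's match indicators into a short note for the user.
function describeMatch(query, result) {
  if (result.matchType === "fuzzy") {
    return `I found results for '${result.name}' (you typed '${query}', ` +
      `${result.matchDistance} edit(s) away)`;
  }
  return `I found results for '${result.name}'`;
}

console.log(describeMatch("klattu", { name: "Klaatu", matchType: "fuzzy", matchDistance: 1 }));
console.log(describeMatch("ken", { name: "Ken Fallon", matchType: "exact" }));
```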

Technical Details

The fuzzy matching uses the Levenshtein distance algorithm, which counts the minimum number of single-character edits (insertions, deletions, substitutions) needed to change one string into another.

Note: Transcript search uses regex-based matching and does not use fuzzy matching, as the flexible regex patterns already handle many variations.

Available Resources

hpr://stats

Overall statistics about the HPR knowledge base

hpr://episodes/recent

List of 50 most recent episodes

hpr://hosts/all

List of all HPR hosts with episode counts

hpr://series/all

List of all HPR series with descriptions

Data Structure

The server expects the following directory structure:

knowledge_base/
├── index.js
├── data-loader.js
├── package.json
├── hpr_metadata/
│   ├── episodes.json
│   ├── hosts.json
│   ├── comments.json
│   └── series.json
└── hpr_transcripts/
    ├── hpr0001.txt
    ├── hpr0002.txt
    └── ...
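
Before starting the server it can help to confirm that the data is where the layout above says it should be. The following standalone script is a sketch of such a check, written with CommonJS require so it runs on its own; findMissingData is a hypothetical helper and not part of data-loader.js.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Data paths taken from the directory layout in this README.
const REQUIRED_PATHS = [
  "hpr_metadata/episodes.json",
  "hpr_metadata/hosts.json",
  "hpr_metadata/comments.json",
  "hpr_metadata/series.json",
  "hpr_transcripts",
];

// Return the required paths that do not exist below baseDir.
function findMissingData(baseDir) {
  return REQUIRED_PATHS.filter((rel) => !fs.existsSync(path.join(baseDir, rel)));
}

const missing = findMissingData(process.cwd());
console.log(
  missing.length === 0 ? "All HPR data files found" : `Missing: ${missing.join(", ")}`
);
```

Run it from the knowledge_base/ directory; it only checks for existence and does not validate the JSON contents.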

Deployment

The HTTP/SSE server (server-http.js) is designed for public deployment with graceful degradation features:

Features

  • Rate Limiting: 50 requests per minute per IP address
  • Request Timeouts: 30-second timeout per request
  • Concurrent Request Limiting: Maximum 10 concurrent requests
  • Circuit Breaker: Automatically stops accepting requests if failure rate is too high
  • Memory Monitoring: Rejects requests if memory usage exceeds 450MB
  • Compression: Gzip compression for all responses
  • CORS: Enabled for cross-origin requests
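
To illustrate the rate-limiting rule (50 requests per minute per IP address), here is a minimal sliding-window limiter. It is a sketch only: the README does not say which algorithm or library server-http.js uses, and createRateLimiter with its options is a hypothetical API. The clock is injectable so the behaviour can be checked without waiting a minute.

```javascript
// Sliding-window rate limiter: at most `limit` requests per `windowMs` per key.
function createRateLimiter({ limit = 50, windowMs = 60_000, now = Date.now } = {}) {
  const hits = new Map(); // key (e.g. client IP) -> timestamps of recent requests

  return function allow(key) {
    const t = now();
    // Keep only the timestamps that still fall inside the window.
    const recent = (hits.get(key) ?? []).filter((ts) => t - ts < windowMs);
    if (recent.length >= limit) {
      hits.set(key, recent);
      return false; // over the limit: reject
    }
    recent.push(t);
    hits.set(key, recent);
    return true;
  };
}

// Demonstration with a fake clock and a documentation-range IP address.
let clock = 0;
const allow = createRateLimiter({ now: () => clock });
let accepted = 0;
for (let i = 0; i < 60; i++) {
  if (allow("203.0.113.7")) accepted++;
}
console.log(accepted); // 50

clock += 60_000; // one minute later the earlier requests have aged out
console.log(allow("203.0.113.7")); // true
```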

Recommended Hosting Options

Render.com (Recommended)

# Free tier available, $7/mo for always-on
# Auto-scaling and health checks built-in

Railway.app

# $5 free credit/month, pay-per-usage
# Scales to zero when idle

Fly.io

# Free tier: 256MB RAM
# Global edge deployment

Environment Variables

  • PORT: Server port (default: 3000)

Health Check

The server provides a health check endpoint at /health for monitoring:

curl http://localhost:3000/health

Returns:

{
  "status": "ok",
  "memory": {
    "used": "45.23MB",
    "threshold": "450MB"
  },
  "activeRequests": 2,
  "circuitBreaker": "CLOSED"
}
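
A monitoring script could reduce this payload to a few numbers. The sketch below assumes the field names and the "MB" string format shown in the sample response; summarizeHealth and parseMegabytes are hypothetical helpers, and treating only "CLOSED" as healthy is an assumption about the circuit breaker's other states.

```javascript
// Parse a size string such as "45.23MB" into a number of megabytes.
function parseMegabytes(value) {
  const match = /^([\d.]+)\s*MB$/i.exec(value);
  return match ? Number(match[1]) : NaN;
}

// Summarise a /health payload for a monitoring script.
function summarizeHealth(health) {
  const used = parseMegabytes(health.memory.used);
  const threshold = parseMegabytes(health.memory.threshold);
  return {
    ok: health.status === "ok" && health.circuitBreaker === "CLOSED",
    memoryHeadroomMB: Number((threshold - used).toFixed(2)),
    activeRequests: health.activeRequests,
  };
}

// The sample response from this README.
const sample = {
  status: "ok",
  memory: { used: "45.23MB", threshold: "450MB" },
  activeRequests: 2,
  circuitBreaker: "CLOSED",
};
console.log(summarizeHealth(sample));
// { ok: true, memoryHeadroomMB: 404.77, activeRequests: 2 }
```

On Node.js 18 or later the payload itself can be obtained with the built-in fetch, for example by awaiting fetch on the /health URL and calling .json() on the response.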

Development

Project Structure

  • index.js - Stdio MCP server (for local use)
  • server-http.js - HTTP/SSE MCP server (for network deployment)
  • data-loader.js - Data loading and searching functionality
  • package.json - Node.js package configuration

Extending the Server

You can add new tools or resources by:

  1. Adding new methods to HPRDataLoader in data-loader.js
  2. Registering new tools in the ListToolsRequestSchema handler
  3. Implementing tool logic in the CallToolRequestSchema handler
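
The three steps might look roughly like this for a made-up tool. Everything named here (count_episodes_by_host, countEpisodesByHost, the hostId field on episodes) is hypothetical, and the functions are written standalone so the sketch runs without the MCP SDK. In the real server the bodies of listTools and callTool would live inside the handlers registered for ListToolsRequestSchema and CallToolRequestSchema.

```javascript
// Hypothetical tool definition in the shape MCP uses for tools/list results.
const countEpisodesTool = {
  name: "count_episodes_by_host",
  description: "Count the episodes contributed by a host",
  inputSchema: {
    type: "object",
    properties: { hostId: { type: "number", description: "Host ID" } },
    required: ["hostId"],
  },
};

// Step 1: the kind of method that could be added to HPRDataLoader
// (shown as a plain function; the hostId field is an assumption).
function countEpisodesByHost(episodes, hostId) {
  return episodes.filter((ep) => ep.hostId === hostId).length;
}

// Step 2: what the ListToolsRequestSchema handler would return.
function listTools() {
  return { tools: [countEpisodesTool] };
}

// Step 3: what the CallToolRequestSchema handler would do with request.params.
function callTool({ name, arguments: args }, episodes) {
  if (name !== countEpisodesTool.name) {
    throw new Error(`Unknown tool: ${name}`);
  }
  const count = countEpisodesByHost(episodes, args.hostId);
  return {
    content: [{ type: "text", text: `Host ${args.hostId} has ${count} episode(s)` }],
  };
}

// Made-up sample data.
const episodes = [
  { id: 1, hostId: 2 },
  { id: 2, hostId: 3 },
  { id: 3, hostId: 2 },
];
const result = callTool({ name: "count_episodes_by_host", arguments: { hostId: 2 } }, episodes);
console.log(result.content[0].text); // Host 2 has 2 episode(s)
```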

License

This MCP server code is released under CC-BY-SA to match the HPR content license.

The Hacker Public Radio content itself is released under various Creative Commons licenses as specified in each episode's metadata.

Credits

  • Hacker Public Radio: https://hackerpublicradio.org
  • MCP SDK: https://modelcontextprotocol.io

Contributing

Contributions are welcome! This server can be extended with:

  • Advanced search features (relevance ranking, semantic search)
  • Tag cloud generation
  • Episode recommendations
  • Audio file access
  • Web interface for browsing

Support

For issues related to:

  • This MCP server: Open an issue in this repository
  • HPR content: Visit https://hackerpublicradio.org
  • MCP protocol: Visit https://modelcontextprotocol.io

Example Queries

Here are some example queries you can try with an MCP client:

  1. "Find episodes about Python programming from 2023"
  2. "Show me all episodes by Ken Fallon"
  3. "Search transcripts for discussions about encryption"
  4. "What is the Database 101 series about?"
  5. "Show me recent episodes about Linux"
  6. "Find episodes tagged with 'security'"

Enjoy exploring the Hacker Public Radio knowledge base!
