MCP Embedding Search
A Model Context Protocol server that searches transcript segments in a Turso database using vector similarity, allowing users to find relevant content by asking questions without generating new embeddings.
spences10
mcp-embedding-search
A Model Context Protocol (MCP) server that queries a Turso database containing embeddings and transcript segments. This tool allows users to search for relevant transcript segments by asking questions, without generating new embeddings.
Features
- 🔍 Vector similarity search for transcript segments
- 📊 Relevance scoring based on cosine similarity
- 📝 Complete transcript metadata (episode title, timestamps)
- ⚙️ Configurable search parameters (limit, minimum score)
- 🔄 Efficient database connection pooling
- 🛡️ Comprehensive error handling
- 📈 Performance optimized for quick responses
Configuration
This server requires configuration through your MCP client. Here are examples for different environments:
Cline Configuration
Add this to your Cline MCP settings:
{
  "mcpServers": {
    "mcp-embedding-search": {
      "command": "node",
      "args": ["/path/to/mcp-embedding-search/dist/index.js"],
      "env": {
        "TURSO_URL": "your-turso-database-url",
        "TURSO_AUTH_TOKEN": "your-turso-auth-token"
      }
    }
  }
}
Claude Desktop Configuration
Add this to your Claude Desktop configuration:
{
  "mcpServers": {
    "mcp-embedding-search": {
      "command": "node",
      "args": ["/path/to/mcp-embedding-search/dist/index.js"],
      "env": {
        "TURSO_URL": "your-turso-database-url",
        "TURSO_AUTH_TOKEN": "your-turso-auth-token"
      }
    }
  }
}
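Both configurations pass the same two environment variables to the server process. As a minimal sketch of how a server could check them at startup, the helper below reads `TURSO_URL` and `TURSO_AUTH_TOKEN` from an environment-like object and fails early if either is missing. The names `TursoConfig` and `readTursoConfig` are hypothetical and are not taken from the project source.

```typescript
// Hypothetical helper: TursoConfig and readTursoConfig are illustrative
// names, not part of this project's actual code.
interface TursoConfig {
  url: string;
  authToken: string;
}

// Shaped like process.env, so the real environment can be passed in directly.
type Env = Record<string, string | undefined>;

function readTursoConfig(env: Env): TursoConfig {
  const url = env["TURSO_URL"]?.trim();
  const authToken = env["TURSO_AUTH_TOKEN"]?.trim();

  // Collect every missing variable so the error names all of them at once.
  const missing: string[] = [];
  if (!url) missing.push("TURSO_URL");
  if (!authToken) missing.push("TURSO_AUTH_TOKEN");

  if (!url || !authToken) {
    throw new Error(
      `Missing required environment variables: ${missing.join(", ")}`
    );
  }
  return { url, authToken };
}
```

In a Node.js process this would be called as `readTursoConfig(process.env)`. Failing at startup makes a misconfigured client entry visible immediately rather than on the first search.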
API
The server implements one MCP tool:
search_embeddings
Search for relevant transcript segments using vector similarity.
Parameters:
- question (string, required): The query text to search for
- limit (number, optional): Number of results to return (default: 5, max: 50)
- min_score (number, optional): Minimum similarity threshold (default: 0.5, range: 0-1)
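To make the defaults and bounds concrete, here is a rough sketch of how the three parameters could be normalized before a query runs. The defaults (5 and 0.5) and limits (50, and 0 to 1) come from the parameter list; everything else is an assumption. In particular, this README does not say whether the server clamps or rejects out-of-range values, and the lower bound of 1 for `limit` is a guess. `normalizeSearchArgs` and the interface names are hypothetical.

```typescript
// Hypothetical names; only the defaults and bounds come from the README.
interface SearchArgs {
  question: string;
  limit?: number;
  min_score?: number;
}

interface NormalizedSearchArgs {
  question: string;
  limit: number;
  min_score: number;
}

const DEFAULT_LIMIT = 5;
const MAX_LIMIT = 50;
const DEFAULT_MIN_SCORE = 0.5;

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

function normalizeSearchArgs(args: SearchArgs): NormalizedSearchArgs {
  const question = args.question.trim();
  if (question.length === 0) {
    throw new Error("question is required and must not be empty");
  }

  // Assumption: out-of-range values are clamped rather than rejected,
  // and the smallest useful limit is 1.
  const limit =
    typeof args.limit === "number" && Number.isFinite(args.limit)
      ? clamp(Math.floor(args.limit), 1, MAX_LIMIT)
      : DEFAULT_LIMIT;

  const min_score =
    typeof args.min_score === "number" && Number.isFinite(args.min_score)
      ? clamp(args.min_score, 0, 1)
      : DEFAULT_MIN_SCORE;

  return { question, limit, min_score };
}
```

Under these assumptions, calling `normalizeSearchArgs({ question: "How do I deploy?", limit: 500 })` would yield a `limit` of 50 and a `min_score` of 0.5.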
Response format:
[
  {
    "episode_title": "Episode Title",
    "segment_text": "Transcript segment content...",
    "start_time": 123.45,
    "end_time": 167.89,
    "similarity": 0.85
  }
  // Additional results...
]
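A client that consumes this response may want a typed view of it. The sketch below declares an interface matching the fields shown above and validates a JSON string against it. It assumes the client has already extracted the JSON text from the MCP tool result; how the server wraps that text is not described here. `SearchResult`, `isSearchResult`, and `parseSearchResults` are hypothetical names.

```typescript
// Field names mirror the response format above; the type and function
// names are illustrative, not part of the project's public API.
interface SearchResult {
  episode_title: string;
  segment_text: string;
  start_time: number;
  end_time: number;
  similarity: number;
}

// Runtime check that an unknown value has the expected shape.
function isSearchResult(value: unknown): value is SearchResult {
  if (typeof value !== "object" || value === null) return false;
  const r = value as Record<string, unknown>;
  return (
    typeof r["episode_title"] === "string" &&
    typeof r["segment_text"] === "string" &&
    typeof r["start_time"] === "number" &&
    typeof r["end_time"] === "number" &&
    typeof r["similarity"] === "number"
  );
}

// Assumption: `json` is the raw JSON array text returned by the tool.
function parseSearchResults(json: string): SearchResult[] {
  const parsed: unknown = JSON.parse(json);
  if (!Array.isArray(parsed)) {
    throw new Error("expected a JSON array of results");
  }
  const results = parsed.filter(isSearchResult);
  if (results.length !== parsed.length) {
    throw new Error("one or more results did not match the expected shape");
  }
  return results;
}
```

Validating at the boundary keeps the rest of the client code free of `unknown` checks and surfaces schema drift as a clear error.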
Database Schema
This tool expects a Turso database with the following schema:
CREATE TABLE embeddings (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  transcript_id INTEGER NOT NULL,
  embedding TEXT NOT NULL,
  FOREIGN KEY(transcript_id) REFERENCES transcripts(id)
);
CREATE TABLE transcripts (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  episode_title TEXT NOT NULL,
  segment_text TEXT NOT NULL,
  start_time REAL NOT NULL,
  end_time REAL NOT NULL
);
The embedding column should contain vector embeddings that can be
used with the vector_distance_cos function.
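For reference, this is the arithmetic behind the relevance score, sketched in plain TypeScript. As I understand libSQL's vector functions, `vector_distance_cos` returns a cosine distance, that is 1 minus the cosine similarity, so a similarity score would be recovered as `1 - distance`. How this server actually converts distance to its `similarity` field is an assumption, and the function names below are hypothetical.

```typescript
// Illustrative only: the server relies on the database's
// vector_distance_cos function, not on code like this.
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i]!;
    const y = b[i]!;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) {
    throw new Error("cosine similarity is undefined for a zero vector");
  }
  return dot / Math.sqrt(normA * normB);
}

// Cosine distance is 1 - similarity: 0 for vectors pointing the same way,
// 1 for orthogonal vectors, 2 for opposite vectors.
function cosineDistance(a: readonly number[], b: readonly number[]): number {
  return 1 - cosineSimilarity(a, b);
}
```

Because cosine similarity depends only on direction, scaling an embedding does not change its score; `[1, 2, 3]` and `[2, 4, 6]` are treated as identical.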
Development
Setup
- Clone the repository
- Install dependencies:
npm install
- Build the project:
npm run build
- Run in development mode:
npm run dev
Publishing
The project uses changesets for version management. To publish:
- Create a changeset:
npm run changeset
- Version the package:
npm run version
- Publish to npm:
npm run release
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
MIT License - see the LICENSE file for details.
Acknowledgments
- Built on the Model Context Protocol
- Designed for efficient vector similarity search in transcript databases