Bluetooth Specifications MCP Server
Enables natural language search over Bluetooth specification PDFs with hybrid search (BM25 + semantic embeddings), returning precise answers with citations.
RAG-powered search for Bluetooth specifications. Ask questions in natural language, get precise answers with citations.
What It Does
Search Bluetooth spec PDFs using hybrid search (BM25 + semantic embeddings) through the Model Context Protocol. Works with Claude Code, Cline, and other MCP clients.
Example:
"How does Bluetooth pairing protect against MITM attacks?"
→ Returns relevant sections from the Core Specification with page numbers and context.
Quick Start
# 1. Install
python3 -m venv .venv && source .venv/bin/activate
pip install -e . # BM25-only (~100MB, 2 min)
# OR
pip install -e ".[semantic]" # Full features (~7.2GB, 15 min)
# 2. Add PDFs
mkdir specs && cp your-bluetooth-spec.pdf specs/
# 3. Generate cache (first run indexes PDFs)
btmcp-server
# Wait for "Ready! Indexed X chunks", then Ctrl+C
# 4. Configure your MCP client (see Configuration section below)
# Done! Your MCP client will launch the server automatically.
Why generate the cache first? The first run indexes all PDFs, which can take 1-5 minutes depending on PDF size. Running the server standalone first prevents MCP client timeouts.
Legal Notice
Users must provide their own legally obtained PDF copies of Bluetooth specifications.
This MCP server provides tools to access Bluetooth specification documents. You are responsible for obtaining PDFs legally from official sources. Visit bluetooth.com/specifications for information about specifications.
Features
- Hybrid Search - BM25 keyword + semantic embeddings (RRF fusion)
- Smart Chunking - Section-based (keeps multi-page sections together)
- Metadata Extraction - Requirements, UUIDs, tables, figures
- 3 Search Modes - BM25 (exact), semantic (meaning), hybrid (both)
- Persistent Cache - Fast subsequent searches
- 4 MCP Tools - search_specifications, refresh_index, list_indexed_specs, check_index_status
Installation Modes
| Mode | Size | Time | Features | When to Use |
|---|---|---|---|---|
| BM25-only | ~100MB | 2 min | Keyword search | Quick configuration |
| Semantic | ~7.2GB | 15 min | Hybrid search | Best accuracy |
Note: You can start with BM25-only and upgrade later with pip install -e ".[semantic]"
Configuration
Claude Code
The easiest way to add the server:
claude mcp add --transport stdio bluetooth-specifications /absolute/path/to/mcp-bluetooth-specification/.venv/bin/btmcp-server
Or manually edit .mcp.json:
{
"mcpServers": {
"bt-specs": {
"command": "/absolute/path/to/.venv/bin/btmcp-server"
}
}
}
Cursor
Add to your MCP settings (File → Preferences → MCP):
{
"mcpServers": {
"bt-specs": {
"command": "/absolute/path/to/.venv/bin/btmcp-server"
}
}
}
Note: Replace /absolute/path/to/ with your actual project path. The MCP client will launch the server automatically when needed.
Manual Server Mode
For testing or running as a long-running daemon:
source .venv/bin/activate
btmcp-server # stdio mode (default)
HTTP mode (advanced):
MCP_TRANSPORT=streamable-http btmcp-server # Runs on http://127.0.0.1:8000
Environment Variables
- MCP_TRANSPORT - stdio (default) or streamable-http
- MCP_HOST - Host for HTTP mode (default: 127.0.0.1)
- MCP_PORT - Port for HTTP mode (default: 8000)
- BTMCP_CACHE_DIR - Custom cache directory (default: <project>/.cache/)
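The following minimal sketch shows how these variables and their documented defaults could be resolved. The read_settings helper is hypothetical and is not taken from btmcp's source. Only the variable names and default values come from the list above.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Documented defaults (from the README); the helper below is illustrative only.
DEFAULTS = {
    "MCP_TRANSPORT": "stdio",   # or "streamable-http"
    "MCP_HOST": "127.0.0.1",    # only used in HTTP mode
    "MCP_PORT": "8000",         # only used in HTTP mode
}


def read_settings(env: Optional[Mapping] = None) -> dict:
    """Hypothetical helper: resolve server settings from an environment mapping."""
    env = os.environ if env is None else env
    settings = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    settings["MCP_PORT"] = int(settings["MCP_PORT"])
    # None stands for "use the default cache location" (<project>/.cache/).
    settings["BTMCP_CACHE_DIR"] = env.get("BTMCP_CACHE_DIR")
    return settings
```

For example, running MCP_TRANSPORT=streamable-http MCP_PORT=9000 btmcp-server should serve HTTP on port 9000 instead of 8000, assuming the server honours MCP_PORT as listed.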
MCP Tools
The server provides 4 tools for MCP clients:
search_specifications
Search Bluetooth specs with hybrid search.
Parameters:
- query (string) - Your question or search terms
- mode (string) - "bm25", "semantic", or "hybrid" (default)
- top_k (int) - Number of results (default: 3)
- filter_pdfs (string) - Filter by PDF names (comma-separated)
Example:
search_specifications(
query="How does LE Secure Connections work?",
mode="hybrid",
top_k=3
)
list_indexed_specs
List all indexed PDFs with statistics.
check_index_status
Check if cache is fresh or needs rebuilding.
refresh_index
Rebuild index from PDFs (use after adding/updating specs).
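Both check_index_status and refresh_index depend on knowing whether the cached index still matches the PDFs in specs/. The README does not say how freshness is decided. The sketch below therefore assumes that file modification times are stored next to the index in the pickle cache mentioned under Architecture. The cache layout and the snapshot_mtimes, save_cache and stale_pdfs names are hypothetical.

```python
import pickle
from pathlib import Path


def snapshot_mtimes(specs_dir: Path) -> dict:
    """Map each PDF file name in specs_dir to its modification time."""
    return {p.name: p.stat().st_mtime for p in sorted(specs_dir.glob("*.pdf"))}


def save_cache(cache_file: Path, mtimes: dict, index: object) -> None:
    """Persist the index together with the mtimes it was built from (hypothetical layout)."""
    with cache_file.open("wb") as f:
        pickle.dump({"mtimes": mtimes, "index": index}, f)


def stale_pdfs(cache_file: Path, specs_dir: Path) -> list:
    """Return PDFs that are new, modified, or removed since the cache was written."""
    current = snapshot_mtimes(specs_dir)
    if not cache_file.exists():
        return sorted(current)  # no cache yet: every PDF needs indexing
    with cache_file.open("rb") as f:
        cached = pickle.load(f)["mtimes"]
    changed = [name for name, mtime in current.items() if cached.get(name) != mtime]
    removed = [name for name in cached if name not in current]
    return sorted(changed + removed)
```

An empty list corresponds to a fresh cache, and anything else would call for refresh_index. Only load pickle files you trust, because unpickling can execute arbitrary code.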
Testing the Server Manually
We provide three test scripts for manual testing:
1. Automated Test (test_live.py)
Runs a full suite of test queries and shows results:
source .venv/bin/activate
python test_live.py
This will:
- Load all PDFs from specs/
- Run 5 test queries ("GATT Service", "L2CAP layer", "frequency", "routing", "security")
- Display results with scores and citations
- Show MCP formatted output
2. Interactive Test (test_interactive.py)
Search interactively with your own queries:
source .venv/bin/activate
python test_interactive.py
Then type your search queries. Type quit to exit.
Example queries:
- ATT protocol
- device address
- pairing
- LE Secure Connections
3. MCP Format Test (test_mcp_format.py)
See the exact MCP resource output format:
source .venv/bin/activate
python test_mcp_format.py
Enter a query and see the formatted output that MCP clients would receive.
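Configuring Claude Code for HTTP Mode
If the server is running in HTTP mode (MCP_TRANSPORT=streamable-http), add the following configuration to your project's .mcp.json (create the file if it does not exist):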
{
"mcpServers": {
"bt-specs": {
"type": "http",
"url": "http://127.0.0.1:8000/mcp"
}
}
}
To test it, ask Claude Code: "Search the Bluetooth specs for GATT Service"
Using with MCP Clients
The server exposes four MCP tools:
Tool: search_specifications
- Parameters:
  - query (string) - Search terms (e.g., "GATT Service", "Device Information Service")
  - mode (string, optional) - Search mode: "bm25" (keyword), "semantic" (meaning), or "hybrid" (default)
  - top_k (int, optional) - Number of results to return (default: 3)
  - filter_pdfs (string, optional) - Comma-separated list of PDF names to filter results (e.g., "doc1.pdf,doc2.pdf")
- Returns: Formatted results with text and citations
Tool: list_indexed_specs
- Parameters: None
- Returns: List of all indexed PDFs with statistics (pages, chunks)
Tool: check_index_status
- Parameters: None
- Returns: Cache freshness status and which PDFs need reindexing
Tool: refresh_index
- Parameters: None
- Returns: Status message after rebuilding index from PDFs
Choosing the Right Search Mode
The server supports three search modes optimized for different query types:
Mode: bm25 (Keyword Matching)
Best for: Known terminology, command names, protocol names
Query: "GATT service ATT protocol attributes"
Mode: bm25
Response: Top results focus on exact keyword matches:
- Result 1 (score: 25.851): "Attribute Protocol (ATT) block implements the
peer-to-peer protocol... The Generic Attribute Profile (GATT) block represents
the functionality of the ATT Server..." (Core_v6.1.pdf, Page 245)
- Result 2 (score: 24.041): SDP record details for GATT service with ATT
protocol descriptors and attribute handles (Core_v6.1.pdf, Page 1613)
Use when: You know the exact technical terms and want precise matches
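BM25 scores a chunk by how often each query term occurs in it. Each term is weighted by how rare it is across all chunks, and the result is normalized by chunk length. The project lists rank-bm25 as a dependency. The self-contained sketch below reimplements the Okapi formula instead, with these assumptions:

- k1=1.5 and b=0.75, the BM25Okapi defaults in rank-bm25.
- A simple lowercase word tokenizer.
- The non-negative (Lucene-style) IDF variant.

Because of these choices, absolute scores will not match the ones shown above.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase word tokenizer (an assumption; btmcp's tokenizer may differ)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: list, k1: float = 1.5, b: float = 0.75) -> list:
    """Score every document in docs against query with Okapi BM25."""
    if not docs:
        return []
    tokenized = [tokenize(doc) for doc in docs]
    n = len(tokenized)
    avgdl = sum(len(tokens) for tokens in tokenized) / n or 1.0
    # Document frequency: in how many chunks each term appears at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = tokenize(query)
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        length_norm = k1 * (1 - b + b * len(tokens) / avgdl)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores
```

A chunk that contains none of the query terms scores exactly 0. This is why BM25 suits exact terminology but misses paraphrased questions.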
Mode: semantic (Meaning-Based)
Best for: Natural questions, conceptual understanding, "how-to" queries
Query: "How does pairing work with secure connections"
Mode: semantic
Response: Conceptually related content about security and pairing:
- Result 1 (score: 0.706): "Secure Simple Pairing has two security goals:
protection against passive eavesdropping and MITM attacks... uses Elliptic
Curve Diffie Hellman (ECDH) public key cryptography" (Core_v6.1.pdf, Page 313)
- Result 2 (score: 0.645): "Man-in-the-middle (MITM) attack... Secure Simple
Pairing offers two user assisted numeric methods: numeric comparison or
passkey entry" (Core_v6.1.pdf, Page 314)
Use when: Asking how/why questions, exploring concepts, learning about features
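Semantic mode compares meaning rather than words. The query and each chunk are embedded as vectors (the Architecture section names all-MiniLM-L6-v2), and chunks are ranked by similarity. The sketch below assumes cosine similarity. It uses made-up three-dimensional vectors with hypothetical chunk IDs in place of real model embeddings, which for all-MiniLM-L6-v2 have 384 dimensions.

```python
import math


def cosine(u, v) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_top_k(query_vec, chunk_vecs: dict, top_k: int = 3) -> list:
    """Rank chunk IDs by cosine similarity to the query; return (score, id) pairs."""
    scored = [(cosine(query_vec, vec), chunk_id) for chunk_id, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

In practice the vectors would come from the embedding model, for example SentenceTransformer("all-MiniLM-L6-v2").encode(texts) from the sentence-transformers package. Chunk embeddings are computed once at indexing time and kept in the cache, so each search only needs to encode the query.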
Mode: hybrid (Best of Both - Default)
Best for: General queries, balanced precision/recall
Query: "LE Secure Connections pairing authentication"
Mode: hybrid (default)
Response: Combines exact matches with conceptual similarity:
- Result 1 (score: 0.016): Complete pairing flow diagram showing "Phase 1:
Established LL connection, Phase 2: Pairing over SMP (Legacy pairing or
Secure Connections), Phase 3: Establishment of encrypted connection"
(Core_v6.1.pdf, Page 1629)
- Result 2 (score: 0.016): Authentication procedure details including
commitment checks and failure handling (Core_v6.1.pdf, Page 701)
Uses RRF (Reciprocal Rank Fusion) to merge BM25 + semantic rankings.
Use when: Not sure which mode to use (recommended default)
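Here is a minimal sketch of Reciprocal Rank Fusion, using k=60 as in the Architecture section. For every chunk it returns, each mode contributes 1 / (k + rank). Only rank positions matter, so the incompatible raw scores (BM25 around 25, cosine around 0.7) never have to be compared. Ranks are assumed to start at 1, and the chunk IDs are hypothetical.

```python
from collections import defaultdict


def rrf_fuse(rankings, k: int = 60, top_k: int = 3) -> list:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank(d))."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    fused = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return fused[:top_k]


bm25_ranking = ["a", "b", "c"]
semantic_ranking = ["b", "d", "a"]
print(rrf_fuse([bm25_ranking, semantic_ranking], top_k=2))  # "b" first, then "a"
```

With k=60, a chunk ranked first by only one mode scores 1/61 ≈ 0.016, which matches the scale of the hybrid scores in the example above. A chunk that both modes rank near the top scores roughly twice that.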
Recommendation: Use hybrid (default) for most queries. Only switch to
bm25 or semantic if hybrid results don't meet your needs.
Filtering by PDF
Search within specific specifications:
# Search only in Core Specification v6.1
search_specifications(
query="encryption key management",
filter_pdfs="Core_v6.1.pdf"
)
Response:
- Result 1 (score: 0.031): Complete table of contents for security chapter
showing "3 Key management... 3.2.5 Generating the encryption key...
4.2.5.8 Encryption key refresh" (Core_v6.1.pdf, Page 1016)
- Result 2 (score: 0.030): "Encryption key refresh... shall refresh the
encryption key within 2^28 ticks... procedures for pause and resume
encryption" (Core_v6.1.pdf, Page 693)
# Search across multiple specs (comma-separated)
search_specifications(
query="ATT protocol",
filter_pdfs="Core_v6.1.pdf,80211MP.TS_.p7.pdf"
)
# Partial names work too
search_specifications(
query="security",
filter_pdfs="Core_v6" # Matches Core_v6.1.pdf
)
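The README states that filter_pdfs accepts a comma-separated list and that partial names match. The following minimal sketch of that behaviour makes two hypothetical assumptions: a partial name is a case-sensitive substring of the file name, and each result is a dict with a "pdf" key. btmcp's actual result type may differ.

```python
from typing import Optional


def parse_filter(filter_pdfs: Optional[str]) -> list:
    """Split a comma-separated filter string, dropping blanks and surrounding spaces."""
    if not filter_pdfs:
        return []
    return [name.strip() for name in filter_pdfs.split(",") if name.strip()]


def matches_filter(pdf_name: str, patterns: list) -> bool:
    """No patterns means no filtering; otherwise any substring match counts."""
    if not patterns:
        return True
    return any(pattern in pdf_name for pattern in patterns)


def filter_results(results: list, filter_pdfs: Optional[str] = None) -> list:
    """Keep only results whose (hypothetical) "pdf" field matches the filter."""
    patterns = parse_filter(filter_pdfs)
    return [r for r in results if matches_filter(r["pdf"], patterns)]
```

The Architecture data flow applies PDF filtering before the top-K cut. Filtering after the cut instead could return fewer than top_k results.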
Use cases:
- Focus on a specific specification document
- Avoid mixing results from different spec versions
- Compare implementations across different documents
Semantic Chunking Benefits
Section-based chunking keeps related content together:
With page chunking:
Query: "Device Information Service GATT Service documentation"
Fragmented results:
- Chunk 1: Page 85 (partial introduction)
- Chunk 2: Page 86 (middle of command, no context)
- Chunk 3: Page 87 (table only, incomplete)
Problem: Context broken across page boundaries
With semantic chunking:
Query: "Device Information Service GATT Service documentation"
Complete section:
- Chunk: Section 4.26 "Device Information Service GATT Service"
- Text: Pages 85-87 combined
- Introduction + Commands + Tables together
- Metadata: 6 requirements, 2 hex values, 3 tables
Result: Full context, nothing missing
What we do:
- Group multi-page sections (e.g., Section 4.2.1 spanning pages 1-3)
- Size constraints: 100-2000 tokens per chunk
- 50-token overlap between split chunks (preserves continuity)
- Merge small trailing chunks (avoid fragments < 100 tokens)
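The size rules above can be sketched for a single oversized section as follows. Tokens are treated as a plain list (for example text.split()), because the README does not say which tokenizer defines the 100-2000 token limits. split_section is a hypothetical name. Windows of at most max_tokens advance by max_tokens - overlap, so consecutive chunks share 50 tokens, and a trailing chunk shorter than min_tokens is folded into its predecessor.

```python
def split_section(tokens: list, max_tokens: int = 2000, min_tokens: int = 100,
                  overlap: int = 50) -> list:
    """Split one section's tokens into size-bounded, overlapping chunks."""
    if len(tokens) <= max_tokens:
        # Sections that already fit stay whole (merging tiny sections with
        # their neighbours is out of scope for this sketch).
        return [list(tokens)]
    step = max_tokens - overlap
    chunks = []
    start = 0
    while True:
        chunks.append(list(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break
        start += step
    # Fold a short trailing chunk into its predecessor, skipping the first
    # `overlap` tokens, which the predecessor already ends with. The merged
    # chunk may exceed max_tokens by fewer than min_tokens - overlap tokens.
    if len(chunks[-1]) < min_tokens:
        tail = chunks.pop()
        chunks[-1].extend(tail[overlap:])
    return chunks
```

This approach trades a slightly oversized final chunk for never emitting a fragment that carries too little context to be useful on its own.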
Impact: Better RAG quality - agents get complete semantic units, not arbitrary page cuts.
Tool Chaining Workflow
Agents can chain multiple tools for complex tasks:
User: "Check if the index is current, then find information about
encryption key management in the Core spec"
Agent workflow:
1. check_index_status()
→ Returns: "Cache is FRESH - all PDFs up to date
Total PDFs: 2
• 80211MP.TS_.p7.pdf: OK
• Core_v6.1.pdf: OK"
2. search_specifications(
query="encryption key management",
filter_pdfs="Core_v6.1.pdf",
mode="hybrid",
top_k=2
)
→ Returns: Two chunks about key management and encryption key refresh
from Core Specification pages 1016 and 693
3. Agent synthesizes answer:
- Index is fresh (no rebuild needed)
- Found detailed security chapter with key management sections
- Includes encryption key refresh procedures and timing requirements
Available tool combinations:
- check_index_status → refresh_index (rebuild if stale)
- list_indexed_specs → search_specifications with filter_pdfs
- search_specifications → Extract metadata → Follow-up search
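The first combination (check, refresh if stale, then search) can be written as a small client-side helper. call_tool is a hypothetical callable that takes a tool name and an argument dict and returns the tool's text output, for instance a thin wrapper around an MCP client session. Detecting freshness by looking for "FRESH" in the status text is an assumption based on the sample output above.

```python
from typing import Callable, Optional

ToolCaller = Callable[[str, dict], str]  # hypothetical: (tool name, args) -> text


def fresh_search(call_tool: ToolCaller, query: str,
                 filter_pdfs: Optional[str] = None, top_k: int = 2) -> str:
    """Check the index, rebuild it if stale, then run a hybrid search."""
    status = call_tool("check_index_status", {})
    if "FRESH" not in status:
        call_tool("refresh_index", {})
    args = {"query": query, "mode": "hybrid", "top_k": top_k}
    if filter_pdfs:
        args["filter_pdfs"] = filter_pdfs
    return call_tool("search_specifications", args)
```

An agent does the same thing implicitly when it chains the tools. Writing it out makes clear that a refresh only happens when the status check asks for one.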
Benefits:
- Stateful caching (no re-indexing between queries)
- Metadata-aware search (requirements, hex values, tables, figures)
- Compositional workflows across multiple tools
Real Claude Code Conversation
User: How does Bluetooth pairing protect against man-in-the-middle attacks?
Claude: I'll search the Bluetooth specifications for information about MITM
protection in pairing.
[Uses: search_specifications("How does pairing work with secure connections",
mode="semantic", top_k=2)]
Based on the Bluetooth Core Specification v6.1, here's how pairing protects
against man-in-the-middle (MITM) attacks:
1. Secure Simple Pairing Goals (Page 313):
- Protection against passive eavesdropping using Elliptic Curve Diffie
Hellman (ECDH) public key cryptography
- ECDH provides very high degree of strength against passive attacks
- For Secure Connections: uses P-256 elliptic curve with ~128 bits entropy
2. MITM Protection Methods (Page 314):
- Offers two user-assisted numeric methods: numeric comparison or passkey
entry
- Uses 6-digit numbers providing 1 in 1,000,000 chance of successful MITM
- If MITM attack fails, users are alerted when connection process fails
- This level was chosen for FIPS compliance with minimal usability impact
The key insight: While ECDH prevents passive eavesdropping, the numeric
comparison/passkey entry ensures users can verify they're connecting to the
intended device and not an attacker in the middle.
Would you like me to search for more details about the authentication
procedures?
Development
Running Tests
Run all tests:
source .venv/bin/activate
export PYTHONPATH="${PYTHONPATH}:$(pwd)/src"
pytest
Run specific test file:
pytest tests/test_indexer.py -v
Run with coverage:
pytest --cov=btmcp
Linting and Formatting
Check code:
ruff check .
Format code:
ruff format .
Auto-fix issues:
ruff check --fix .
Project Structure
bluetooth-specifications-mcp/
├── src/btmcp/
│   ├── __init__.py
│   ├── server.py                  # MCP server with 4 tools
│   ├── spec_server.py             # PDF loading, caching, search coordinator
│   ├── pdf_loader.py              # pypdf extraction + NFKC normalization
│   ├── indexer.py                 # Semantic chunking + hybrid search
│   ├── models.py                  # Dataclasses (metadata structures)
│   └── metadata_extractor.py      # Regex parsers (requirements, hex, tables)
├── tests/
│   ├── test_server.py             # MCP server tests
│   ├── test_spec_server.py        # Integration tests
│   ├── test_mcp_integration.py    # MCP protocol compliance
│   ├── test_pdf_loader.py         # PDF extraction + normalization
│   ├── test_indexer.py            # Chunking + search (36 tests)
│   ├── test_metadata_extractor.py # Metadata parsing (15 tests)
│   ├── test_models.py             # Dataclass validation (12 tests)
│   └── test_error_handling.py     # Exception propagation (2 tests)
├── test_live.py                   # Automated test with real PDFs
├── test_interactive.py            # Interactive search testing
├── test_mcp_format.py             # MCP format output test
├── specs/                         # Place PDF files here
├── .mcp.json                      # Claude Code configuration
└── pyproject.toml
Architecture
Data Flow:
PDF Files (specs/)
↓ PDFLoader.load_pdf() → NFKC normalization
↓ Semantic Chunking:
• Group multi-page sections
• Size constraints (100-2000 tokens, 50 overlap)
↓ MetadataExtractor:
• Requirements (section references, MUST/SHOULD/MAY levels)
• Hex values (Service/Characteristic UUIDs like 0x180D)
• Tables & Figures
↓ Indexer:
• BM25 tokenization
• Semantic embeddings (all-MiniLM-L6-v2)
↓ Cache (pickle):
• BM25 index
• Embeddings
• Metadata
Query
↓ MCP Tool (search_specifications)
↓ Mode selection (bm25/semantic/hybrid)
↓ Hybrid: RRF fusion (k=60)
↓ PDF filtering (optional)
↓ Top-K Results + Metadata
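NFKC normalization folds compatibility characters that PDF text extraction often produces into their plain equivalents. Examples include the "ﬁ" ligature, non-breaking spaces and full-width digits. Folding them keeps BM25 tokens consistent. The sketch below of the loader step assumes pypdf's PdfReader and page.extract_text(). load_pdf_pages and normalize_text are hypothetical names, not PDFLoader's actual methods.

```python
import unicodedata


def normalize_text(text: str) -> str:
    """Apply NFKC normalization to extracted PDF text."""
    return unicodedata.normalize("NFKC", text)


def load_pdf_pages(path: str) -> list:
    """Extract and normalize the text of every page (requires pypdf)."""
    from pypdf import PdfReader  # imported lazily so normalize_text works without pypdf

    reader = PdfReader(path)
    return [normalize_text(page.extract_text() or "") for page in reader.pages]
```

One side effect to be aware of: NFKC also maps superscript digits to ordinary digits, so an exponent extracted as 2²⁸ would read 228 after normalization.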
Components (6 modules):
- server.py: MCP server, 4 tools (search, refresh, list, check)
- spec_server.py: Coordinates PDF loading, caching, searching
- pdf_loader.py: pypdf extraction + NFKC Unicode normalization
- indexer.py: Semantic chunking, hybrid search (BM25 + embeddings)
- models.py: Dataclasses (RequirementInfo, TableInfo, FigureInfo, ChunkMetadata)
- metadata_extractor.py: Regex parsers for requirements, hex, tables, figures
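metadata_extractor.py is described as a set of regex parsers. Below are two hypothetical patterns in that spirit. The first finds hexadecimal values such as the 0x180D UUID mentioned above. The second finds requirement keywords. The keyword list is an assumption: it covers the MUST/SHOULD/MAY levels named in the data flow, plus "shall", which Bluetooth specifications use for mandatory behaviour, and it matches case-insensitively.

```python
import re

# Hypothetical patterns; btmcp's real parsers are not shown in the README.
HEX_VALUE = re.compile(r"\b0x[0-9A-Fa-f]{4,}\b")
REQUIREMENT = re.compile(r"\b(must|shall|should|may)\b", re.IGNORECASE)


def extract_hex_values(text: str) -> list:
    """Return hex literals of four or more digits, e.g. 16-bit UUIDs."""
    return HEX_VALUE.findall(text)


def extract_requirement_levels(text: str) -> list:
    """Return requirement keywords in order of appearance, upper-cased."""
    return [match.group(1).upper() for match in REQUIREMENT.finditer(text)]
```

Matching bare keywords over-counts, because "may" also appears in non-normative prose. A real parser would likely need more context, such as the section references the data flow mentions.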
Troubleshooting
Issue: "ModuleNotFoundError: No module named 'btmcp'"
Solution: Set PYTHONPATH:
export PYTHONPATH="${PYTHONPATH}:$(pwd)/src"
Issue: "Warning: specs directory not found"
Solution: Create specs directory and add PDFs:
mkdir specs
cp /path/to/your/spec.pdf specs/
Issue: Empty search results
Check:
- PDFs are loaded on startup (check console output)
- Search query matches content in PDFs
- Try broader search terms
Disclaimer
This is a personal learning project to explore MCP and other AI-related technologies. I'm doing it in my spare time. This project is developed heavily with Claude Code, in what some would call "vibe coding". I provide zero warranties. Consider the project experimental.
License
BSD-3-Clause - see LICENSE.md
Third-Party Dependencies
| Dependency | License | Copyright | Source |
|---|---|---|---|
| mcp | MIT | Anthropic, PBC | https://modelcontextprotocol.io |
| pypdf | BSD-3-Clause | Mathieu Fenniak and contributors | https://github.com/py-pdf/pypdf |
| rank-bm25 | Apache 2.0 | Dorian Brown | https://github.com/dorianbrown/rank_bm25 |
| sentence-transformers | Apache 2.0 | Nils Reimers and contributors | https://www.SBERT.net |
| numpy | BSD-3-Clause | NumPy Developers | https://numpy.org |