Berlin Group MCP Server
Provides AI assistants with contextual access to Berlin Group Open Finance API specifications, enabling specification-compliant guidance through semantic search, graph database queries, and document retrieval.
A Model Context Protocol (MCP) server that provides Berlin Group Open Finance API specifications as contextual information to AI assistants in VS Code and IntelliJ IDEA.
Overview
This MCP server loads and indexes Berlin Group Open Finance specifications from OpenAPI YAML files and PDF documentation, enabling LLMs to provide accurate, specification-compliant guidance during Open Finance Framework implementation.
The server features advanced AI-powered capabilities including:
- Semantic Search via ChromaDB for intelligent, context-aware document retrieval
- Graph Database via Neo4j for exploring complex relationships between API endpoints, schemas, and data models
- Vector Embeddings for natural language queries across PDF documentation
- Relationship Traversal for understanding dependencies and schema inheritance
Features
- Complete Specification Access: Loads all Berlin Group OpenAPI specs (AIS, PIS, PIIS, BASK, Consent, etc.)
- Powerful Search: Search across endpoints, schemas, and PDF documentation
- Smart Filtering: Filter endpoints by method, tag, or specification
- PDF Support: Extract and search content from implementation guides
- Semantic Search: AI-powered semantic search using ChromaDB vector embeddings for natural language queries
- Graph Database: Neo4j integration for exploring complex relationships and dependencies
- Relationship Traversal: Navigate through schema references, endpoint dependencies, and API interconnections
- Intelligent Text Chunking: Splits PDF documents into semantically meaningful chunks for better retrieval
- Automatic Fallback: Gracefully falls back to in-memory storage when ChromaDB or Neo4j are unavailable
- 24 MCP Tools: Comprehensive toolset including 12 core tools + 6 semantic search tools + 6 graph database tools
- Multi-IDE Support: Works in VS Code and IntelliJ IDEA
Available Specifications
The server indexes the following Berlin Group specifications:
- Account Information Services (AIS) v2.3
- Payment Initiation Services (PIS) v2.3
- Confirmation of Funds (PIIS) v2.3
- Bank Account Status Services (BASK) v2.2
- Consent Management v2.1
- Data Dictionary v2.3.1
- Payment Update Status Hub (PUSH) v2.2
Installation
Prerequisites
- Node.js v18 or higher
- npm or yarn
- VS Code or IntelliJ IDEA with MCP support
- Optional: ChromaDB server for semantic search (runs on localhost:8000 by default)
- Optional: Neo4j database for graph queries (runs on localhost:7687 by default)
Setup
- Clone or navigate to the project directory: `cd path-of-the-repo/Berlin-group-mcp`
- Install dependencies: `npm install`
- Build the project: `npm run build`
Configuration
VS Code
- The configuration file is already created at `.vscode/mcp-settings.json`.
- Update the path if needed to match your project location:

      {
        "mcpServers": {
          "berlin-group": {
            "command": "node",
            "args": [
              "absolute-path-of-the-repo/Berlin-group-mcp/build/index.js"
            ]
          }
        }
      }

- Restart VS Code or reload the window.
- The Berlin Group tools should now be available in GitHub Copilot Chat.
IntelliJ IDEA
See INTELLIJ_SETUP.md for detailed configuration instructions.
Dependencies
The project uses the following key dependencies:
Core Dependencies
- @modelcontextprotocol/sdk (^1.0.4): MCP protocol implementation
- js-yaml (^4.1.0): YAML parsing for OpenAPI specifications
- pdf-parse (^1.1.1): PDF document text extraction
Advanced Features
- chromadb (^1.8.1): Vector database client for semantic search
  - Enables AI-powered document retrieval
  - Optional: Falls back to in-memory storage if unavailable
- neo4j-driver (^5.27.0): Neo4j graph database driver
  - Enables complex relationship queries
  - Optional: Falls back to in-memory graph if unavailable
Development Dependencies
- typescript (^5.7.3): TypeScript compiler
- jest (^29.7.0): Testing framework
- ts-jest (^29.1.2): TypeScript support for Jest
- Type definitions for all major dependencies
All dependencies are automatically installed with npm install.
Optional: External Database Configuration
The Berlin Group MCP Server can optionally use external databases for enhanced capabilities. Both are optional; without them the server runs entirely on in-memory storage.
Configuration Methods
The server supports two methods for configuration:
- Environment Variables (Recommended): Create a `.env` file in the project root
- Direct Configuration: Modify the configuration in `src/index.ts`
Environment Variables
Copy the .env.example file to .env and customize:
cp .env.example .env
Then edit .env with your settings:
# ChromaDB Configuration (for Semantic Search)
CHROMA_HOST=localhost
CHROMA_PORT=8000
CHROMA_COLLECTION=berlin_group_pdfs
# OpenAI Configuration (for embeddings)
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
# Neo4j Configuration (for Graph Database)
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=password
NEO4J_DATABASE=neo4j
NEO4J_MAX_POOL_SIZE=50
NEO4J_CONNECTION_TIMEOUT=60000
Direct Configuration
Alternatively, you can modify the configuration directly in src/index.ts:
const indexer = new SpecificationIndexer({
vectorStore: {
chromaHost: 'localhost',
chromaPort: 8000,
collectionName: 'my_collection',
embeddingModel: 'text-embedding-3-small'
},
graphStore: {
uri: 'bolt://localhost:7687',
username: 'neo4j',
password: 'password',
database: 'neo4j',
maxConnectionPoolSize: 50,
connectionAcquisitionTimeout: 60000
}
});
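The two configuration methods can be bridged by a small loader that reads the environment variables listed above and produces the object shape used in the direct configuration. The following is a minimal sketch, not the project's actual loader: `loadConfigFromEnv` and `intFromEnv` are hypothetical names, and the mapping of `NEO4J_CONNECTION_TIMEOUT` to `connectionAcquisitionTimeout` and of `OPENAI_EMBEDDING_MODEL` to `embeddingModel` is an assumption.

```typescript
// Hypothetical config shapes mirroring the direct configuration example.
interface VectorStoreConfig {
  chromaHost: string;
  chromaPort: number;
  collectionName: string;
  embeddingModel: string;
}

interface GraphStoreConfig {
  uri: string;
  username: string;
  password: string;
  database: string;
  maxConnectionPoolSize: number;
  connectionAcquisitionTimeout: number;
}

type Env = Record<string, string | undefined>;

// Parse an integer variable; fall back to the default when unset or invalid.
function intFromEnv(value: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(value ?? '', 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

// Build the configuration object, using the documented defaults for gaps.
function loadConfigFromEnv(env: Env): {
  vectorStore: VectorStoreConfig;
  graphStore: GraphStoreConfig;
} {
  return {
    vectorStore: {
      chromaHost: env.CHROMA_HOST ?? 'localhost',
      chromaPort: intFromEnv(env.CHROMA_PORT, 8000),
      collectionName: env.CHROMA_COLLECTION ?? 'berlin_group_pdfs',
      // Assumed mapping: OPENAI_EMBEDDING_MODEL -> embeddingModel
      embeddingModel: env.OPENAI_EMBEDDING_MODEL ?? 'text-embedding-3-small',
    },
    graphStore: {
      uri: env.NEO4J_URI ?? 'bolt://localhost:7687',
      username: env.NEO4J_USERNAME ?? 'neo4j',
      password: env.NEO4J_PASSWORD ?? 'password',
      database: env.NEO4J_DATABASE ?? 'neo4j',
      maxConnectionPoolSize: intFromEnv(env.NEO4J_MAX_POOL_SIZE, 50),
      // Assumed mapping: NEO4J_CONNECTION_TIMEOUT -> connectionAcquisitionTimeout
      connectionAcquisitionTimeout: intFromEnv(env.NEO4J_CONNECTION_TIMEOUT, 60000),
    },
  };
}
```

In a Node.js process this would typically be called as `loadConfigFromEnv(process.env)`; taking the environment as a parameter keeps the function easy to test.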
ChromaDB (for Semantic Search)
ChromaDB enables AI-powered semantic search across PDF documentation using vector embeddings.
Installation:
# Using pip
pip install chromadb
# Or using Docker
docker run -d -p 8000:8000 chromadb/chroma
Default Configuration:
- Host: `localhost`
- Port: `8000`
- Collection: `berlin_group_pdfs`
The server automatically connects during initialization. If ChromaDB is unavailable, semantic search falls back to keyword matching.
Neo4j (for Graph Database)
Neo4j enables complex relationship queries and graph traversal across specifications, endpoints, and schemas.
Installation:
# Using Docker (recommended)
docker run -d \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/password \
  --name neo4j_berlin_group_mcp \
  neo4j:latest
# Or download from https://neo4j.com/download/
Default Configuration:
- URI: `bolt://localhost:7687`
- Username: `neo4j`
- Password: `password`
- Database: `neo4j`
The server automatically connects during initialization. If Neo4j is unavailable, graph queries use in-memory implementation.
Neo4j Browser Access:
Once Neo4j is running, access the browser interface at http://localhost:7474 to visualize the graph:
// Example queries in Neo4j Browser
MATCH (s:Specification)-[:DEFINES_ENDPOINT]->(e:Endpoint)
RETURN s, e LIMIT 25

MATCH (e:Endpoint)-[:USES_SCHEMA]->(s:Schema)
WHERE e.path CONTAINS 'payment'
RETURN e, s

MATCH path = (s1:Schema)-[:REFERENCES*1..3]->(s2:Schema)
WHERE s1.name = 'PaymentInitiation'
RETURN path
Embedding Providers
For production deployments with ChromaDB, consider using advanced embedding providers:
OpenAI Embeddings (highest quality):
```typescript
// Prerequisite (run in your shell, not in TypeScript):
//   export OPENAI_API_KEY="your-api-key"
// Modify vectorStore.ts to use OpenAIEmbeddingProvider
const embeddingProvider = new OpenAIEmbeddingProvider(
  process.env.OPENAI_API_KEY,
  'text-embedding-3-small' // or 'text-embedding-3-large'
);
```
Local Embeddings (default, no API required): The server includes a built-in TF-IDF-based embedding provider that works without external APIs. It is used automatically when no other provider is configured.
Deployment Scenarios
| Scenario | ChromaDB | Neo4j | Tools Available | Best For |
|---|---|---|---|---|
| Full Stack | Running | Running | 24 tools | Production, research, complex analysis |
| Semantic Focus | Running | Not available | 18 tools | Documentation search, Q&A |
| Graph Focus | Not available | Running | 18 tools | API architecture analysis |
| Minimal/Dev | Not available | Not available | 12 tools | Development, basic queries |
Architecture
Core Components
The Berlin Group MCP Server is built with a modular architecture consisting of several specialized components:
1. YAML Parser (yamlParser.ts)
Parses Berlin Group OpenAPI specifications from YAML files, extracting:
- API endpoints (paths, methods, parameters)
- Schema definitions and data models
- Tags, descriptions, and metadata
- Request/response specifications
2. PDF Parser (pdfParser.ts)
Processes PDF documentation files using pdf-parse library:
- Extracts full text content from PDF documents
- Performs keyword-based text search
- Provides document summaries and metadata
3. Text Chunker (textChunker.ts)
Implements intelligent document segmentation for vector embedding:
- Recursive Character Splitting: Breaks text at natural boundaries (paragraphs, sentences, clauses)
- Configurable Chunk Size: Default 1000 characters with 200 character overlap for context continuity
- Metadata Preservation: Tracks source file, chunk index, section headers, and page estimates
- Semantic Coherence: Maintains meaning by avoiding splits mid-sentence when possible
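The chunking behaviour described above can be illustrated with a simplified sketch. It is not the project's `textChunker.ts`: instead of full recursive splitting it uses a single pass with a fixed window that prefers to end at a paragraph break or sentence end, and the names `chunkText` and `Chunk` are hypothetical. The defaults of 1000 characters and 200 characters of overlap come from the description above.

```typescript
interface Chunk {
  index: number; // position of the chunk in the document
  start: number; // character offset where the chunk begins
  text: string;
}

// Split text into overlapping chunks, preferring natural boundaries.
function chunkText(text: string, chunkSize = 1000, overlap = 200): Chunk[] {
  if (overlap >= chunkSize) {
    throw new Error('overlap must be smaller than chunkSize');
  }
  // Only accept a boundary late enough in the window that the next
  // chunk still starts after the current one (guarantees progress).
  const minBreak = Math.max(overlap + 1, Math.floor(chunkSize / 2));
  const chunks: Chunk[] = [];
  let start = 0;

  while (start < text.length) {
    let end = Math.min(start + chunkSize, text.length);
    if (end < text.length) {
      const window = text.slice(start, end);
      const paragraph = window.lastIndexOf('\n\n');
      const sentence = window.lastIndexOf('. ');
      if (paragraph >= minBreak) {
        end = start + paragraph + 2; // keep the blank line with this chunk
      } else if (sentence >= minBreak) {
        end = start + sentence + 2; // keep the period and space
      }
    }
    chunks.push({ index: chunks.length, start, text: text.slice(start, end) });
    if (end >= text.length) break;
    start = end - overlap; // step back so consecutive chunks share context
  }
  return chunks;
}
```

A real recursive splitter would retry progressively finer separators (paragraph, sentence, clause, character) on each oversized piece; the single-pass version above only shows how chunk size, overlap, and boundary preference interact.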
4. Vector Store (vectorStore.ts)
Manages semantic search capabilities using ChromaDB:
- ChromaDB Integration: Optional connection to ChromaDB server for persistent vector storage
- Local Embedding Provider: Built-in TF-IDF-like embedding generation when external APIs are unavailable
- OpenAI Embedding Support: Configurable integration with OpenAI's embedding models (text-embedding-3-small, text-embedding-3-large)
- Automatic Fallback: Uses in-memory vector storage when ChromaDB server is unavailable
- Semantic Search: Natural language queries with relevance scoring and distance metrics
- Metadata Filtering: Search within specific files or document sections
How Vector Store Works:
- PDF documents are split into chunks by the Text Chunker
- Each chunk is converted to a vector embedding (384-3072 dimensions depending on provider)
- Embeddings are stored in ChromaDB collection or in-memory fallback
- User queries are embedded using the same model
- Cosine similarity finds the most relevant chunks
- Results are ranked by relevance score (0.0 to 1.0)
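The last two steps (cosine similarity and ranking) can be sketched in a few lines. This is an illustrative sketch rather than the server's code: `cosineSimilarity`, `rankChunks`, and the two interfaces are hypothetical names, and clamping negative similarities to 0 is an assumed way of keeping scores in the 0.0 to 1.0 range.

```typescript
interface EmbeddedChunk {
  id: string;
  embedding: number[];
}

interface ScoredChunk {
  id: string;
  score: number;
}

// Cosine similarity: dot(a, b) / (|a| * |b|); 0 when either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('embeddings must have the same dimension');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the query and keep the topK best matches.
function rankChunks(
  queryEmbedding: number[],
  chunks: EmbeddedChunk[],
  topK = 10,
): ScoredChunk[] {
  return chunks
    .map((chunk) => ({
      id: chunk.id,
      // Assumption: negative similarities are clamped to 0.
      score: Math.max(0, cosineSimilarity(queryEmbedding, chunk.embedding)),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

This brute-force scan is roughly what an in-memory fallback has to do; a vector database such as ChromaDB replaces the linear scan with an index.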
5. Graph Store (graphStore.ts)
Manages graph database operations with Neo4j:
- Neo4j Integration: Optional connection to Neo4j database for complex relationship queries
- In-Memory Fallback: Complete graph implementation when Neo4j is unavailable
- Connection Management: Handles driver lifecycle, sessions, and transactions
- CRUD Operations: Create/read nodes and relationships with typed interfaces
- Cypher Query Execution: Direct access to Neo4j's powerful query language
- Statistics: Provides metrics on node counts, relationships, and graph density
Graph Node Types:
- Specification: OpenAPI spec metadata (title, version, description)
- Endpoint: API paths with HTTP methods
- Schema: Data models and type definitions
- Property: Schema fields with types and constraints
- Parameter: Request parameters (query, header, path, cookie)
- Response: HTTP response definitions with status codes
- Tag: Endpoint categorization
Graph Relationship Types:
- DEFINES_ENDPOINT: Specification → Endpoint
- DEFINES_SCHEMA: Specification → Schema
- HAS_PARAMETER: Endpoint → Parameter
- HAS_RESPONSE: Endpoint → Response
- USES_SCHEMA: Endpoint/Parameter/Response → Schema
- REFERENCES: Schema → Schema (for $ref relationships)
- HAS_PROPERTY: Schema → Property
- TAGGED_WITH: Endpoint → Tag
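To make the node and relationship model concrete, here is a minimal sketch of an in-memory graph that follows one relationship type up to a maximum depth, similar in spirit to the `REFERENCES*1..3` Cypher pattern. `InMemoryGraph`, `addEdge`, and `findRelated` are hypothetical names and only a subset of relationship types is modelled; the project's `graphStore.ts` may be organised differently.

```typescript
type RelationshipType = 'REFERENCES' | 'USES_SCHEMA' | 'HAS_PROPERTY';

interface Edge {
  from: string;
  to: string;
  type: RelationshipType;
}

class InMemoryGraph {
  // Adjacency list: node id -> outgoing edges.
  private outgoing = new Map<string, Edge[]>();

  addEdge(edge: Edge): void {
    const edges = this.outgoing.get(edge.from) ?? [];
    edges.push(edge);
    this.outgoing.set(edge.from, edges);
  }

  // Breadth-first traversal along one relationship type.
  // Returns each reachable node with the depth at which it was first seen.
  findRelated(start: string, type: RelationshipType, maxDepth = 3): Map<string, number> {
    const depthByNode = new Map<string, number>();
    let frontier = [start];
    for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
      const next: string[] = [];
      for (const node of frontier) {
        for (const edge of this.outgoing.get(node) ?? []) {
          // Skip other relationship types, cycles back to the start,
          // and nodes already reached at a shallower depth.
          if (edge.type !== type || edge.to === start || depthByNode.has(edge.to)) {
            continue;
          }
          depthByNode.set(edge.to, depth);
          next.push(edge.to);
        }
      }
      frontier = next;
    }
    return depthByNode;
  }
}
```

For example, after adding `PaymentInitiation → Amount` and `Amount → CurrencyCode` as REFERENCES edges (the schema names are illustrative), `findRelated('PaymentInitiation', 'REFERENCES')` reports `Amount` at depth 1 and `CurrencyCode` at depth 2.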
6. Graph Indexer (graphIndexer.ts)
Transforms OpenAPI specifications into graph structures:
- Specification Indexing: Creates nodes for each loaded specification file
- Endpoint Extraction: Parses all API endpoints with full details
- Schema Mapping: Extracts all data models and their properties
- Relationship Building: Connects endpoints to schemas, parameters, and responses
- Reference Resolution: Follows `$ref` pointers to build schema dependency graphs
- Progress Tracking: Provides real-time feedback during indexing operations
- Error Handling: Gracefully handles malformed specifications
Indexing Process:
- Load YAML files from the `yml_files/` directory
- Create Specification nodes for each file
- Extract and create Endpoint nodes
- Extract and create Schema nodes with properties
- Build relationships between all entities
- Index into Neo4j or in-memory store
7. Graph Models (graphModels.ts)
Defines TypeScript interfaces for type-safe graph operations:
- Node interfaces (SpecificationNode, EndpointNode, SchemaNode, etc.)
- Relationship type enums
- Query result types (GraphTraversalResult, PatternSearchResult, etc.)
- DTO types for creating nodes
- Utility functions for ID generation and reference extraction
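As an illustration of what reference extraction and ID generation utilities might look like, the sketch below collects `$ref` pointers from a parsed OpenAPI fragment and turns local schema pointers into schema names. All three function names are hypothetical, and the `Type:part:part` ID format is an assumption rather than the format used in `graphModels.ts`.

```typescript
// Recursively collect every "$ref" string inside a parsed OpenAPI fragment.
function collectRefs(node: unknown, found: string[] = []): string[] {
  if (Array.isArray(node)) {
    for (const item of node) collectRefs(item, found);
  } else if (node !== null && typeof node === 'object') {
    for (const [key, value] of Object.entries(node)) {
      if (key === '$ref' && typeof value === 'string') {
        found.push(value);
      } else {
        collectRefs(value, found);
      }
    }
  }
  return found;
}

// Turn "#/components/schemas/Amount" into "Amount".
// Returns null for pointers that are not local schema references.
function extractSchemaNameFromRef(ref: string): string | null {
  const match = /^#\/components\/schemas\/([^/]+)$/.exec(ref);
  return match?.[1] ?? null;
}

// Assumed ID format: node type followed by identifying parts, colon-separated.
function makeNodeId(nodeType: string, ...parts: string[]): string {
  return [nodeType, ...parts].join(':');
}
```

Each extracted name would then become the target of a REFERENCES relationship from the schema that contains the pointer.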
8. Specification Indexer (indexer.ts)
Orchestrates all components and provides unified API:
- Coordinates YAML Parser, PDF Parser, Vector Store, and Graph Indexer
- Manages initialization sequence and error handling
- Provides high-level search and query methods
- Handles fallback scenarios when optional services are unavailable
- Aggregates statistics across all subsystems
Data Flow
┌─────────────────┐      ┌──────────────────┐      ┌─────────────────┐
│ YAML Files      │─────>│ YAML Parser      │─────>│ Graph Indexer   │
│ (OpenAPI)       │      │                  │      │                 │
└─────────────────┘      └──────────────────┘      └────────┬────────┘
                                                            │
                                                            v
┌─────────────────┐      ┌──────────────────┐      ┌─────────────────┐
│ PDF Files       │─────>│ PDF Parser +     │─────>│ Vector Store    │
│ (Documentation) │      │ Text Chunker     │      │ (ChromaDB)      │
└─────────────────┘      └──────────────────┘      └────────┬────────┘
                                                            │
                                                            v
                                               ┌──────────────────────────┐
                                               │ Specification Indexer    │
                                               │ (Unified Interface)      │
                                               └────────────┬─────────────┘
                                                            │
                                                            v
                                               ┌──────────────────────────┐
                                               │ MCP Server Tools         │
                                               │ (24 Tools Available)     │
                                               └──────────────────────────┘
Fallback Mechanisms
The server is designed to work in various deployment scenarios:
- Full Stack (ChromaDB + Neo4j):
  - All 24 tools available
  - Best performance and capabilities
  - Semantic search with persistent vectors
  - Complex graph queries with Cypher
- Vector Only (ChromaDB, no Neo4j):
  - 18 tools available (core + semantic search)
  - Graph queries use in-memory implementation
  - Good for semantic search focused use cases
- Graph Only (Neo4j, no ChromaDB):
  - 18 tools available (core + graph database)
  - Semantic search falls back to keyword matching
  - Good for relationship exploration use cases
- Minimal (no external databases):
  - 12 core tools available
  - All operations use in-memory storage
  - Keyword-based search only
  - Suitable for basic queries and development
The server automatically detects available services during initialization and adjusts its capabilities accordingly. Users are informed which features are available through statistics and status endpoints.
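The detect-and-degrade behaviour can be sketched as a small helper that tries an external connection and switches to an in-memory implementation when the attempt fails. This is a generic pattern under assumed names (`connectWithFallback`, `ConnectionResult`), not the server's actual initialization code.

```typescript
interface ConnectionResult<T> {
  store: T;
  usingFallback: boolean;
}

// Try the external service first; on any error, build the in-memory store.
async function connectWithFallback<T>(
  connect: () => Promise<T>,
  createFallback: () => T,
  label: string,
): Promise<ConnectionResult<T>> {
  try {
    return { store: await connect(), usingFallback: false };
  } catch (error) {
    const reason = error instanceof Error ? error.message : String(error);
    // console.warn writes to stderr in Node.js, which keeps stdout free
    // for the JSON-RPC messages of a stdio-based MCP server.
    console.warn(`${label} unavailable (${reason}); using in-memory fallback`);
    return { store: createFallback(), usingFallback: true };
  }
}
```

The `usingFallback` flag is the kind of value a statistics tool could report back (compare `isInMemory` and `usingNeo4j` in the stats tools described below).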
Available Tools
The server provides 24 MCP tools organized into three categories:
Core Tools (12 tools)
Search and Discovery
- `search_endpoints`: Search for API endpoints across all specifications. Example: "Find all payment endpoints"
- `search_schemas`: Search for data schemas and models. Example: "Find schemas related to transaction"
- `search_pdf_documentation`: Search through PDF documentation using keyword matching. Example: "Search for SCA requirements"
- `search_all`: Comprehensive keyword search across all sources (endpoints, schemas, PDFs). Example: "Find everything about consent"
Endpoint Information
- `get_endpoint_details`: Get detailed information about a specific endpoint. Parameters: path, method. Example: path="/v1/accounts", method="GET"
- `filter_endpoints_by_tag`: Filter endpoints by tag. Example: tag="accounts"
- `filter_endpoints_by_method`: Filter endpoints by HTTP method. Example: method="POST"
Schema Information
- `get_schema`: Get a specific schema definition. Parameters: schemaName, specFile (optional). Example: schemaName="AccountDetails"
Specification Management
- `list_specifications`: List all available OpenAPI specifications
- `get_specification_details`: Get comprehensive details about a specific spec. Parameters: fileName
- `list_pdf_documents`: List all available PDF documentation
- `get_statistics`: Get basic statistics about loaded specifications
Semantic Search Tools (6 tools)
These tools use ChromaDB and vector embeddings for intelligent, context-aware document retrieval. When ChromaDB is unavailable, they automatically fall back to keyword-based search.
- `search_pdf_semantic`: Perform semantic search across PDF documentation
  - Parameters: query (string), topK (number, default: 10)
  - Example: "What are the authentication requirements for payment initiation?"
  - How it works:
    - Converts your natural language query into a vector embedding
    - Finds the most semantically similar document chunks
    - Returns results ranked by relevance score (0.0-1.0)
    - Understands synonyms and related concepts (e.g., "authenticate" matches "authorization")
- `search_pdf_semantic_filtered`: Semantic search with metadata filters
  - Parameters: query (string), fileName (optional), section (optional), topK (optional)
  - Example: query="SCA exemptions", fileName="Implementation_Guide.pdf"
  - Use cases:
    - Search within a specific document
    - Filter by document section
    - Narrow results to relevant portions
- `search_all_semantic`: Comprehensive semantic search across all sources
  - Parameters: query (string), topK (number, default: 10)
  - Example: "How do I handle declined payments?"
  - Returns:
    - Matching endpoints (keyword search)
    - Matching schemas (keyword search)
    - Semantically similar PDF content (vector search)
- `get_vector_store_stats`: Get vector store statistics
  - Returns:
    - enabled: Whether vector store is operational
    - totalChunks: Number of indexed document chunks
    - collectionName: ChromaDB collection name
    - isInMemory: Whether using in-memory fallback
Graph Database Tools (6 tools)
These tools use Neo4j for exploring complex relationships between specifications, endpoints, schemas, and data models. When Neo4j is unavailable, they use an in-memory graph implementation.
- `graph_find_related_schemas`: Find schemas related through $ref references
  - Parameters: schemaName (string), specFile (optional), maxDepth (number, default: 3)
  - Example: schemaName="AccountReference"
  - Use cases:
    - Understand schema inheritance hierarchies
    - Find all schemas that reference a particular type
    - Discover composed data models
    - Map schema dependencies
- `graph_get_endpoint_dependencies`: Get all dependencies of an API endpoint
  - Parameters: path (string), method (string), specFile (optional)
  - Example: path="/v1/payments/sepa-credit-transfers", method="POST"
  - Returns:
    - All request parameters (query, header, path, body)
    - Request body schema and nested schemas
    - All possible response codes and their schemas
    - Complete dependency tree
- `graph_traverse_relationships`: Execute custom graph traversal with filters
  - Parameters:
    - startNodeType: Type of starting node (Specification, Endpoint, Schema, etc.)
    - startNodeFilter: Property filters (e.g., {name: "AccountReference"})
    - relationshipTypes: Optional list of relationship types to follow
    - maxDepth: Maximum traversal depth (default: 3)
  - Example: startNodeType="Schema", startNodeFilter={name: "PaymentInitiation*"}, relationshipTypes=["REFERENCES", "USES_SCHEMA"]
  - Use cases:
    - Custom relationship exploration
    - Multi-hop dependency analysis
    - Pattern-based graph queries
- `graph_get_specification_graph`: Get complete graph for a specification
  - Parameters: fileName (string)
  - Example: fileName="BG_oFA_PIS_Version_2.3_20251128.openapi.yaml"
  - Returns:
    - All endpoints in the specification
    - All schemas and their properties
    - All relationships between entities
    - Complete specification structure as a graph
- `graph_search_by_pattern`: Search graph nodes by property patterns
  - Parameters:
    - nodeType: Type of node to search
    - pattern: Property pattern with wildcards (e.g., {path: "/v1/accounts*"})
    - limit: Maximum results (default: 50)
  - Example: nodeType="Endpoint", pattern={path: "/v1/payments/*", method: "POST"}
  - Supports wildcards:
    - {name: "*Account*"}: Contains "Account"
    - {path: "/v1/accounts*"}: Starts with "/v1/accounts"
    - {method: "POST"}: Exact match
- `get_graph_store_stats`: Get graph database statistics
  - Returns:
    - enabled: Whether graph store is operational
    - usingNeo4j: Whether connected to Neo4j (true) or using in-memory (false)
    - Node counts by type (Specification, Endpoint, Schema, etc.)
    - Relationship counts by type
    - Indexing metrics (duration, errors)
    - Graph density metrics
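The wildcard semantics listed for `graph_search_by_pattern` (contains, starts with, exact match) can be expressed by translating each pattern into an anchored regular expression. The sketch below shows one way to do that; `wildcardToRegExp` and `matchesPattern` are hypothetical helpers, and treating nodes as flat string records is a simplifying assumption.

```typescript
// Convert a wildcard pattern such as "/v1/accounts*" into an anchored RegExp.
// Every regex metacharacter except "*" is escaped; "*" matches any run of characters.
function wildcardToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp(`^${escaped.replace(/\*/g, '.*')}$`);
}

// A node matches when every property named in the pattern matches its wildcard.
function matchesPattern(
  node: Record<string, string>,
  pattern: Record<string, string>,
): boolean {
  return Object.entries(pattern).every(([key, wildcard]) => {
    const value = node[key];
    return value !== undefined && wildcardToRegExp(wildcard).test(value);
  });
}
```

With these helpers, a node `{ path: '/v1/payments/sepa-credit-transfers', method: 'POST' }` matches the pattern `{ path: '/v1/payments/*', method: 'POST' }` but not the same pattern with `method: 'GET'`.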
Tool Selection Guide
Use Core Tools when:
- You need exact endpoint paths or schema names
- You want to filter by tags or HTTP methods
- You're looking for specific specification details
Use Semantic Search Tools when:
- You have natural language questions
- You're exploring concepts across documentation
- You don't know the exact terminology
- You want AI-powered relevance ranking
Use Graph Database Tools when:
- You need to understand relationships and dependencies
- You're exploring schema inheritance
- You want to analyze endpoint complexity
- You need to traverse multi-level references
Usage Examples
In VS Code with GitHub Copilot
Basic Queries
You: "What endpoints are available for account information?"
Copilot: [Uses search_endpoints tool to find AIS endpoints]
You: "Show me the schema for payment initiation request"
Copilot: [Uses search_schemas tool to find payment schemas]
Semantic Search Queries
You: "How do I implement Strong Customer Authentication?"
Copilot: [Uses search_pdf_semantic to find relevant SCA documentation with AI ranking]
You: "What are the requirements for payment authorization?"
Copilot: [Uses search_all_semantic to find endpoints, schemas, and semantically related PDF content]
You: "Find information about transaction status in the PIS specification"
Copilot: [Uses search_pdf_semantic_filtered with fileName filter]
Graph Database Queries
You: "What schemas does AccountReference depend on?"
Copilot: [Uses graph_find_related_schemas to traverse schema relationships]
You: "Show me all dependencies for the payment initiation endpoint"
Copilot: [Uses graph_get_endpoint_dependencies to get parameters, request/response schemas]
You: "Find all endpoints that use the Amount schema"
Copilot: [Uses graph_traverse_relationships starting from Amount schema]
You: "Get the complete API structure for the AIS specification"
Copilot: [Uses graph_get_specification_graph to return full specification graph]
Advanced Analysis
You: "Compare the complexity of payment endpoints vs account endpoints"
Copilot: [Uses graph_get_endpoint_dependencies for multiple endpoints and compares]
You: "What are all the possible error responses for account endpoints?"
Copilot: [Uses graph_traverse_relationships to find all response schemas]
You: "Show me all schemas that contain PII (personally identifiable information)"
Copilot: [Uses search_pdf_semantic to find PII references, then graph_search_by_pattern to find related schemas]
Programmatic Usage
The server can also be used programmatically via the MCP protocol:
// Example tool call
{
"method": "tools/call",
"params": {
"name": "search_endpoints",
"arguments": {
"query": "payment"
}
}
}
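The example above shows only the method and parameters. On the wire, MCP messages are JSON-RPC 2.0 requests, so a client also sends `jsonrpc` and `id` fields, and over the stdio transport each message is a single line of JSON. The following sketch builds such a request; `buildToolCall` and `serializeForStdio` are hypothetical helper names, and in practice an MCP client SDK would normally handle this envelope for you.

```typescript
interface ToolCallRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: {
    name: string;
    arguments: Record<string, unknown>;
  };
}

let nextRequestId = 1;

// Wrap a tool name and its arguments in a JSON-RPC 2.0 request envelope.
function buildToolCall(
  name: string,
  args: Record<string, unknown> = {},
): ToolCallRequest {
  return {
    jsonrpc: '2.0',
    id: nextRequestId++,
    method: 'tools/call',
    params: { name, arguments: args },
  };
}

// stdio transport: one JSON message per line, terminated by a newline.
function serializeForStdio(request: ToolCallRequest): string {
  return JSON.stringify(request) + '\n';
}
```

For instance, `serializeForStdio(buildToolCall('search_endpoints', { query: 'payment' }))` yields one newline-terminated line that could be written to the server's standard input after the usual MCP initialization handshake.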
Project Structure
Berlin-group-mcp/
├── src/
│   ├── index.ts               # Main MCP server with 24 tool definitions
│   ├── indexer.ts             # Specification indexer orchestrating all components
│   ├── yamlParser.ts          # OpenAPI YAML parser
│   ├── pdfParser.ts           # PDF document parser
│   ├── textChunker.ts         # Intelligent text chunking for vector embeddings
│   ├── vectorStore.ts         # ChromaDB integration for semantic search
│   ├── graphStore.ts          # Neo4j integration and in-memory graph store
│   ├── graphIndexer.ts        # Graph database indexer
│   └── graphModels.ts         # TypeScript interfaces for graph entities
├── yml_files/                 # Berlin Group OpenAPI specs (7 specifications)
│   ├── BG_oFA_AIS_Version_2.3_20250818.openapi.yaml
│   ├── BG_oFA_PIS_Version_2.3_20251128.openapi.yaml
│   ├── BG_oFA_PIIS_Version_2.3_20250818.openapi.yaml
│   ├── BG_oFA_BASK_Version_2.2_20251128.openapi.yaml
│   ├── BG_oFA_Consent_Version_2.1_20251128.openapi.yaml
│   ├── BG_oFA_dataDictionary_Version_2.2.6_20250818.openapi.yaml
│   └── BG_oFA_PUSH_Version_2.2_20250818.openapi.yaml
├── pdf_files/                 # PDF documentation (implementation guides, frameworks)
├── tests/
│   ├── unit/                  # Unit tests for individual components
│   │   ├── vectorStore.test.ts
│   │   ├── graphStore.test.ts
│   │   ├── graphIndexer.test.ts
│   │   ├── graphModels.test.ts
│   │   └── textChunker.test.ts
│   └── integration/           # Integration tests
│       ├── semanticSearch.test.ts
│       └── graphSearch.test.ts
├── build/                     # Compiled JavaScript (generated)
├── docs/                      # Architecture documentation and diagrams
│   ├── architecture/
│   └── diagrams/              # PlantUML diagrams for system architecture
├── postman/                   # Postman collection for testing MCP tools
├── package.json               # Dependencies: chromadb, neo4j-driver, pdf-parse, etc.
├── tsconfig.json
├── jest.config.js             # Test configuration
├── .vscode/
│   └── mcp-settings.json      # VS Code MCP configuration
├── INTELLIJ_SETUP.md          # IntelliJ configuration guide
└── README.md
Development
Running Tests
The project includes comprehensive unit and integration tests:
# Run all tests
npm test
# Run tests in watch mode
npm run test:watch
# Run with coverage report
npm run test:coverage
Test Coverage:
- Unit Tests: `vectorStore.test.ts`, `graphStore.test.ts`, `graphIndexer.test.ts`, `graphModels.test.ts`, `textChunker.test.ts`
- Integration Tests: `semanticSearch.test.ts`, `graphSearch.test.ts`
Watch Mode
To automatically rebuild on file changes:
npm run watch
Debugging
To debug the server with Node.js inspector:
npm run inspector
Adding New Specifications
- Add YAML files to the `yml_files/` directory
- Add PDF files to the `pdf_files/` directory
- Rebuild the project: `npm run build`
- Restart the MCP server (reload VS Code or restart IDE)
- New specifications will be automatically indexed on next startup
Extending the Server
Adding a New Tool:
- Define the tool schema in the `src/index.ts` TOOLS array
- Add a handler in the `CallToolRequestSchema` handler
- Implement business logic in `src/indexer.ts`
- Update README documentation
Adding a New Embedding Provider:
- Implement the `EmbeddingProvider` interface in `src/vectorStore.ts`
- Add `embed()` and `embedQuery()` methods
- Configure in `src/indexer.ts` or via environment variables
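As a rough illustration of these steps, the sketch below implements a toy provider with `embed()` and `embedQuery()` methods. The interface signatures shown here are an assumption (the real `EmbeddingProvider` in `src/vectorStore.ts` may differ), and the hashed bag-of-words embedding is a stand-in chosen for brevity, not the server's built-in TF-IDF provider. The 384-dimension default matches the size quoted for the local provider elsewhere in this README.

```typescript
// Assumed interface shape; check src/vectorStore.ts for the actual definition.
interface EmbeddingProvider {
  embed(texts: string[]): Promise<number[][]>;
  embedQuery(text: string): Promise<number[]>;
}

// Map a token to a bucket using a simple 32-bit polynomial hash.
function hashToken(token: string, dimensions: number): number {
  let hash = 0;
  for (let i = 0; i < token.length; i++) {
    hash = (hash * 31 + token.charCodeAt(i)) >>> 0;
  }
  return hash % dimensions;
}

// Hashed bag-of-words: count tokens per bucket, then L2-normalize.
function hashEmbed(text: string, dimensions = 384): number[] {
  const vector = new Array<number>(dimensions).fill(0);
  const tokens = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  for (const token of tokens) {
    vector[hashToken(token, dimensions)] += 1;
  }
  const norm = Math.sqrt(vector.reduce((sum, v) => sum + v * v, 0));
  return norm === 0 ? vector : vector.map((v) => v / norm);
}

class HashingEmbeddingProvider implements EmbeddingProvider {
  private readonly dimensions: number;

  constructor(dimensions = 384) {
    this.dimensions = dimensions;
  }

  async embed(texts: string[]): Promise<number[][]> {
    return texts.map((text) => hashEmbed(text, this.dimensions));
  }

  async embedQuery(text: string): Promise<number[]> {
    return hashEmbed(text, this.dimensions);
  }
}
```

Documents and queries must go through the same provider; vectors produced by different models or dimensions are not comparable.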
Customizing Graph Schema:
- Add new node types in `src/graphModels.ts`
- Add relationships in the `RelationshipType` enum
- Update indexing logic in `src/graphIndexer.ts`
- Add query methods in `src/graphStore.ts`
Troubleshooting
Server Not Starting
- Check Node.js version: `node --version` (should be v18+)
- Verify build completed: `ls -la build/` (should see .js files)
- Check for errors: Look in VS Code Developer Tools console (Help → Toggle Developer Tools)
- Rebuild: `npm run build`
Tools Not Appearing
- Ensure MCP settings file exists: Check `.vscode/mcp-settings.json`
- Verify correct paths: Ensure the path to `build/index.js` is absolute and correct
- Restart VS Code completely: Close all windows and reopen
- Check GitHub Copilot: Ensure Copilot is enabled and working
- Check console logs: Open Developer Tools and look for MCP connection errors
No Results from Search
- Verify YAML and PDF files exist: `ls -la yml_files/ pdf_files/`
- Check server logs: Look for initialization errors in console
- Ensure files are readable: Check file permissions
- Try reindexing: Delete and rebuild: `rm -rf build && npm run build`
Semantic Search Not Working
- Check if ChromaDB is running (optional): `curl http://localhost:8000/api/v1/heartbeat`
- Review initialization logs: Should see "Indexed X PDF chunks in vector store"
- Check fallback mode: Server will fall back to keyword search if ChromaDB unavailable
- Verify vector store stats: Use the `get_vector_store_stats` tool
- Check ChromaDB logs (if running via Docker): `docker logs <chromadb-container-id>`
Graph Database Not Working
- Check if Neo4j is running (optional): `curl http://localhost:7474`, or check Docker with `docker ps | grep neo4j`
- Verify credentials: A fresh Neo4j installation defaults to `neo4j`/`neo4j` (must be changed on first login); the Docker command in this README sets `neo4j`/`password`
- Review initialization logs: Should see "Graph indexing complete: X specs, Y endpoints, Z schemas"
- Check fallback mode: Server will use in-memory graph if Neo4j unavailable
- Verify graph store stats: Use the `get_graph_store_stats` tool
- Test the Neo4j connection using cypher-shell: `cypher-shell -u neo4j -p your-password`
Performance Issues
- Large PDF files: Consider splitting into smaller documents
- ChromaDB slow:
- Use local deployment instead of remote
- Reduce
topKparameter in semantic searches - Consider faster embedding provider
- Neo4j slow:
- Check if indexes are created
- Reduce
maxDepthin graph traversals - Optimize Cypher queries
- Memory usage high:
- Use external databases (ChromaDB + Neo4j) instead of in-memory
- Reduce number of specifications loaded
Connection Errors
ChromaDB Connection Refused:
Error: connect ECONNREFUSED 127.0.0.1:8000
Solution: ChromaDB is not running or is listening on a different port. The server automatically falls back to in-memory mode.
Neo4j Connection Failed:
Neo4jError: Could not connect to bolt://localhost:7687
Solution: Neo4j is not running or the credentials are wrong. The server automatically falls back to in-memory mode.
Permission Issues
# If index.js is not executable
chmod +x build/index.js
# If YAML/PDF files are not readable
chmod -R 644 yml_files/*.yaml pdf_files/*.pdf
Debugging Tips
- Enable verbose logging: Set `NODE_ENV=development` before starting the server
- Check initialization sequence: Server logs show each phase
- Test individual components: `npm test -- vectorStore.test.ts` and `npm test -- graphStore.test.ts`
- Verify tool availability: Use the `get_statistics`, `get_vector_store_stats`, and `get_graph_store_stats` tools
- Check MCP communication: Look for JSON-RPC messages in developer console
Common Error Messages
| Error | Cause | Solution |
|---|---|---|
| "Specifications not yet loaded" | Server still initializing | Wait 5-10 seconds and retry |
| "Semantic search is not available" | ChromaDB not connected | Normal, falls back to keyword search |
| "Graph store is not available" | Neo4j not connected | Normal, falls back to in-memory |
| "Collection not found" | ChromaDB collection missing | Server creates it automatically on startup |
| "Authentication failed" | Wrong Neo4j credentials | Update credentials in code or use default |
Technical Details
MCP Protocol
This server implements the Model Context Protocol specification (2025-11-25):
- Tools: 24 tools organized into core, semantic search, and graph database categories
- Resources: Direct access to specification files via the `berlin-group://` URI scheme
- Transport: stdio-based communication for IDE integration
Component Architecture
Parser Features
- YAML Parser:
  - Extracts paths, operations, schemas, components from OpenAPI 3.0+ specs
  - Handles `$ref` pointer resolution
  - Validates specification structure
  - Indexes tags, parameters, and responses
- PDF Parser:
  - Uses the `pdf-parse` library for text extraction
  - Preserves document structure and metadata
  - Enables full-text keyword search
  - Provides page number estimation
- Text Chunker:
  - Recursive character splitting algorithm
  - Configurable chunk size (default: 1000 chars) and overlap (default: 200 chars)
  - Maintains semantic coherence across chunks
  - Preserves metadata (file name, section, page number)
Vector Store Implementation
- ChromaDB Integration:
  - HTTP client connection to ChromaDB server
  - Collection-based document organization
  - Metadata filtering support
  - Cosine similarity for relevance scoring
- Embedding Providers:
  - LocalEmbeddingProvider: TF-IDF-based, 384 dimensions, no external dependencies
  - OpenAIEmbeddingProvider: Uses OpenAI's embedding models, 1536 dimensions (text-embedding-3-small) or 3072 dimensions (text-embedding-3-large), requires API key
  - Pluggable architecture for custom providers
- Search Algorithms:
  - Query embedding generation
  - K-nearest neighbors (KNN) search
  - Distance metrics (cosine similarity, L2 distance)
  - Relevance score normalization (0.0 to 1.0)
Graph Store Implementation
- Neo4j Integration:
  - Bolt protocol driver (neo4j-driver v5.27.0)
  - Connection pooling for performance
  - Transaction management
  - Cypher query execution
- Graph Schema:
  - Nodes: Specification, Endpoint, Schema, Property, Parameter, Response, Tag
  - Relationships: DEFINES_ENDPOINT, DEFINES_SCHEMA, HAS_PARAMETER, HAS_RESPONSE, USES_SCHEMA, REFERENCES, HAS_PROPERTY, TAGGED_WITH
- In-Memory Fallback:
  - Complete graph implementation using Maps
  - Same API as Neo4j implementation
  - Supports all query patterns
  - Suitable for development and testing
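As a rough illustration of the Map-based fallback, the sketch below stores nodes and relationships in two Maps and performs a breadth-first traversal along one relationship type. The class and method names are hypothetical. Only the node labels and relationship types are taken from the graph schema above.

```typescript
type RelationshipType =
  | "DEFINES_ENDPOINT" | "DEFINES_SCHEMA" | "HAS_PARAMETER" | "HAS_RESPONSE"
  | "USES_SCHEMA" | "REFERENCES" | "HAS_PROPERTY" | "TAGGED_WITH";

interface GraphNode {
  id: string;
  label: string; // e.g. "Schema", "Endpoint"
}

// Hypothetical in-memory store: one Map for nodes, one for outgoing edges.
class InMemoryGraphStore {
  private nodes = new Map<string, GraphNode>();
  private edges = new Map<string, Array<{ type: RelationshipType; to: string }>>();

  addNode(node: GraphNode): void {
    this.nodes.set(node.id, node);
  }

  addRelationship(from: string, type: RelationshipType, to: string): void {
    const outgoing = this.edges.get(from) ?? [];
    outgoing.push({ type, to });
    this.edges.set(from, outgoing);
  }

  // Breadth-first traversal along one relationship type, up to maxDepth hops.
  traverse(startId: string, type: RelationshipType, maxDepth: number): GraphNode[] {
    const visited = new Set<string>([startId]);
    const found: GraphNode[] = [];
    let frontier = [startId];
    for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
      const next: string[] = [];
      for (const id of frontier) {
        for (const edge of this.edges.get(id) ?? []) {
          if (edge.type !== type || visited.has(edge.to)) continue;
          visited.add(edge.to); // guards against reference cycles
          const node = this.nodes.get(edge.to);
          if (node) found.push(node);
          next.push(edge.to);
        }
      }
      frontier = next;
    }
    return found;
  }
}

// Example data: `Country` is a made-up schema name used only for illustration.
const graph = new InMemoryGraphStore();
for (const id of ["PaymentInitiation", "Address", "Country"]) {
  graph.addNode({ id, label: "Schema" });
}
graph.addRelationship("PaymentInitiation", "REFERENCES", "Address");
graph.addRelationship("Address", "REFERENCES", "Country");
graph.addRelationship("Country", "REFERENCES", "PaymentInitiation"); // cycle

const directRefs = graph.traverse("PaymentInitiation", "REFERENCES", 1);
const allRefs = graph.traverse("PaymentInitiation", "REFERENCES", 5);
```

Keeping the same method surface on both the Neo4j-backed and in-memory stores is what lets the rest of the server stay unaware of which one is active.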
Indexing Process
- Initialization (parallel):
  - Load YAML files → Parse specifications → Extract endpoints/schemas
  - Load PDF files → Parse documents → Chunk text → Generate embeddings
- Vector Store Indexing:
  - Chunk all PDF documents (typical: 200-500 chunks per document)
  - Generate embeddings for each chunk
  - Store in ChromaDB with metadata
  - Build search index
- Graph Store Indexing:
  - Create Specification nodes
  - Create Endpoint nodes with relationships
  - Create Schema nodes with properties
  - Create Parameter and Response nodes
  - Build REFERENCES relationships for `$ref` pointers
  - Create Tag nodes and relationships
- Error Handling:
  - Graceful degradation if databases unavailable
  - Detailed logging of indexing progress
  - Error collection without stopping process
  - Fallback to in-memory storage
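Two of the steps above, building REFERENCES relationships from `$ref` pointers and collecting errors without stopping, can be sketched together. This is an illustrative outline under assumptions: the function names and result shape are hypothetical, and only local `#/components/schemas/...` pointers are considered.

```typescript
interface ReferenceEdge {
  from: string;
  to: string;
  type: "REFERENCES";
}

interface IndexResult {
  edges: ReferenceEdge[];
  errors: string[];
}

// Recursively collect every "$ref" string found inside a schema object.
function collectRefs(node: unknown, found: string[] = []): string[] {
  if (Array.isArray(node)) {
    for (const item of node) collectRefs(item, found);
  } else if (node !== null && typeof node === "object") {
    const record = node as Record<string, unknown>;
    for (const key of Object.keys(record)) {
      const value = record[key];
      if (key === "$ref" && typeof value === "string") found.push(value);
      else collectRefs(value, found);
    }
  }
  return found;
}

// Turn $ref pointers into REFERENCES edges; record problems and keep going.
function indexSchemaReferences(schemas: Record<string, unknown>): IndexResult {
  const result: IndexResult = { edges: [], errors: [] };
  const prefix = "#/components/schemas/";
  for (const name of Object.keys(schemas)) {
    for (const ref of collectRefs(schemas[name])) {
      const target = ref.startsWith(prefix) ? ref.slice(prefix.length) : "";
      if (target === "" || !(target in schemas)) {
        // Collect the error instead of aborting the whole indexing run.
        result.errors.push(`${name}: unresolved $ref "${ref}"`);
        continue;
      }
      result.edges.push({ from: name, to: target, type: "REFERENCES" });
    }
  }
  return result;
}

// Example input: `Balance` is deliberately missing to show error collection.
const exampleSchemas: Record<string, unknown> = {
  AccountDetails: {
    type: "object",
    properties: {
      owner: { $ref: "#/components/schemas/Address" },
      balances: { type: "array", items: { $ref: "#/components/schemas/Balance" } },
    },
  },
  Address: { type: "object" },
};
const indexed = indexSchemaReferences(exampleSchemas);
```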
Performance
- Initial Load Time:
  - YAML parsing: ~500ms (7 specifications)
  - PDF parsing: ~1-2s (depends on file count/size)
  - Vector indexing: ~2-5s (depends on chunk count and embedding provider)
  - Graph indexing: ~1-3s (depends on database connection)
  - Total: ~5-10 seconds for full initialization
- Query Performance:
  - Keyword search: <10ms (in-memory search)
  - Semantic search: 50-200ms (depends on ChromaDB response time and top-k)
  - Graph queries: 10-100ms (simple queries), 100-500ms (complex traversals)
  - In-memory fallback: <50ms for most operations
- Memory Usage:
  - Base (specifications): ~20-30MB
  - Vector store (in-memory): +30-50MB
  - Graph store (in-memory): +20-40MB
  - Total: ~70-120MB (without external databases)
  - With external databases: ~30-50MB (stores data externally)
- Scalability:
  - Can handle 100+ specifications
  - Supports 1000+ PDF pages
  - Graph queries scale with Neo4j (millions of nodes)
  - Vector search scales with ChromaDB (millions of chunks)
Quick Reference
Tool Categories Summary
| Category | Count | Purpose | Requires |
|---|---|---|---|
| Core Tools | 12 | Basic search, filtering, specification access | None (built-in) |
| Semantic Search | 6 | AI-powered document retrieval, natural language queries | ChromaDB (optional) |
| Graph Database | 6 | Relationship exploration, dependency analysis | Neo4j (optional) |
| Total | 24 | Complete specification analysis toolkit | Node.js only |
Key Features Comparison
| Feature | Without Databases | With ChromaDB | With Neo4j | With Both |
|---|---|---|---|---|
| Endpoint Search | Keyword | Keyword | Keyword | Keyword |
| Schema Search | Keyword | Keyword | Keyword | Keyword |
| PDF Search | Keyword | Semantic + Keyword | Keyword | Semantic + Keyword |
| Schema Relationships | In-memory | In-memory | Neo4j Graph | Neo4j Graph |
| Endpoint Dependencies | In-memory | In-memory | Neo4j Graph | Neo4j Graph |
| Graph Traversal | Limited | Limited | Full Cypher | Full Cypher |
| Performance | Good | Excellent (PDF) | Excellent (Graph) | Excellent (Both) |
| Memory Usage | ~120MB | ~70MB | ~80MB | ~50MB |
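The comparison table can be read as a simple decision rule: each optional database upgrades one capability, and everything else keeps working without it. A minimal sketch of that rule follows. The type and function names are hypothetical and do not come from the server's code.

```typescript
interface BackendStatus {
  chromaConnected: boolean;
  neo4jConnected: boolean;
}

interface Capabilities {
  pdfSearch: "semantic+keyword" | "keyword";
  graphBackend: "neo4j" | "in-memory";
  graphTraversal: "full-cypher" | "limited";
}

// Hypothetical helper mirroring the table: ChromaDB affects PDF search,
// Neo4j affects relationship queries, and keyword search is always available.
function resolveCapabilities(status: BackendStatus): Capabilities {
  return {
    pdfSearch: status.chromaConnected ? "semantic+keyword" : "keyword",
    graphBackend: status.neo4jConnected ? "neo4j" : "in-memory",
    graphTraversal: status.neo4jConnected ? "full-cypher" : "limited",
  };
}

const withoutDatabases = resolveCapabilities({ chromaConnected: false, neo4jConnected: false });
const withBoth = resolveCapabilities({ chromaConnected: true, neo4jConnected: true });
```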
Common Queries Cheat Sheet
// Find endpoints
"search for payment endpoints"
→ Uses: search_endpoints
// Find schemas
"show me the AccountDetails schema"
→ Uses: get_schema or search_schemas
// Natural language search (semantic)
"how to handle authentication errors?"
→ Uses: search_pdf_semantic
// Find related schemas
"what schemas does PaymentInitiation reference?"
→ Uses: graph_find_related_schemas
// Analyze endpoint
"what are all the parameters and responses for POST /v1/payments?"
→ Uses: graph_get_endpoint_dependencies
// Explore relationships
"show me all schemas that use Address type"
→ Uses: graph_traverse_relationships
// Get overview
"show me statistics about the loaded specifications"
→ Uses: get_statistics, get_vector_store_stats, get_graph_store_stats
License
MIT
References
- Model Context Protocol - MCP specification and documentation
- Berlin Group Open Finance - Official Berlin Group website
- MCP SDK Documentation - TypeScript SDK for MCP
- ChromaDB Documentation - Vector database for AI applications
- Neo4j Documentation - Graph database platform
- OpenAPI Specification - API specification format
Contributing
Contributions are welcome! Please ensure:
- TypeScript code follows project conventions
- All tools have proper error handling
- Documentation is updated for new features
- Tests are added for new components (unit tests in `tests/unit/`, integration tests in `tests/integration/`)
- New embedding providers implement the `EmbeddingProvider` interface
- New graph node types are added to `graphModels.ts`
- README is updated with examples and usage instructions
Areas for Contribution
- Additional embedding providers (Cohere, HuggingFace, etc.)
- Enhanced graph query capabilities
- Additional Berlin Group specifications
- Performance optimizations
- Additional MCP tools
- Documentation improvements
Support
For issues or questions:
- Check the troubleshooting section
- Review MCP documentation
- Check Berlin Group specification documentation