MCP Servers

Berlin Group MCP Server

Provides AI assistants with contextual access to Berlin Group Open Finance API specifications, enabling specification-compliant guidance through semantic search, graph database queries, and document retrieval.

README

Berlin Group MCP Server

A Model Context Protocol (MCP) server that provides Berlin Group Open Finance API specifications as contextual information to AI assistants in VS Code and IntelliJ IDEA.

Overview

This MCP server loads and indexes Berlin Group Open Finance specifications from OpenAPI YAML files and PDF documentation, enabling LLMs to provide accurate, specification-compliant guidance during Open Finance Framework implementation.

The server features advanced AI-powered capabilities including:

Semantic Search via ChromaDB for intelligent, context-aware document retrieval
Graph Database via Neo4j for exploring complex relationships between API endpoints, schemas, and data models
Vector Embeddings for natural language queries across PDF documentation
Relationship Traversal for understanding dependencies and schema inheritance

Features

📚 Complete Specification Access: Loads all Berlin Group OpenAPI specs (AIS, PIS, PIIS, BASK, Consent, etc.)
🔍 Powerful Search: Search across endpoints, schemas, and PDF documentation
🎯 Smart Filtering: Filter endpoints by method, tag, or specification
📖 PDF Support: Extract and search content from implementation guides
🧠 Semantic Search: AI-powered semantic search using ChromaDB vector embeddings for natural language queries
🕸️ Graph Database: Neo4j integration for exploring complex relationships and dependencies
📊 Relationship Traversal: Navigate through schema references, endpoint dependencies, and API interconnections
✂️ Intelligent Text Chunking: Splits PDF documents into semantically meaningful chunks for better retrieval
🔄 Automatic Fallback: Gracefully falls back to in-memory storage when ChromaDB or Neo4j are unavailable
🛠️ 24 MCP Tools: Comprehensive toolset including 12 core tools + 6 semantic search tools + 6 graph database tools
🔌 Multi-IDE Support: Works in VS Code and IntelliJ IDEA

Available Specifications

The server indexes the following Berlin Group specifications:

Account Information Services (AIS) v2.3
Payment Initiation Services (PIS) v2.3
Confirmation of Funds (PIIS) v2.3
Bank Account Status Services (BASK) v2.2
Consent Management v2.1
Data Dictionary v2.3.1
Payment Update Status Hub (PUSH) v2.2

Installation

Prerequisites

Node.js v18 or higher
npm or yarn
VS Code or IntelliJ IDEA with MCP support
Optional: ChromaDB server for semantic search (runs on localhost:8000 by default)
Optional: Neo4j database for graph queries (runs on localhost:7687 by default)

Setup

Clone or navigate to the project directory:
```
cd path-of-the-repo/Berlin-group-mcp
```
Install dependencies:
```
npm install
```
Build the project:
```
npm run build
```

Configuration

VS Code

The configuration file is already created at .vscode/mcp-settings.json

Update the path if needed to match your project location:

{
  "mcpServers": {
    "berlin-group": {
      "command": "node",
      "args": [
        "absolute-path-of-the-repo/Berlin-group-mcp/build/index.js"
      ]
    }
  }
}

Restart VS Code or reload the window
The Berlin Group tools should now be available in GitHub Copilot Chat

IntelliJ IDEA

See INTELLIJ_SETUP.md for detailed configuration instructions.

Dependencies

The project uses the following key dependencies:

Core Dependencies

@modelcontextprotocol/sdk (^1.0.4): MCP protocol implementation
js-yaml (^4.1.0): YAML parsing for OpenAPI specifications
pdf-parse (^1.1.1): PDF document text extraction

Advanced Features

chromadb (^1.8.1): Vector database client for semantic search
- Enables AI-powered document retrieval
- Optional: Falls back to in-memory storage if unavailable
neo4j-driver (^5.27.0): Neo4j graph database driver
- Enables complex relationship queries
- Optional: Falls back to in-memory graph if unavailable

Development Dependencies

typescript (^5.7.3): TypeScript compiler
jest (^29.7.0): Testing framework
ts-jest (^29.1.2): TypeScript support for Jest
Type definitions for all major dependencies

All dependencies are automatically installed with npm install.

Optional: External Database Configuration

The Berlin Group MCP Server can optionally use external databases for enhanced capabilities. Both are completely optional – the server works perfectly without them using in-memory storage.

Configuration Methods

The server supports two methods for configuration:

Environment Variables (Recommended): Create a .env file in the project root
Direct Configuration: Modify the configuration in src/index.ts

Environment Variables

Copy the .env.example file to .env and customize:

cp .env.example .env

Then edit .env with your settings:

# ChromaDB Configuration (for Semantic Search)
CHROMA_HOST=localhost
CHROMA_PORT=8000
CHROMA_COLLECTION=berlin_group_pdfs

# OpenAI Configuration (for embeddings)
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_EMBEDDING_MODEL=text-embedding-3-small

# Neo4j Configuration (for Graph Database)
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=password
NEO4J_DATABASE=neo4j
NEO4J_MAX_POOL_SIZE=50
NEO4J_CONNECTION_TIMEOUT=60000

Direct Configuration

Alternatively, you can modify the configuration directly in src/index.ts:

const indexer = new SpecificationIndexer({
  vectorStore: {
    chromaHost: 'localhost',
    chromaPort: 8000,
    collectionName: 'my_collection',
    embeddingModel: 'text-embedding-3-small'
  },
  graphStore: {
    uri: 'bolt://localhost:7687',
    username: 'neo4j',
    password: 'password',
    database: 'neo4j',
    maxConnectionPoolSize: 50,
    connectionAcquisitionTimeout: 60000
  }
});

ChromaDB (for Semantic Search)

ChromaDB enables AI-powered semantic search across PDF documentation using vector embeddings.

Installation:

# Using pip
pip install chromadb

# Or using Docker
docker run -d -p 8000:8000 chromadb/chroma

Default Configuration:

Host: localhost
Port: 8000
Collection: berlin_group_pdfs

The server automatically connects during initialization. If ChromaDB is unavailable, semantic search falls back to keyword matching.

Neo4j (for Graph Database)

Neo4j enables complex relationship queries and graph traversal across specifications, endpoints, and schemas.

Installation:

# Using Docker (recommended)
docker run -d \
  -p 7475:7474 -p 7688:7687 \
  -e NEO4J_AUTH=neo4j/password \
  --name neo4j_berling_group_mcp \
  neo4j:latest

# Or download from https://neo4j.com/download/

Default Configuration:

URI: bolt://localhost:7687
Username: neo4j
Password: password
Database: neo4j

The server automatically connects during initialization. If Neo4j is unavailable, graph queries use in-memory implementation.

Neo4j Browser Access: Once Neo4j is running, access the browser interface at http://localhost:7474 to visualize the graph:

// Example queries in Neo4j Browser
MATCH (s:Specification)-[:DEFINES_ENDPOINT]->(e:Endpoint)
RETURN s, e LIMIT 25

MATCH (e:Endpoint)-[:USES_SCHEMA]->(s:Schema) WHERE e.path CONTAINS 'payment' RETURN e, s

MATCH path = (s1:Schema)-[:REFERENCES*1..3]->(s2:Schema) WHERE s1.name = 'PaymentInitiation' RETURN path


### Embedding Providers

For production deployments with ChromaDB, consider using advanced embedding providers:

**OpenAI Embeddings** (highest quality):
```typescript
// Set environment variable
export OPENAI_API_KEY="your-api-key"

// Modify vectorStore.ts to use OpenAIEmbeddingProvider
const embeddingProvider = new OpenAIEmbeddingProvider(
  process.env.OPENAI_API_KEY,
  'text-embedding-3-small'  // or 'text-embedding-3-large'
);

Local Embeddings (default, no API required): The server includes a built-in TF-IDF-based embedding provider that works without external APIs. It's automatically used when no other provider is configured.

Deployment Scenarios

Scenario	ChromaDB	Neo4j	Tools Available	Best For
Full Stack	✅ Running	✅ Running	24 tools	Production, research, complex analysis
Semantic Focus	✅ Running	❌ Not available	18 tools	Documentation search, Q&A
Graph Focus	❌ Not available	✅ Running	18 tools	API architecture analysis
Minimal/Dev	❌ Not available	❌ Not available	12 tools	Development, basic queries

Architecture

Core Components

The Berlin Group MCP Server is built with a modular architecture consisting of several specialized components:

1. YAML Parser (`yamlParser.ts`)

Parses Berlin Group OpenAPI specifications from YAML files, extracting:

API endpoints (paths, methods, parameters)
Schema definitions and data models
Tags, descriptions, and metadata
Request/response specifications

2. PDF Parser (`pdfParser.ts`)

Processes PDF documentation files using pdf-parse library:

Extracts full text content from PDF documents
Performs keyword-based text search
Provides document summaries and metadata

3. Text Chunker (`textChunker.ts`)

Implements intelligent document segmentation for vector embedding:

Recursive Character Splitting: Breaks text at natural boundaries (paragraphs, sentences, clauses)
Configurable Chunk Size: Default 1000 characters with 200 character overlap for context continuity
Metadata Preservation: Tracks source file, chunk index, section headers, and page estimates
Semantic Coherence: Maintains meaning by avoiding splits mid-sentence when possible

4. Vector Store (`vectorStore.ts`)

Manages semantic search capabilities using ChromaDB:

ChromaDB Integration: Optional connection to ChromaDB server for persistent vector storage
Local Embedding Provider: Built-in TF-IDF-like embedding generation when external APIs are unavailable
OpenAI Embedding Support: Configurable integration with OpenAI's embedding models (text-embedding-3-small, text-embedding-3-large)
Automatic Fallback: Uses in-memory vector storage when ChromaDB server is unavailable
Semantic Search: Natural language queries with relevance scoring and distance metrics
Metadata Filtering: Search within specific files or document sections

How Vector Store Works:

PDF documents are split into chunks by the Text Chunker
Each chunk is converted to a vector embedding (384-3072 dimensions depending on provider)
Embeddings are stored in ChromaDB collection or in-memory fallback
User queries are embedded using the same model
Cosine similarity finds the most relevant chunks
Results are ranked by relevance score (0.0 to 1.0)

5. Graph Store (`graphStore.ts`)

Manages graph database operations with Neo4j:

Neo4j Integration: Optional connection to Neo4j database for complex relationship queries
In-Memory Fallback: Complete graph implementation when Neo4j is unavailable
Connection Management: Handles driver lifecycle, sessions, and transactions
CRUD Operations: Create/read nodes and relationships with typed interfaces
Cypher Query Execution: Direct access to Neo4j's powerful query language
Statistics: Provides metrics on node counts, relationships, and graph density

Graph Node Types:

Specification: OpenAPI spec metadata (title, version, description)
Endpoint: API paths with HTTP methods
Schema: Data models and type definitions
Property: Schema fields with types and constraints
Parameter: Request parameters (query, header, path, cookie)
Response: HTTP response definitions with status codes
Tag: Endpoint categorization

Graph Relationship Types:

DEFINES_ENDPOINT: Specification → Endpoint
DEFINES_SCHEMA: Specification → Schema
HAS_PARAMETER: Endpoint → Parameter
HAS_RESPONSE: Endpoint → Response
USES_SCHEMA: Endpoint/Parameter/Response → Schema
REFERENCES: Schema → Schema (for $ref relationships)
HAS_PROPERTY: Schema → Property
TAGGED_WITH: Endpoint → Tag

6. Graph Indexer (`graphIndexer.ts`)

Transforms OpenAPI specifications into graph structures:

Specification Indexing: Creates nodes for each loaded specification file
Endpoint Extraction: Parses all API endpoints with full details
Schema Mapping: Extracts all data models and their properties
Relationship Building: Connects endpoints to schemas, parameters, and responses
Reference Resolution: Follows $ref pointers to build schema dependency graphs
Progress Tracking: Provides real-time feedback during indexing operations
Error Handling: Gracefully handles malformed specifications

Indexing Process:

Load YAML files from yml_files/ directory
Create Specification nodes for each file
Extract and create Endpoint nodes
Extract and create Schema nodes with properties
Build relationships between all entities
Index into Neo4j or in-memory store

7. Graph Models (`graphModels.ts`)

Defines TypeScript interfaces for type-safe graph operations:

Node interfaces (SpecificationNode, EndpointNode, SchemaNode, etc.)
Relationship type enums
Query result types (GraphTraversalResult, PatternSearchResult, etc.)
DTO types for creating nodes
Utility functions for ID generation and reference extraction

8. Specification Indexer (`indexer.ts`)

Orchestrates all components and provides unified API:

Coordinates YAML Parser, PDF Parser, Vector Store, and Graph Indexer
Manages initialization sequence and error handling
Provides high-level search and query methods
Handles fallback scenarios when optional services are unavailable
Aggregates statistics across all subsystems

Data Flow

┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   YAML Files    │────>│  YAML Parser     │────>│  Graph Indexer  │
│  (OpenAPI)      │     │                  │     │                 │
└─────────────────┘     └──────────────────┘     └────────┬────────┘
                                                            │
                                                            v
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   PDF Files     │────>│  PDF Parser +    │────>│  Vector Store   │
│ (Documentation) │     │  Text Chunker    │     │   (ChromaDB)    │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                                                            │
                                                            v
                                             ┌──────────────────────────┐
                                             │   Specification Indexer  │
                                             │   (Unified Interface)    │
                                             └────────────┬─────────────┘
                                                          │
                                                          v
                                             ┌──────────────────────────┐
                                             │     MCP Server Tools     │
                                             │  (24 Tools Available)    │
                                             └──────────────────────────┘

Fallback Mechanisms

The server is designed to work in various deployment scenarios:

Full Stack (ChromaDB + Neo4j):
- All 24 tools available
- Best performance and capabilities
- Semantic search with persistent vectors
- Complex graph queries with Cypher
Vector Only (ChromaDB, no Neo4j):
- 18 tools available (core + semantic search)
- Graph queries use in-memory implementation
- Good for semantic search focused use cases
Graph Only (Neo4j, no ChromaDB):
- 18 tools available (core + graph database)
- Semantic search falls back to keyword matching
- Good for relationship exploration use cases
Minimal (no external databases):
- 12 core tools available
- All operations use in-memory storage
- Keyword-based search only
- Suitable for basic queries and development

The server automatically detects available services during initialization and adjusts its capabilities accordingly. Users are informed which features are available through statistics and status endpoints.

Available Tools

The server provides 24 MCP tools organized into three categories:

Core Tools (12 tools)

Search and Discovery

search_endpoints - Search for API endpoints across all specifications
```
Example: "Find all payment endpoints"
```
search_schemas - Search for data schemas and models
```
Example: "Find schemas related to transaction"
```
search_pdf_documentation - Search through PDF documentation using keyword matching
```
Example: "Search for SCA requirements"
```
search_all - Comprehensive keyword search across all sources (endpoints, schemas, PDFs)
```
Example: "Find everything about consent"
```

Endpoint Information

get_endpoint_details - Get detailed information about a specific endpoint
```
Parameters: path, method
Example: path="/v1/accounts", method="GET"
```
filter_endpoints_by_tag - Filter endpoints by tag
```
Example: tag="accounts"
```
filter_endpoints_by_method - Filter endpoints by HTTP method
```
Example: method="POST"
```

Schema Information

get_schema - Get a specific schema definition

Parameters: schemaName, specFile (optional)
Example: schemaName="AccountDetails"

Specification Management

list_specifications - List all available OpenAPI specifications
get_specification_details - Get comprehensive details about a specific spec
```
Parameters: fileName
```
list_pdf_documents - List all available PDF documentation
get_statistics - Get basic statistics about loaded specifications

Semantic Search Tools (6 tools)

These tools use ChromaDB and vector embeddings for intelligent, context-aware document retrieval. When ChromaDB is unavailable, they automatically fall back to keyword-based search.

search_pdf_semantic - Perform semantic search across PDF documentation

Parameters: query (string), topK (number, default: 10)
Example: "What are the authentication requirements for payment initiation?"

How it works:
- Converts your natural language query into a vector embedding
- Finds the most semantically similar document chunks
- Returns results ranked by relevance score (0.0-1.0)
- Understands synonyms and related concepts (e.g., "authenticate" matches "authorization")

search_pdf_semantic_filtered - Semantic search with metadata filters

Parameters: query (string), fileName (optional), section (optional), topK (optional)
Example: query="SCA exemptions", fileName="Implementation_Guide.pdf"

Use cases:
- Search within a specific document
- Filter by document section
- Narrow results to relevant portions

search_all_semantic - Comprehensive semantic search across all sources

Parameters: query (string), topK (number, default: 10)
Example: "How do I handle declined payments?"

Returns:
- Matching endpoints (keyword search)
- Matching schemas (keyword search)
- Semantically similar PDF content (vector search)

get_vector_store_stats - Get vector store statistics

Returns:
- enabled: Whether vector store is operational
- totalChunks: Number of indexed document chunks
- collectionName: ChromaDB collection name
- isInMemory: Whether using in-memory fallback

Graph Database Tools (6 tools)

These tools use Neo4j for exploring complex relationships between specifications, endpoints, schemas, and data models. When Neo4j is unavailable, they use an in-memory graph implementation.

graph_find_related_schemas - Find schemas related through $ref references

Parameters: schemaName (string), specFile (optional), maxDepth (number, default: 3)
Example: schemaName="AccountReference"

Use cases:
- Understand schema inheritance hierarchies
- Find all schemas that reference a particular type
- Discover composed data models
- Map schema dependencies

graph_get_endpoint_dependencies - Get all dependencies of an API endpoint

Parameters: path (string), method (string), specFile (optional)
Example: path="/v1/payments/sepa-credit-transfers", method="POST"

Returns:
- All request parameters (query, header, path, body)
- Request body schema and nested schemas
- All possible response codes and their schemas
- Complete dependency tree

graph_traverse_relationships - Execute custom graph traversal with filters

Parameters: 
  - startNodeType: Type of starting node (Specification, Endpoint, Schema, etc.)
  - startNodeFilter: Property filters (e.g., {name: "AccountReference"})
  - relationshipTypes: Optional list of relationship types to follow
  - maxDepth: Maximum traversal depth (default: 3)

Example: 
  startNodeType="Schema", 
  startNodeFilter={name: "PaymentInitiation*"},
  relationshipTypes=["REFERENCES", "USES_SCHEMA"]

Use cases:
- Custom relationship exploration
- Multi-hop dependency analysis
- Pattern-based graph queries

graph_get_specification_graph - Get complete graph for a specification

Parameters: fileName (string)
Example: fileName="BG_oFA_PIS_Version_2.3_20251128.openapi.yaml"

Returns:
- All endpoints in the specification
- All schemas and their properties
- All relationships between entities
- Complete specification structure as a graph

graph_search_by_pattern - Search graph nodes by property patterns

Parameters: 
  - nodeType: Type of node to search
  - pattern: Property pattern with wildcards (e.g., {path: "/v1/accounts*"})
  - limit: Maximum results (default: 50)

Example: 
  nodeType="Endpoint", 
  pattern={path: "/v1/payments/*", method: "POST"}

Supports wildcards:
- {name: "*Account*"} - Contains "Account"
- {path: "/v1/accounts*"} - Starts with "/v1/accounts"
- {method: "POST"} - Exact match

get_graph_store_stats - Get graph database statistics

Returns:
- enabled: Whether graph store is operational
- usingNeo4j: Whether connected to Neo4j (true) or using in-memory (false)
- Node counts by type (Specification, Endpoint, Schema, etc.)
- Relationship counts by type
- Indexing metrics (duration, errors)
- Graph density metrics

Tool Selection Guide

Use Core Tools when:

You need exact endpoint paths or schema names
You want to filter by tags or HTTP methods
You're looking for specific specification details

Use Semantic Search Tools when:

You have natural language questions
You're exploring concepts across documentation
You don't know the exact terminology
You want AI-powered relevance ranking

Use Graph Database Tools when:

You need to understand relationships and dependencies
You're exploring schema inheritance
You want to analyze endpoint complexity
You need to traverse multi-level references

Usage Examples

In VS Code with GitHub Copilot

Basic Queries

You: "What endpoints are available for account information?"
Copilot: [Uses search_endpoints tool to find AIS endpoints]

You: "Show me the schema for payment initiation request"
Copilot: [Uses search_schemas tool to find payment schemas]

Semantic Search Queries

You: "How do I implement Strong Customer Authentication?"
Copilot: [Uses search_pdf_semantic to find relevant SCA documentation with AI ranking]

You: "What are the requirements for payment authorization?"
Copilot: [Uses search_all_semantic to find endpoints, schemas, and semantically related PDF content]

You: "Find information about transaction status in the PIS specification"
Copilot: [Uses search_pdf_semantic_filtered with fileName filter]

Graph Database Queries

You: "What schemas does AccountReference depend on?"
Copilot: [Uses graph_find_related_schemas to traverse schema relationships]

You: "Show me all dependencies for the payment initiation endpoint"
Copilot: [Uses graph_get_endpoint_dependencies to get parameters, request/response schemas]

You: "Find all endpoints that use the Amount schema"
Copilot: [Uses graph_traverse_relationships starting from Amount schema]

You: "Get the complete API structure for the AIS specification"
Copilot: [Uses graph_get_specification_graph to return full specification graph]

Advanced Analysis

You: "Compare the complexity of payment endpoints vs account endpoints"
Copilot: [Uses graph_get_endpoint_dependencies for multiple endpoints and compares]

You: "What are all the possible error responses for account endpoints?"
Copilot: [Uses graph_traverse_relationships to find all response schemas]

You: "Show me all schemas that contain PII (personally identifiable information)"
Copilot: [Uses search_pdf_semantic to find PII references, then graph_search_by_pattern to find related schemas]

Programmatic Usage

The server can also be used programmatically via the MCP protocol:

// Example tool call
{
  "method": "tools/call",
  "params": {
    "name": "search_endpoints",
    "arguments": {
      "query": "payment"
    }
  }
}

Project Structure

Berlin-group-mcp/
├── src/
│   ├── index.ts              # Main MCP server with 24 tool definitions
│   ├── indexer.ts            # Specification indexer orchestrating all components
│   ├── yamlParser.ts         # OpenAPI YAML parser
│   ├── pdfParser.ts          # PDF document parser
│   ├── textChunker.ts        # Intelligent text chunking for vector embeddings
│   ├── vectorStore.ts        # ChromaDB integration for semantic search
│   ├── graphStore.ts         # Neo4j integration and in-memory graph store
│   ├── graphIndexer.ts       # Graph database indexer
│   └── graphModels.ts        # TypeScript interfaces for graph entities
├── yml_files/                # Berlin Group OpenAPI specs (7 specifications)
│   ├── BG_oFA_AIS_Version_2.3_20250818.openapi.yaml
│   ├── BG_oFA_PIS_Version_2.3_20251128.openapi.yaml
│   ├── BG_oFA_PIIS_Version_2.3_20250818.openapi.yaml
│   ├── BG_oFA_BASK_Version_2.2_20251128.openapi.yaml
│   ├── BG_oFA_Consent_Version_2.1_20251128.openapi.yaml
│   ├── BG_oFA_dataDictionary_Version_2.2.6_20250818.openapi.yaml
│   └── BG_oFA_PUSH_Version_2.2_20250818.openapi.yaml
├── pdf_files/                # PDF documentation (implementation guides, frameworks)
├── tests/
│   ├── unit/                 # Unit tests for individual components
│   │   ├── vectorStore.test.ts
│   │   ├── graphStore.test.ts
│   │   ├── graphIndexer.test.ts
│   │   ├── graphModels.test.ts
│   │   └── textChunker.test.ts
│   └── integration/          # Integration tests
│       ├── semanticSearch.test.ts
│       └── graphSearch.test.ts
├── build/                    # Compiled JavaScript (generated)
├── docs/                     # Architecture documentation and diagrams
│   └── architecture/
│       └── diagrams/         # PlantUML diagrams for system architecture
├── postman/                  # Postman collection for testing MCP tools
├── package.json              # Dependencies: chromadb, neo4j-driver, pdf-parse, etc.
├── tsconfig.json
├── jest.config.js            # Test configuration
├── .vscode/
│   └── mcp-settings.json     # VS Code MCP configuration
├── INTELLIJ_SETUP.md         # IntelliJ configuration guide
└── README.md

Development

Running Tests

The project includes comprehensive unit and integration tests:

# Run all tests
npm test

# Run tests in watch mode
npm run test:watch

# Run with coverage report
npm run test:coverage

Test Coverage:

Unit Tests: vectorStore.test.ts, graphStore.test.ts, graphIndexer.test.ts, graphModels.test.ts, textChunker.test.ts
Integration Tests: semanticSearch.test.ts, graphSearch.test.ts

Watch Mode

To automatically rebuild on file changes:

npm run watch

Debugging

To debug the server with Node.js inspector:

npm run inspector

Adding New Specifications

Add YAML files to yml_files/ directory
Add PDF files to pdf_files/ directory
Rebuild the project: npm run build
Restart the MCP server (reload VS Code or restart IDE)
New specifications will be automatically indexed on next startup

Extending the Server

Adding a New Tool:

Define tool schema in src/index.ts TOOLS array
Add handler in CallToolRequestSchema handler
Implement business logic in src/indexer.ts
Update README documentation

Adding a New Embedding Provider:

Implement EmbeddingProvider interface in src/vectorStore.ts
Add embed() and embedQuery() methods
Configure in src/indexer.ts or via environment variables

Customizing Graph Schema:

Add new node types in src/graphModels.ts
Add relationships in RelationshipType enum
Update indexing logic in src/graphIndexer.ts
Add query methods in src/graphStore.ts

Troubleshooting

Server Not Starting

Check Node.js version: node --version (should be v18+)
Verify build completed: ls -la build/ (should see .js files)
Check for errors: Look in VS Code Developer Tools console (Help → Toggle Developer Tools)
Rebuild: npm run build

Tools Not Appearing

Ensure MCP settings file exists: Check .vscode/mcp-settings.json
Verify correct paths: Ensure the path to build/index.js is absolute and correct
Restart VS Code completely: Close all windows and reopen
Check GitHub Copilot: Ensure Copilot is enabled and working
Check console logs: Open Developer Tools and look for MCP connection errors

No Results from Search

Verify YAML and PDF files exist:
```
ls -la yml_files/ pdf_files/
```
Check server logs: Look for initialization errors in console
Ensure files are readable: Check file permissions
Try reindexing: Delete and rebuild: rm -rf build && npm run build

Semantic Search Not Working

Check if ChromaDB is running (optional):

curl http://localhost:8000/api/v1/heartbeat

Review initialization logs: Should see "Indexed X PDF chunks in vector store"
Check fallback mode: Server will fall back to keyword search if ChromaDB unavailable
Verify vector store stats: Use get_vector_store_stats tool
Check ChromaDB logs (if running via Docker):
```
docker logs <chromadb-container-id>
```

Graph Database Not Working

Check if Neo4j is running (optional):

curl http://localhost:7474
# Or check Docker: docker ps | grep neo4j

Verify credentials: Default is neo4j/neo4j (change on first login)
Review initialization logs: Should see "Graph indexing complete: X specs, Y endpoints, Z schemas"
Check fallback mode: Server will use in-memory graph if Neo4j unavailable
Verify graph store stats: Use get_graph_store_stats tool

Test Neo4j connection:

# Using cypher-shell
cypher-shell -u neo4j -p your-password

Performance Issues

Large PDF files: Consider splitting into smaller documents
ChromaDB slow:
- Use local deployment instead of remote
- Reduce topK parameter in semantic searches
- Consider faster embedding provider
Neo4j slow:
- Check if indexes are created
- Reduce maxDepth in graph traversals
- Optimize Cypher queries
Memory usage high:
- Use external databases (ChromaDB + Neo4j) instead of in-memory
- Reduce number of specifications loaded

Connection Errors

ChromaDB Connection Refused:

Error: connect ECONNREFUSED 127.0.0.1:8000

Solution: ChromaDB is not running or running on different port. Server will automatically fall back to in-memory mode.

Neo4j Connection Failed:

Neo4jError: Could not connect to bolt://localhost:7687

Solution: Neo4j is not running or wrong credentials. Server will automatically fall back to in-memory mode.

Permission Issues

# If index.js is not executable
chmod +x build/index.js

# If YAML/PDF files are not readable
chmod -R 644 yml_files/*.yaml pdf_files/*.pdf

Debugging Tips

Enable verbose logging: Set NODE_ENV=development before starting server
Check initialization sequence: Server logs show each phase

Test individual components:

npm test -- vectorStore.test.ts
npm test -- graphStore.test.ts

Verify tool availability: Use get_statistics, get_vector_store_stats, get_graph_store_stats tools
Check MCP communication: Look for JSON-RPC messages in developer console

Common Error Messages

Error	Cause	Solution
"Specifications not yet loaded"	Server still initializing	Wait 5-10 seconds and retry
"Semantic search is not available"	ChromaDB not connected	Normal, falls back to keyword search
"Graph store is not available"	Neo4j not connected	Normal, falls back to in-memory
"Collection not found"	ChromaDB collection missing	Server creates it automatically on startup
"Authentication failed"	Wrong Neo4j credentials	Update credentials in code or use default

Technical Details

MCP Protocol

This server implements the Model Context Protocol specification (2025-11-25):

Tools: 24 tools organized into core, semantic search, and graph database categories
Resources: Direct access to specification files via berlin-group:// URI scheme
Transport: stdio-based communication for IDE integration

Component Architecture

Parser Features

YAML Parser:
- Extracts paths, operations, schemas, components from OpenAPI 3.0+ specs
- Handles $ref pointer resolution
- Validates specification structure
- Indexes tags, parameters, and responses
PDF Parser:
- Uses pdf-parse library for text extraction
- Preserves document structure and metadata
- Enables full-text keyword search
- Provides page number estimation
Text Chunker:
- Recursive character splitting algorithm
- Configurable chunk size (default: 1000 chars) and overlap (default: 200 chars)
- Maintains semantic coherence across chunks
- Preserves metadata (file name, section, page number)

Vector Store Implementation

ChromaDB Integration:
- HTTP client connection to ChromaDB server
- Collection-based document organization
- Metadata filtering support
- Cosine similarity for relevance scoring
Embedding Providers:
- LocalEmbeddingProvider: TF-IDF-based, 384 dimensions, no external dependencies
- OpenAIEmbeddingProvider: GPT-based, 1536 or 3072 dimensions, requires API key
- Pluggable architecture for custom providers
Search Algorithms:
- Query embedding generation
- K-nearest neighbors (KNN) search
- Distance metrics (cosine similarity, L2 distance)
- Relevance score normalization (0.0 to 1.0)

Graph Store Implementation

Neo4j Integration:
- Bolt protocol driver (neo4j-driver v5.27.0)
- Connection pooling for performance
- Transaction management
- Cypher query execution

Graph Schema:

Nodes: Specification, Endpoint, Schema, Property, Parameter, Response, Tag
Relationships: DEFINES_ENDPOINT, DEFINES_SCHEMA, HAS_PARAMETER, 
               HAS_RESPONSE, USES_SCHEMA, REFERENCES, HAS_PROPERTY, TAGGED_WITH

In-Memory Fallback:
- Complete graph implementation using Maps
- Same API as Neo4j implementation
- Supports all query patterns
- Suitable for development and testing

Indexing Process

Initialization (parallel):
- Load YAML files → Parse specifications → Extract endpoints/schemas
- Load PDF files → Parse documents → Chunk text → Generate embeddings
Vector Store Indexing:
- Chunk all PDF documents (typical: 200-500 chunks per document)
- Generate embeddings for each chunk
- Store in ChromaDB with metadata
- Build search index
Graph Store Indexing:
- Create Specification nodes
- Create Endpoint nodes with relationships
- Create Schema nodes with properties
- Create Parameter and Response nodes
- Build REFERENCES relationships for $ref pointers
- Create Tag nodes and relationships
Error Handling:
- Graceful degradation if databases unavailable
- Detailed logging of indexing progress
- Error collection without stopping process
- Fallback to in-memory storage

Performance

Initial Load Time:
- YAML parsing: ~500ms (7 specifications)
- PDF parsing: ~1-2s (depends on file count/size)
- Vector indexing: ~2-5s (depends on chunk count and embedding provider)
- Graph indexing: ~1-3s (depends on database connection)
- Total: ~5-10 seconds for full initialization
Query Performance:
- Keyword search: <10ms (in-memory search)
- Semantic search: 50-200ms (depends on ChromaDB response time and top-k)
- Graph queries: 10-100ms (simple queries), 100-500ms (complex traversals)
- In-memory fallback: <50ms for most operations
Memory Usage:
- Base (specifications): ~20-30MB
- Vector store (in-memory): +30-50MB
- Graph store (in-memory): +20-40MB
- Total: ~70-120MB (without external databases)
- With external databases: ~30-50MB (stores data externally)
Scalability:
- Can handle 100+ specifications
- Supports 1000+ PDF pages
- Graph queries scale with Neo4j (millions of nodes)
- Vector search scales with ChromaDB (millions of chunks)

Quick Reference

Tool Categories Summary

Category	Count	Purpose	Requires
Core Tools	12	Basic search, filtering, specification access	None (built-in)
Semantic Search	6	AI-powered document retrieval, natural language queries	ChromaDB (optional)
Graph Database	6	Relationship exploration, dependency analysis	Neo4j (optional)
Total	24	Complete specification analysis toolkit	Node.js only

Key Features Comparison

Feature	Without Databases	With ChromaDB	With Neo4j	With Both
Endpoint Search	✅ Keyword	✅ Keyword	✅ Keyword	✅ Keyword
Schema Search	✅ Keyword	✅ Keyword	✅ Keyword	✅ Keyword
PDF Search	✅ Keyword	✅ Semantic + Keyword	✅ Keyword	✅ Semantic + Keyword
Schema Relationships	✅ In-memory	✅ In-memory	✅ Neo4j Graph	✅ Neo4j Graph
Endpoint Dependencies	✅ In-memory	✅ In-memory	✅ Neo4j Graph	✅ Neo4j Graph
Graph Traversal	✅ Limited	✅ Limited	✅ Full Cypher	✅ Full Cypher
Performance	Good	Excellent (PDF)	Excellent (Graph)	Excellent (Both)
Memory Usage	~120MB	~70MB	~80MB	~50MB

Common Queries Cheat Sheet

// Find endpoints
"search for payment endpoints"
→ Uses: search_endpoints

// Find schemas
"show me the AccountDetails schema"
→ Uses: get_schema or search_schemas

// Natural language search (semantic)
"how to handle authentication errors?"
→ Uses: search_pdf_semantic

// Find related schemas
"what schemas does PaymentInitiation reference?"
→ Uses: graph_find_related_schemas

// Analyze endpoint
"what are all the parameters and responses for POST /v1/payments?"
→ Uses: graph_get_endpoint_dependencies

// Explore relationships
"show me all schemas that use Address type"
→ Uses: graph_traverse_relationships

// Get overview
"show me statistics about the loaded specifications"
→ Uses: get_statistics, get_vector_store_stats, get_graph_store_stats

License

MIT

References

Model Context Protocol - MCP specification and documentation
Berlin Group Open Finance - Official Berlin Group website
MCP SDK Documentation - TypeScript SDK for MCP
ChromaDB Documentation - Vector database for AI applications
Neo4j Documentation - Graph database platform
OpenAPI Specification - API specification format

Contributing

Contributions are welcome! Please ensure:

TypeScript code follows project conventions
All tools have proper error handling
Documentation is updated for new features
Tests are added for new components (unit tests in tests/unit/, integration tests in tests/integration/)
New embedding providers implement the EmbeddingProvider interface
New graph node types are added to graphModels.ts
README is updated with examples and usage instructions

Areas for Contribution

Additional embedding providers (Cohere, HuggingFace, etc.)
Enhanced graph query capabilities
Additional Berlin Group specifications
Performance optimizations
Additional MCP tools
Documentation improvements

Support

For issues or questions:

Check the troubleshooting section
Review MCP documentation
Check Berlin Group specification documentation

Recommended Servers

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official

Featured

TypeScript

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official

Featured

Local

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official

Featured

TypeScript

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official

Featured

Python

E2B

Using MCP to run code via e2b.

Official

Featured

Neon Database

MCP server for interacting with Neon Management API and databases

Official

Featured

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official

Featured

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official

Featured

Berlin Group MCP Server

README

Berlin Group MCP Server

Overview

Features

Available Specifications

Installation

Prerequisites

Setup

Configuration

VS Code

IntelliJ IDEA

Dependencies

Core Dependencies

Advanced Features

Development Dependencies

Optional: External Database Configuration

Configuration Methods

Environment Variables

Direct Configuration

ChromaDB (for Semantic Search)

Neo4j (for Graph Database)

Deployment Scenarios

Architecture

Core Components

1. YAML Parser (yamlParser.ts)

2. PDF Parser (pdfParser.ts)

3. Text Chunker (textChunker.ts)

4. Vector Store (vectorStore.ts)

5. Graph Store (graphStore.ts)

6. Graph Indexer (graphIndexer.ts)

7. Graph Models (graphModels.ts)

8. Specification Indexer (indexer.ts)

Data Flow

Fallback Mechanisms

Available Tools

Core Tools (12 tools)

Search and Discovery

Endpoint Information

Schema Information

Specification Management

Semantic Search Tools (6 tools)

Graph Database Tools (6 tools)

Tool Selection Guide

Usage Examples

In VS Code with GitHub Copilot

Basic Queries

Semantic Search Queries

Graph Database Queries

Advanced Analysis

Programmatic Usage

Project Structure

Development

Running Tests

Watch Mode

Debugging

Adding New Specifications

Extending the Server

Troubleshooting

Server Not Starting

Tools Not Appearing

No Results from Search

Semantic Search Not Working

Graph Database Not Working

Performance Issues

Connection Errors

Permission Issues

Debugging Tips

Common Error Messages

Technical Details

MCP Protocol

Component Architecture

Parser Features

Vector Store Implementation

Graph Store Implementation

Indexing Process

Performance

Quick Reference

Tool Categories Summary

Key Features Comparison

1. YAML Parser (`yamlParser.ts`)

2. PDF Parser (`pdfParser.ts`)

3. Text Chunker (`textChunker.ts`)

4. Vector Store (`vectorStore.ts`)

5. Graph Store (`graphStore.ts`)

6. Graph Indexer (`graphIndexer.ts`)

7. Graph Models (`graphModels.ts`)

8. Specification Indexer (`indexer.ts`)