MCP Enhanced Data Retrieval System

Enables AI applications to access and contextualize organizational knowledge sources including GitHub repositories and internal documentation through standardized MCP protocol integration. Features OAuth 2.1 authentication, vector-based semantic search, and optimized context chunking for enterprise development workflows.

An MCP (Model Context Protocol) server that standardizes AI context sharing by integrating organizational knowledge sources (GitHub, internal docs, APIs) to enable domain-aware AI assistance for enterprise development workflows.

Project Overview

This system implements the Model Context Protocol to provide:

  • Standardized AI context sharing across organizational knowledge sources
  • GitHub repository integration with OAuth 2.1 authentication
  • Vector-based semantic search using embeddings
  • Optimized 1500-token context chunking for sub-500ms time to first token (TTFT)
  • Parallel retrieval strategy with 2-second timeout
  • Streamable HTTP transport using FastAPI
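The 1500-token chunking listed above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: whitespace splitting stands in for a real tokenizer (the project lists tiktoken), and the function names `chunk_tokens` and `chunk_text` are hypothetical.

```python
from typing import List

MAX_TOKENS = 1500  # chunk size stated in this README


def chunk_tokens(tokens: List[str], max_tokens: int = MAX_TOKENS) -> List[List[str]]:
    """Split a token sequence into consecutive chunks of at most max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]


def chunk_text(text: str, max_tokens: int = MAX_TOKENS) -> List[str]:
    """Chunk raw text and return each chunk as a string."""
    # Assumption: whitespace tokens approximate model tokens. A real tokenizer
    # would produce different counts for the same text.
    tokens = text.split()
    return [" ".join(chunk) for chunk in chunk_tokens(tokens, max_tokens)]
```

With this sketch, a 3200-word input yields three chunks of 1500, 1500, and 200 words. A production version would likely split on semantic boundaries (functions, headings) rather than at fixed offsets.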

Architecture

AI Applications
    ↓
Authentication (OAuth 2.1 + RBAC)
    ↓
MCP Client
    ↓
MCP Protocol (JSON-RPC + HTTP)
    ↓
MCP Server
    • Multi-threaded parallel retrieval
    • 1500-token chunking
    ↓
Knowledge Tiers (Public, Internal, Restricted)
    ↓
Data Sources: GitHub | Docs
Vector Storage: Embeddings
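
One way the knowledge tiers in the diagram could gate retrieval is sketched below. The README does not say how tiers map to roles, so this assumes the tiers are strictly ordered and that a clearance grants access to its own tier and every lower one. `Tier`, `Document`, and `visible_documents` are hypothetical names.

```python
from enum import IntEnum
from typing import Iterable, List, NamedTuple


class Tier(IntEnum):
    # Assumption: tiers are ordered from least to most sensitive.
    PUBLIC = 0
    INTERNAL = 1
    RESTRICTED = 2


class Document(NamedTuple):
    path: str
    tier: Tier


def visible_documents(docs: Iterable[Document], clearance: Tier) -> List[Document]:
    """Return the documents whose tier is at or below the caller's clearance."""
    return [doc for doc in docs if doc.tier <= clearance]


# Illustrative documents only; the paths are made up.
DOCS = [
    Document("README.md", Tier.PUBLIC),
    Document("runbook.md", Tier.INTERNAL),
    Document("secrets-rotation.md", Tier.RESTRICTED),
]
```

Filtering before retrieval, rather than after, keeps restricted content out of the context window entirely.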

Features

  • MCP Protocol Compliance: JSON-RPC 2.0 over Streamable HTTP
  • GitHub Integration: Repository data retrieval and contextualization
  • Vector Embeddings: Semantic search using ChromaDB and Sentence Transformers
  • Context Optimization: 1500-token chunking with parallel retrieval
  • OAuth 2.1 Security: Secure authentication for GitHub access
  • Performance: Sub-500ms TTFT with a 2-second retrieval timeout
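
For illustration, a JSON-RPC 2.0 request of the kind an MCP client sends to invoke a server tool might be built as shown here. The `tools/call` method and its `name`/`arguments` parameters come from the MCP specification; the tool name `search_repository` and its arguments are hypothetical and not taken from this project.

```python
import json
from typing import Any, Dict


def make_tool_call(request_id: int, tool: str, arguments: Dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 request for MCP's tools/call method."""
    message = {
        "jsonrpc": "2.0",        # required version marker in JSON-RPC 2.0
        "id": request_id,        # lets the client match the response to this request
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and arguments, for illustration only.
payload = make_tool_call(1, "search_repository", {"query": "OAuth callback", "top_k": 3})
```

Over Streamable HTTP, a payload like this would be sent as the body of an HTTP POST to the server's MCP endpoint.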

Project Structure

.
├── src/
│   ├── server/          # MCP server core and FastAPI app
│   ├── auth/            # OAuth 2.1 authentication
│   ├── github/          # GitHub API integration
│   ├── vector/          # Vector database and embeddings
│   └── utils/           # Utilities and helpers
├── tests/               # Test suite
├── config/              # Configuration files
├── data/                # Data storage (vector DB, cache)
├── logs/                # Application logs
├── requirements.txt     # Python dependencies
└── .env.example         # Environment variables template

Setup

  1. After cloning the repository, change into the project directory:

    cd "MCP Enhanced Data Retrieval"
    
  2. Create virtual environment:

    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
  3. Install dependencies:

    pip install -r requirements.txt
    
  4. Configure environment variables:

    cp .env.example .env
    # Edit .env with your credentials
    
  5. Run the server:

    uvicorn src.server.main:app --reload
    

Milestone 1 Goals

  • ✅ MCP protocol analysis and communication flow evaluation
  • ✅ High-level architecture design for enterprise knowledge integration
  • 🔄 Functional MCP server with GitHub integration
  • 🔄 OAuth 2.1 authentication implementation
  • 🔄 1500-token context chunking mechanism
  • 🔄 Vector-based semantic search

Success Criteria

  • Functional MCP server that can retrieve and contextualize GitHub repository information
  • OAuth 2.1 authentication for secure GitHub access
  • 1500-token context chunking maintaining sub-500ms TTFT
  • Parallel retrieval with 2-second timeout
  • Vector-based semantic search for relevant content
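
The parallel retrieval criterion could be met along the lines of this sketch, which uses only the standard library. It assumes each knowledge source is exposed as a callable that takes a query string; `retrieve_parallel`, `fetch_github`, and `fetch_docs` are hypothetical stand-ins, not the project's API.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict, List

RETRIEVAL_TIMEOUT_S = 2.0  # timeout stated in this README


def retrieve_parallel(
    sources: Dict[str, Callable[[str], List[str]]],
    query: str,
    timeout: float = RETRIEVAL_TIMEOUT_S,
) -> Dict[str, List[str]]:
    """Query every source concurrently and keep only results that arrive in time."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(sources)))
    futures = {pool.submit(fn, query): name for name, fn in sources.items()}
    done, _pending = wait(list(futures), timeout=timeout)
    # Return without waiting for stragglers. Threads that are already running
    # finish in the background; their results are simply ignored.
    pool.shutdown(wait=False, cancel_futures=True)
    results: Dict[str, List[str]] = {}
    for future in done:
        if future.exception() is None:  # skip sources that raised
            results[futures[future]] = future.result()
    return results


def fetch_github(query: str) -> List[str]:
    # Stand-in for a real GitHub lookup.
    return [f"github hit for {query!r}"]


def fetch_docs(query: str) -> List[str]:
    # Stand-in for a slow documentation source.
    time.sleep(0.5)
    return [f"docs hit for {query!r}"]
```

Calling `retrieve_parallel({"github": fetch_github, "docs": fetch_docs}, "oauth", timeout=0.1)` would return only the `github` entry, because the slow source misses the deadline. Returning partial results instead of failing keeps one slow source from blocking the whole response.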

Technologies

  • MCP SDK: Anthropic MCP Python SDK
  • Web Framework: FastAPI with Streamable HTTP transport
  • GitHub API: PyGithub
  • Authentication: OAuth 2.1 (authlib)
  • Vector Database: ChromaDB
  • Embeddings: Sentence Transformers (all-MiniLM-L6-v2)
  • Token Processing: tiktoken
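
To show the idea behind the vector search stack, the sketch below ranks documents by cosine similarity using plain Python. It is a toy under stated assumptions: the three-dimensional vectors are hand-written placeholders for Sentence Transformers embeddings, the dictionary stands in for a ChromaDB collection, and `cosine` and `top_k` are hypothetical helper names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k document ids most similar to the query, best first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy index: made-up document ids with placeholder vectors.
INDEX = {
    "auth.md": [1.0, 0.0, 0.0],
    "deploy.md": [0.0, 1.0, 0.0],
    "oauth.py": [0.9, 0.1, 0.0],
}
```

For the query vector `[1.0, 0.0, 0.0]`, this ranks `auth.md` first and `oauth.py` second. A vector database performs the same ranking with approximate nearest-neighbor indexes so that it scales beyond a linear scan.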

Author

Kalpalathika Ramanujam
Advisor: Dr. Thomas Kinsman
Rochester Institute of Technology

License

Academic Project - RIT Capstone
