mcp-github-advanced-search

Enables LLMs to perform advanced GitHub code searches with intelligent filtering and content retrieval using Playwright automation, optimized for DeepSeek integration.

MCP Server for GitHub Advanced Search (G.A.S.)

A powerful Model Context Protocol (MCP) server that enables Large Language Models to perform advanced GitHub code searches with intelligent filtering and content retrieval capabilities, optimized for DeepSeek integration.

🔍 Overview

The GitHub Advanced Search (G.A.S.) MCP server provides LLMs with sophisticated GitHub search capabilities that go beyond standard API limitations. Using web automation with Playwright, it enables deep code discovery, pattern analysis, and content retrieval across the entire GitHub ecosystem. This version includes enhanced support for DeepSeek models, providing tailored search results and structured JSON output.

Demo

Tested using VS Code + Cline + OpenRouter (deepseek/deepseek-r1-0528:free).

youtube link to demo

Example 1


# step1: init gas
gas_entrypoint

# step2: feed model
gas_search_code
file_name: clinerules

# step3: make your wish
You now have better knowledge of `clinerules`.
Please keep the current file format and depth level,
and enhance the `<file-path-to-clinerules>`

Key Features

  • 🔍 Advanced GitHub Search: Search by keywords, file names, and complex filters
  • 📁 Content Retrieval: Automatically fetch and return file contents
  • 🤖 LLM Integration: Seamless integration with Claude, GPT, and other MCP-compatible LLMs
  • 🔄 Pagination Support: Handle large result sets with intelligent pagination
  • 🌐 Web Automation: Uses Playwright for robust GitHub interaction
  • 📊 Structured Results: Returns organized JSON data with repository links, file links, and content
  • High Performance: Async operations with concurrent file downloads
  • 🔐 Authentication Support: Works with GitHub login for private repositories
  • 🤖 DeepSeek Integration: Optimized for use with DeepSeek models, providing tailored search results and structured JSON output

🏗️ Architecture

graph TB
    subgraph "MCP Client (LLM)"
        A[Claude/GPT/Other LLM]
    end

    subgraph "MCP Server (G.A.S.)"
        B[MCP Server]
        C[Search Engine]
        D[Playwright Browser]
        E[Content Fetcher]
        F[Result Processor]
    end

    subgraph "GitHub"
        G[GitHub Search]
        H[Repository Files]
        I[Raw Content]
    end

    A -->|MCP Protocol| B
    B --> C
    C --> D
    D -->|Web Automation| G
    G -->|Search Results| D
    D --> E
    E -->|HTTP Requests| I
    I -->|File Content| E
    E --> F
    F -->|Structured Data| B
    B -->|JSON Response| A

    classDef client fill:#e1f5fe,stroke:#01579b,color:#01579b
    classDef server fill:#e8f5e9,stroke:#2e7d32,color:#1b5e20
    classDef github fill:#f3e5f5,stroke:#4a148c,color:#4a148c

    class A client
    class B,C,D,E,F server
    class G,H,I github

🔄 Search Workflow

sequenceDiagram
    participant LLM as LLM Client
    participant MCP as MCP Server
    participant PW as Playwright Browser
    participant GH as GitHub Search
    participant API as GitHub Raw API

    LLM->>MCP: gas_search_code(keyword, file_name)
    MCP->>PW: Launch browser session
    PW->>GH: Navigate to search URL
    GH-->>PW: Search results page
    PW->>PW: Extract repository & file links

    loop For each page
        PW->>GH: Navigate to page N
        GH-->>PW: Results for page N
        PW->>PW: Extract links from page
    end

    MCP->>API: Fetch file contents (async)
    API-->>MCP: Raw file content
    MCP->>MCP: Structure response data
    MCP-->>LLM: JSON with repositories, files & content

    alt More results available
        LLM->>MCP: get_remaining_result(start_id)
        MCP-->>LLM: Next batch of results
    end

🚀 Quick Start

Prerequisites

  • Python 3.10 or higher
  • Node.js (for Playwright browser automation)
  • GitHub account (recommended for optimal functionality)

Installation

  1. Install the package:

    pip install mcp-server-git-gas
    
  2. Install Playwright browsers:

    playwright install chromium
    
  3. Configure your MCP client, e.g. Claude Desktop (💀 not tested):

    Add the following to your claude_desktop_config.json (not tested!):

    {
      "mcpServers": {
        "github-advanced-search": {
          "command": "mcp-server-git-gas",
          "args": []
        }
      }
    }
    

First Search

Once configured, you can start searching GitHub through your LLM:

Search GitHub for Python files containing "async def" functions

The LLM will automatically use the G.A.S. tools to perform the search and return structured results.

📦 Installation from Source

step1 (clone source)

$ cd ~
$ git clone --depth=1 https://github.com/louiscklaw/mcp-github-advanced-search ~/mcp/mcp-git-gas

step2 (install remaining dependencies, playwright)

# Install the Chrome browser for Playwright
$ playwright install chrome
# Install the system dependencies the browsers need
$ playwright install-deps

step3 (seed chrome user credentials)

# This creates the user_data_dir for Chromium.
# In the browser that opens, log in to GitHub, Google, or any other service you want.

$ cd ~/mcp/mcp-git-gas
$ ./seedChromeUserDataDir.sh

⚙️ Configuration

VS Code with MCP Extension

{
  "mcp": {
    "servers": {
      "git-gas": {
        "autoApprove": [
          "get_remaining_result",
          "gas_readme",
          "gas_search_code"
        ],
        "disabled": false,
        "timeout": 300,
        "type": "stdio",
        "command": "uv",
        "args": [
          "--directory",
          "<USER_HOME_DIR>/mcp/mcp-git-gas/src/mcp_server_git_gas",
          "run",
          "mcp-server-git-gas"
        ]
      }
    }
  }
}

🛠️ Available Tools

gas_entrypoint

Initialize and get information about the GitHub Advanced Search server.

Parameters: None

Returns: Server information and usage instructions with workflow diagram.

graph TD
  a((start))
  d((end))
  b("search code with filter (gas_search_code)")
  c("return search result")
  c1("is the result finished ?")
  c2("use get_remaining_result to list remaining result")
  a --> b --> c --> c1 -- Yes --> d
  c1 -- No --> c2
  c2 --> c1

gas_search_code

Search GitHub repositories with advanced filters.

Parameters:

  • keyword (string, optional): Search keyword (single word recommended)
  • file_name (string, optional): Specific filename to search for (e.g., ".clinerules", "README.md")

Returns: An array of search results, each with the following fields:

[
  {
    "REPOSITORY_LINK": "https://github.com/owner/repo",
    "FILE_LINK": "https://github.com/owner/repo/blob/main/file.py",
    "RAW_UESR_CONTENT_LINK": "https://raw.githubusercontent.com/owner/repo/main/file.py",
    "FILE_CONTENT": "actual file content..."
  }
]
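
The relationship between FILE_LINK and the raw-content link in the sample above can be sketched in a few lines of Python. This is only an illustration: `blob_to_raw` is a hypothetical helper name, and the project's own converter may handle more cases (for example, branch names that contain slashes).

```python
from urllib.parse import urlsplit


def blob_to_raw(file_link: str) -> str:
    """Convert a github.com blob URL into a raw.githubusercontent.com URL.

    Hypothetical helper for illustration; not the project's implementation.
    """
    parts = urlsplit(file_link)
    segments = parts.path.strip("/").split("/")
    # Expected shape: owner / repo / "blob" / ref / path...
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {file_link}")
    owner, repo, _, *rest = segments
    return "https://raw.githubusercontent.com/" + "/".join([owner, repo, *rest])


print(blob_to_raw("https://github.com/owner/repo/blob/main/file.py"))
```

Dropping the `blob` segment and switching the host is all that is needed for the simple case shown in the sample result.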

get_remaining_result

Retrieve additional results from a previous search (pagination).

Parameters:

  • start_id (integer): Starting index for the next batch of results

Returns: Next batch of search results with the same structure as gas_search_code.
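
A client-side pagination loop over these two tools might look like the sketch below. It rests on assumptions that the README does not confirm: that an empty batch signals the end of the results, and that `start_id` is the number of results received so far. `call_tool` is a hypothetical stand-in for whatever your MCP client uses to invoke a tool, and `fake_call_tool` only simulates the server.

```python
from typing import Callable, Dict, List

ToolCall = Callable[[str, dict], List[Dict]]


def collect_all_results(call_tool: ToolCall, keyword: str, file_name: str,
                        max_batches: int = 10) -> List[Dict]:
    """Run gas_search_code, then keep paging with get_remaining_result.

    Assumptions: an empty batch means "no more results", and start_id
    is the count of results received so far.
    """
    results = list(call_tool("gas_search_code",
                             {"keyword": keyword, "file_name": file_name}))
    for _ in range(max_batches):  # hard cap so a misbehaving server cannot loop forever
        batch = call_tool("get_remaining_result", {"start_id": len(results)})
        if not batch:
            break
        results.extend(batch)
    return results


# Stand-in for a real MCP client: serves 5 fake results, 2 at a time.
FAKE = [{"FILE_LINK": f"https://github.com/owner/repo/blob/main/f{i}.py"}
        for i in range(5)]


def fake_call_tool(name: str, args: dict) -> List[Dict]:
    start = 0 if name == "gas_search_code" else args["start_id"]
    return FAKE[start:start + 2]


all_results = collect_all_results(fake_call_tool, "mcp", "README.md")
print(len(all_results))
```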

💡 Usage Examples

step1

call `gas_entrypoint` to initialize yourself

step2

Hi,

please use `gas_search_code` with the JSON below

{
  "keyword": "mcp mermaid",
  "file_name": "README.md"
}

and understand the content returned. I will send you the task afterwards.

step3

I am working on a Python project.
Please take a look at the source code of the project.

With the help of the files in the former results,
please help to update the README file.

🔧 Configuration & Environment

Browser Configuration

The server uses Playwright with persistent browser context for:

  • Session management
  • Authentication state preservation
  • Improved performance

Browser data is stored in: ~/mcp/mcp-git-gas/_user_data_dir
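
For readers unfamiliar with persistent contexts, here is a minimal sketch of how a Playwright session can reuse that directory so that a GitHub login survives between runs. It is not the server's actual code: `open_github_search` and `main` are hypothetical names, and the choice of headless mode and the bundled Chromium are assumptions.

```python
import asyncio
from pathlib import Path

# Path taken from the README; adjust it if you cloned elsewhere.
USER_DATA_DIR = Path.home() / "mcp" / "mcp-git-gas" / "_user_data_dir"


async def open_github_search(user_data_dir: Path = USER_DATA_DIR) -> str:
    """Open GitHub search in a persistent profile and return the page title."""
    # Imported lazily so this module loads even without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        # Cookies and login state are read from and written to user_data_dir.
        context = await p.chromium.launch_persistent_context(
            str(user_data_dir),
            headless=True,
        )
        try:
            page = context.pages[0] if context.pages else await context.new_page()
            await page.goto("https://github.com/search")
            return await page.title()
        finally:
            await context.close()


def main() -> None:
    """Entry point for trying the sketch by hand (needs Playwright and a browser)."""
    print(asyncio.run(open_github_search()))
```

Call `main()` to try it. Only one browser process can use a profile directory at a time, so stop the server before running it against the same directory.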

🏃‍♂️ Development

Local Development Setup

# Clone the repository
git clone <repository-url>
cd mcp-server-git-gas

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -e .

# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest

Project Structure

src/mcp_server_git_gas/
├── __init__.py              # CLI entry point
├── server.py                # Main MCP server implementation
├── CONST.py                 # Configuration constants
├── fetch_data.py            # Async HTTP client
├── fetchFileContent.py      # File content retrieval
├── convertFileLinkToRaw...  # URL conversion utilities
├── url_util.py              # URL building helpers
└── git_dump_screen.py       # Debug utilities

Key Components

  • MCP Server: Implements the Model Context Protocol interface
  • Search Engine: Handles GitHub search logic and pagination
  • Content Fetcher: Retrieves file contents asynchronously
  • Browser Automation: Playwright-based GitHub interaction

Testing with MCP Inspector

# Test the server with MCP inspector
npx @modelcontextprotocol/inspector uvx mcp-server-git-gas

Docker Development

# Build development image
docker build -t mcp/git-gas:dev .

# Run with volume mount for development
docker run --rm -i \
  -v $(pwd):/app \
  mcp/git-gas:dev

🔍 How It Works

  1. Search Initiation: LLM calls gas_search_code with search parameters
  2. Query Building: Server constructs GitHub search URL with filters
  3. Web Automation: Playwright navigates GitHub search pages
  4. Result Extraction: JavaScript execution extracts repository and file links
  5. Content Retrieval: Parallel HTTP requests fetch file contents
  6. Response Formatting: Results structured as JSON for LLM consumption
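
Step 2 (query building) can be illustrated with a short sketch. The URL shape and the `path:` qualifier below are assumptions about GitHub's web code search, and `build_search_url` is a hypothetical name; the server may construct its URLs differently.

```python
from typing import Optional
from urllib.parse import urlencode


def build_search_url(keyword: Optional[str] = None,
                     file_name: Optional[str] = None,
                     page: int = 1) -> str:
    """Build a GitHub code-search URL (assumed format, not the project's own)."""
    terms = []
    if keyword:
        terms.append(keyword)
    if file_name:
        # Assumption: the `path:` qualifier restricts matches by file name.
        terms.append(f"path:{file_name}")
    if not terms:
        raise ValueError("provide keyword and/or file_name")
    # urlencode percent-escapes the query and turns spaces into "+".
    query = {"q": " ".join(terms), "type": "code", "p": page}
    return "https://github.com/search?" + urlencode(query)


print(build_search_url("mcp", "README.md"))
print(build_search_url(file_name=".clinerules", page=2))
```

Both parameters are optional in `gas_search_code`, so the sketch accepts either one alone and rejects only the empty query.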

🐛 Troubleshooting

Common Issues

  1. "Not logged in" errors

    • Solution: run seedChromeUserDataDir.sh to start a browser and perform login
  2. No results found

    • Check search keywords for typos
    • Try broader search criteria
    • Verify GitHub is accessible
  3. Browser launch failures

    • Run: playwright install chromium
    • Check system requirements for Playwright
  4. Rate limiting

    • GitHub may rate limit requests
    • The server includes delays and retry logic
    • Consider using authenticated sessions for higher limits
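
The "delays and retry logic" mentioned under rate limiting is not documented in detail. A common pattern for this kind of problem is retrying with exponential backoff and jitter, sketched below; `with_retries`, its parameters, and the simulated `flaky` request are all hypothetical and not taken from the server's code.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retries(action: Callable[[], Awaitable[T]], attempts: int = 4,
                       base_delay: float = 1.0) -> T:
    """Await action(), retrying on failure with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return await action()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles on each attempt; jitter spreads out simultaneous retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay / 2)
            await asyncio.sleep(delay)
    raise RuntimeError("attempts must be at least 1")


calls = 0


async def flaky() -> str:
    """Simulated request that fails twice before succeeding."""
    global calls
    calls += 1
    if calls < 3:
        raise ConnectionError("simulated rate limit")
    return "ok"


result = asyncio.run(with_retries(flaky, base_delay=0.01))
print(result, calls)
```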

Debug Mode

Debug screenshots are saved to: ~/mcp_github_advanced_search/debug.png

📊 Performance

  • Search Speed: ~2-5 seconds per search page
  • Concurrent Requests: Up to 10 parallel file downloads
  • Result Limits: 20 results per search (configurable)
  • Pagination: Supports up to 2 pages (100+ results)
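
One way to cap concurrent downloads at the figure quoted above is an `asyncio.Semaphore`. The sketch below shows that technique under stated assumptions: `fetch_all` and `fake_fetch` are hypothetical names, the fetcher is simulated rather than making real HTTP requests, and the server may enforce its limit differently.

```python
import asyncio
from typing import Awaitable, Callable, List

MAX_PARALLEL = 10  # the "up to 10 parallel file downloads" figure


async def fetch_all(urls: List[str], fetch: Callable[[str], Awaitable[str]],
                    limit: int = MAX_PARALLEL) -> List[str]:
    """Fetch every URL, keeping at most `limit` fetches in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url: str) -> str:
        async with semaphore:
            return await fetch(url)

    # gather returns results in the same order as the input URLs.
    return list(await asyncio.gather(*(bounded(u) for u in urls)))


in_flight = 0
peak = 0


async def fake_fetch(url: str) -> str:
    """Simulated download that records how many fetches overlap."""
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)
    in_flight -= 1
    return f"content of {url}"


urls = [f"https://raw.githubusercontent.com/owner/repo/main/f{i}.py" for i in range(25)]
contents = asyncio.run(fetch_all(urls, fake_fetch))
print(len(contents), peak)
```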

🚨 Important Notes

  • GitHub Authentication: Login to GitHub in the browser for optimal results
  • Rate Limiting: Respects GitHub's usage policies
  • Browser Requirements: Requires Chromium browser (installed via Playwright)
  • Network Dependencies: Requires internet connection for GitHub access

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Development Workflow

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Add tests for new functionality
  5. Run the test suite
  6. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Built on the Model Context Protocol by Anthropic
  • Uses Playwright for browser automation
  • Inspired by the need for advanced GitHub search capabilities in LLM workflows
  • Thanks to the MCP community for feedback and contributions


Note: This server requires a GitHub account for optimal functionality. Some features may be limited when used without authentication.
