llamacpp-mcp
An MCP server for running local LLMs using llama-cpp-python, with built-in tools for generating drug-like molecule SMILES.
An MCP (Model Context Protocol) wrapper for running local LLMs using llama-cpp-python. This project provides a framework for integrating local language models as MCP tools with built-in support for specialized models like SmileyLlama.
SmileyLlama Integration
Generate SMILES strings (chemical notation) for drug-like molecules with fine-grained constraints:
- Lipinski's Rule of Five validation
- Hydrogen bond donor/acceptor limits
- Molecular weight and LogP constraints
- Warhead SMARTS pattern matching
- Macrocycle detection and filtering
- And more...
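As a rough illustration of what the Rule of Five constraint means, the sketch below checks precomputed descriptors against the standard thresholds (molecular weight ≤ 500 Da, LogP ≤ 5, at most 5 hydrogen bond donors, at most 10 acceptors). The `Descriptors` class and function names are hypothetical and are not taken from this project; in practice the descriptor values would come from a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    molecular_weight: float  # daltons
    clogp: float             # calculated LogP
    hbond_donors: int
    hbond_acceptors: int


def lipinski_violations(d: Descriptors) -> list[str]:
    """Return the names of the Rule of Five criteria that `d` violates."""
    violations = []
    if d.molecular_weight > 500:
        violations.append("molecular_weight")
    if d.clogp > 5:
        violations.append("clogp")
    if d.hbond_donors > 5:
        violations.append("hbond_donors")
    if d.hbond_acceptors > 10:
        violations.append("hbond_acceptors")
    return violations


def passes_rule_of_five(d: Descriptors, max_violations: int = 0) -> bool:
    """Strict by default; pass max_violations=1 for the common relaxed form."""
    return len(lipinski_violations(d)) <= max_violations
```

For example, a small molecule with illustrative values `Descriptors(180.2, 1.2, 1, 4)` has no violations, while one with a molecular weight of 650 fails the strict check but passes with `max_violations=1`.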
Installation
Prerequisites
- Python ≥ 3.13
- uv (recommended) or pip
Setup
Clone the repository and install dependencies:
git clone <repository-url>
cd llamacpp-mcp
uv sync
Backend Configuration
The llama-cpp-python library must be compiled with backend-specific flags to use hardware acceleration. Choose the appropriate backend for your system:
CUDA (NVIDIA GPUs):
CMAKE_ARGS="-DGGML_CUDA=on" uv pip install llama-cpp-python --force-reinstall --no-cache-dir
ROCm (AMD GPUs):
CMAKE_ARGS="-DGGML_HIPBLAS=on" uv pip install llama-cpp-python --force-reinstall --no-cache-dir
Metal (Apple Silicon):
CMAKE_ARGS="-DGGML_METAL=on" uv pip install llama-cpp-python --force-reinstall --no-cache-dir
CPU-only (no GPU acceleration):
uv sync
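After reinstalling, it can be useful to confirm that the build actually supports GPU offload. The sketch below is one possible check, not part of this project: it relies on `llama_supports_gpu_offload` from the llama-cpp-python low-level bindings and falls back gracefully if the package or that function is unavailable. The helper name and the returned status strings are made up for illustration.

```python
def gpu_offload_status() -> str:
    """Report whether the installed llama-cpp-python build can offload to a GPU."""
    try:
        import llama_cpp
    except (ImportError, OSError, RuntimeError):
        # Package missing, or its shared library failed to load.
        return "not-installed"
    check = getattr(llama_cpp, "llama_supports_gpu_offload", None)
    if check is None:
        # Older or unusual builds may not expose this binding.
        return "unknown"
    return "gpu" if check() else "cpu-only"


if __name__ == "__main__":
    print(gpu_offload_status())
```

Saved to a file of your choosing, this can be run inside the project environment with `uv run python <file>`.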
Usage
Run the agent example
Set up your example/fastagent.secrets.yaml:
anthropic:
  api_key: your-api-key-here
Then run the agent interface in the terminal:
cd example/
uv run --extra agent agent.py
Running the MCP Server
Start the MCP server with a GGUF model:
uv run llamacpp-mcp -i /path/to/model.gguf
Additional parameters can be passed as command-line arguments:
uv run llamacpp-mcp --input model.gguf -n_gpu_layers -1 -n_threads 8
Common parameters:
- -n_gpu_layers: Number of model layers to offload to GPU (-1 for all)
- -n_threads: Number of CPU threads to use
- -n_ctx: Context window size
- -verbose: Verbosity level
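The parameter names above match keyword arguments of the `llama_cpp.Llama` constructor (`n_gpu_layers`, `n_threads`, `n_ctx`, `verbose`). How this project forwards them is not documented here, so the following is only a sketch of one way such pass-through flags could be turned into keyword arguments. The function names are hypothetical.

```python
import ast


def coerce(raw: str) -> object:
    """Interpret a CLI token as a Python literal when possible, else keep the string."""
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw


def parse_passthrough(tokens: list[str]) -> dict[str, object]:
    """Turn ['-n_ctx', '4096', ...] into {'n_ctx': 4096, ...}."""
    kwargs: dict[str, object] = {}
    it = iter(tokens)
    for flag in it:
        key = flag.lstrip("-")
        # A flag must start with '-' and name a valid keyword (so '-1' is rejected).
        if not flag.startswith("-") or not key.isidentifier():
            raise ValueError(f"expected a flag, got {flag!r}")
        try:
            raw = next(it)
        except StopIteration:
            raise ValueError(f"missing value for {flag}") from None
        kwargs[key] = coerce(raw)
    return kwargs
```

Under these assumptions, `parse_passthrough(["-n_gpu_layers", "-1", "-n_threads", "8"])` yields a dictionary that could be unpacked into `Llama(model_path=..., **kwargs)`.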
Available Tools
generate_smiles
Generate SMILES strings for drug-like molecules with optional constraints.
Parameters:
- max_hbond_donors: Maximum hydrogen bond donors
- max_hbond_acceptors: Maximum hydrogen bond acceptors
- max_molecular_weight: Maximum molecular weight
- max_clogp: Maximum calculated LogP
- lipinski_rule_of_five: Enforce Lipinski's Rule of Five
- rule_of_three: Enforce Rule-of-Three for fragment-like molecules
- And additional constraint options...
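To make the shape of a tool call concrete, the sketch below assembles an arguments dictionary from the parameters listed above. The parameter names come from this README; their types (integers for counts, numbers for weight and LogP, booleans for the rule flags) are assumptions, and the helper itself is hypothetical rather than part of the project.

```python
import json

# Assumed types for the documented parameters (not confirmed by the project).
ALLOWED = {
    "max_hbond_donors": int,
    "max_hbond_acceptors": int,
    "max_molecular_weight": (int, float),
    "max_clogp": (int, float),
    "lipinski_rule_of_five": bool,
    "rule_of_three": bool,
}


def build_generate_smiles_args(**constraints) -> dict:
    """Validate constraint names and types; drop constraints left as None."""
    args = {}
    for name, value in constraints.items():
        if name not in ALLOWED:
            raise ValueError(f"unknown constraint: {name}")
        if value is None:
            continue
        expected = ALLOWED[name]
        # bool is a subclass of int, so reject it explicitly for numeric fields.
        if expected is not bool and isinstance(value, bool):
            raise TypeError(f"{name} must be numeric, not bool")
        if not isinstance(value, expected):
            raise TypeError(f"{name} has unexpected type {type(value).__name__}")
        args[name] = value
    return args


if __name__ == "__main__":
    payload = build_generate_smiles_args(max_hbond_donors=3, max_clogp=3.5, rule_of_three=None)
    print(json.dumps(payload))
```

An MCP client would pass a dictionary like this as the arguments of a `generate_smiles` tool call; omitted constraints are simply left unset.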
Dependencies
Core:
- fastmcp>=2.13.1 - MCP server framework
- llama-cpp-python>=0.3.16 - LLM inference engine
Optional:
- fast-agent-mcp>=0.2.25 - For agent-based integrations
Development
Project Setup
The project uses uv for dependency management. After installing uv, run:
uv sync
This installs all dependencies in a local virtual environment.
Adding New Models
To add a new model type:
- Create a subdirectory under src/llamacpp_mcp/models/
- Implement models.py with Pydantic constraint definitions
- Implement tools.py with tool registration function
- Import and register tools in the main __init__.py
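The steps above might look roughly like the following for a hypothetical model type. This is a sketch under several assumptions. A stdlib dataclass stands in for the Pydantic model so the example has no third-party dependencies. `mcp` is assumed to expose a FastMCP-style `tool()` decorator. `generate` is a made-up callable wrapping model inference. None of the names are taken from the project's source.

```python
from dataclasses import dataclass
from typing import Callable, Optional


# models.py (sketch): constraint definitions for the hypothetical model type.
@dataclass
class ExampleConstraints:
    max_tokens: int = 64
    prefix: Optional[str] = None

    def to_prompt(self) -> str:
        """Render the constraints as a prompt string for the model."""
        base = "Generate one output"
        return f"{base} starting with {self.prefix}" if self.prefix else base


# tools.py (sketch): the registration function called from the main __init__.py.
def register_tools(mcp, generate: Callable[[str, int], str]) -> None:
    """Attach this model's tools to an MCP server object."""

    @mcp.tool()
    def generate_example(max_tokens: int = 64, prefix: Optional[str] = None) -> str:
        """Generate text from the example model under the given constraints."""
        constraints = ExampleConstraints(max_tokens=max_tokens, prefix=prefix)
        return generate(constraints.to_prompt(), constraints.max_tokens)
```

Keeping the registration in a function that receives the server object lets the main package decide which model types to enable without each model module importing the server itself.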
Configuration
Model parameters can be configured via:
- Command-line arguments - Pass directly to llamacpp-mcp
- Environment variables - Set before running the server
- Agent Tool Configuration - See example/fastagent.config.yaml for reference
License
MIT License
Author
Lukas Kim