CodeMentor-AI
Enables IDE integration with a multi-agent AI pipeline for solving, reviewing, and optimizing code through adversarial peer review and security filtering.
README
<div align="center">
šØāš» CodeMentor AI
An Autonomous Multi-Agent Pipeline That Solves, Critiques, Verifies, and Polishes Programming Code
CodeMentor AI is a state-of-the-art Model Context Protocol (MCP) server and Streamlit dashboard built to eradicate LLM hallucinations in competitive coding. By utilizing a linear state-machine verification pipeline, it moves beyond "single prompt solving" into rigorous adversarial peer-review.
Read the Kaggle Writeup ⢠View Evaluation Metrics ⢠Security Architecture
</div>
š„ Demo
Note to Judges: The live video pitch and deployed application links will be placed here.

š The Problem Statement
Why do modern coding assistants hallucinate? Standard generic LLMs are autoregressive predictors, not engineers. When tasked with a dense LeetCode Hard problem, they frequently default to surface-level logic.
- Hidden Edge Cases: Single-shot prompts regularly fail to calculate bounds like integer overflows or $O(N^2)$ bottlenecks.
- Debugging Blindness: When AI-generated code fails, feeding the error back to the same monolithic agent often causes cyclic, oscillating hallucinations.
- Why Multi-Agent? We must segment the cognitive load. You would not deploy code without a peer review, a QA check, and a security audit. Your AI should not either.
š” The Solution
CodeMentor AI introduces a deterministic, multi-agent swarm.
- Multi-Agent Pipeline: Forces solutions through a sequenced pipeline.
- Adversarial Reflection: A dedicated agent whose only job is to brutally critique the code.
- Verification Layer: Acts as an air-gapped simulation proxy, mentally dry-running inputs.
- Security Firewall: A strict
O(1)memory limiter blocking prompt injections before API generation. - MCP Integration: Fully integrates all agents natively into IDEs (VS Code/Cursor).
⨠Key Features
| Capability | Description | Specialized Agent |
|---|---|---|
| š§ Deep Problem Solving | Solves and mathematically optimizes algorithms based on constraints. | SolverAgent |
| š Logical Debugging | Isolates silent logic flaws mapping them to line-by-line fixes. | DebugAgent |
| š Complexity Analysis | Exact Big $O$ Time/Space calculations highlighting bottlenecks. | ComplexityAgent |
| š”ļø Edge Case Generation | Hunts the specific maximum bounds that cause Memory Limit Exceeded. | TestCaseAgent |
| š FAANG Mock Interview | Refuses to write the code; uses Socratic probing to test your skills. | InterviewAgent |
| š Contest Strategy | Parses problem sets targeting time-management and difficulty estimates. | StrategyAgent |
| š Strict Code Review | Acts as an aggressive Principal Engineer enforcing Pythonic paradigms. | CodeReviewAgent |
| šØ Security Firewall | Active heuristic scanner blocking jailbreaks and Denial of Wallet (DoW). | SecurityFirewall |
š Architecture Diagrams
System Architecture
The top-level interaction between the user interface, the Security Firewall, and the LLM Pipeline.
graph TD
A[User via Streamlit or IDE/MCP] -->|Payload| B(Security Firewall)
B -->|Sanitized Valid Input| C{ManagerAgent Orchestrator}
B --x|Prompt Injection Blocked| Z[Drop Connection]
C --> D[True Pipeline]
C --> E[Competitive Personas]
C --> F[Classic Tools]
The True Agent Flow Pipeline
This diagram illustrates the State-Machine generator logic replacing the flawed "single LLM call".
sequenceDiagram
participant Manager
participant Solver
participant Reflector
participant Verification
participant QA
Manager->>Solver: Draft Algorithm
Solver-->>Manager: V1 Code
Manager->>Reflector: Try to break V1
Reflector-->>Manager: Revised V2 Code
Manager->>Verification: Mentally Dry-Run Inputs
Verification-->>Manager: Verified / Passed
Manager->>QA: Polish and Explain
QA-->>Manager: Perfect Pydantic Data
IDE MCP Integration
graph LR
IDE[VS Code / Cursor] <-->|JSON-RPC via stdio| MCP(FastMCP Server)
MCP <--> Manager[ManagerAgent Router]
Manager <--> Gemini[Google GenAI SDK]
š MCP Integration Details
Model Context Protocol (MCP) allows your local IDEs to utilize CodeMentor's unique persona-driven logic natively. CodeMentor exposes the following precise tools:
| MCP Tool Name | Description |
|---|---|
solve_problem_pipeline |
Triggers the 4-stage Reflection loop for highly reliable code generation. |
review_code |
Triggers the Strict Code Reviewer formatting style outputs. |
interview_question_generator |
Converts the IDE into a Socratic questioning loop for interview prep. |
hidden_test_detector |
Maps adversarial test cases trying to crash the current IDE buffer. |
optimize_algorithm |
Highlights $O(N)$ Big O limits. |
coding_strategy |
Evaluates Contest parameters. |
š Security Posture
AI security requires defense-in-depth methodologies. We do not rely on just prompting "Do not be malicious".
- Prompt Injection Firewall: Employs RegExp blacklists immediately rejecting known jailbreak inputs (
ignore previous). - Denial of Wallet (DoW) Limits: Strict string bounds mapping applied before the prompt touches the API.
- Session Abuse Detection: Rolling 60-second window limiting spam bot execution.
- Execution Proxy: We utilize semantic Agent dry-runs rather than exposing native
evalorexecOS vectors.
graph LR
Input[Payload] --> Bound[Length Check]
Bound --> Regex[Heuristic Reject]
Regex --> RateLimit[Abuse Track]
RateLimit --> LLM[Execution]
š Quantitative Benchmarks
Metrics context: Benchmarking executed via simulated Leetcode Hard parameters comparing zero-shot execution versus the V2 Reflection Pipeline.
| Execution Mode | Prompt Type | Pass@1 Accuracy | Latency (Avg) | Safety / Firewall |
|---|---|---|---|---|
| Standard LLM | Zero-Shot Generalized | [Insert %] | [Insert sec] | Bypassable |
| CodeMentor (V2) | Pipeline Verification | [Insert %] | [Insert sec] | Enforced |
See EVALUATION_METRICS.md for our raw execution trace outputs and methodology.
šø Presentation & Screenshots
The Timeline Dashboard

The Socratic Mock Interview

VS Code MCP Execution

Security Attack Mitigation

š Installation & Local Setup
1. Repository Clone & Environment
git clone https://github.com/yourusername/codementor-ai.git
cd codementor-ai
# Python 3.11+ is strongly recommended
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
2. Environment Variables
Copy the template and insert your GEMINI_API_KEY:
cp .env.example .env
3. Execution (Docker & Native)
Running the Web Interface (Native Streamlit):
streamlit run frontend/app.py
Running inside secure Docker containers:
docker-compose up --build
Running the MCP Server for your local IDE:
python -m mcp.server
š Project Structure
codementor-ai/
āāā agents/ # The Multi-Agent Intelligence Core
ā āāā manager_agent.py # Pipeline State-Machine Router
ā āāā reflection_agent.py # Adversarial Code Critique Component
ā āāā verification_agent.py# Code fact-checking proxy
ā āāā strategy_agent.py # Competitive Programming Guide
ā āāā (..other agents)
āāā core/ # System Integrations
ā āāā config.py # Pydantic Settings validator
ā āāā security.py # Strict Firewall & Rate Limit logic
āāā frontend/
ā āāā app.py # Glassmorphic Streamlit SaaS
āāā mcp/
ā āāā server.py # FastMCP native IDE extension bindings
āāā .env.example
āāā docker-compose.yml
āāā requirements.txt
āāā README.md
š£ļø Roadmap
- [x] Abstract initial AI logic into Pydantic structured schemas.
- [x] Create a multi-stage generator state machine (
run_pipeline). - [x] Deploy the Model Context Protocol (MCP) integrations.
- [x] Build the Memory/Abuse Security Firewall.
- [ ] Connect a true virtualized sub-process REPL (e.g., gVisor) for compilation testing.
- [ ] Implement Session History export to Cloud Storage (AWS S3/GCP).
š¤ Contributing
We welcome competitive programmers, ML researchers, and open-source contributors to the CodeMentor ecosystem!
- Fork the Project.
- Create your Feature Branch (
git checkout -b feature/AmazingAgent). - Commit your Changes (
git commit -m 'Added memory constraint agent'). - Push to the Branch (
git push origin feature/AmazingAgent). - Open a Pull Request.
Please ensure any new Agent inherits from agents.base_agent and defines a strict Pydantic Output schema.
š License
Distributed under the MIT License. See LICENSE for more information.
š Acknowledgements
- Google Gemini: For the powerhouse reasoning backing the multi-agent ensemble.
- Model Context Protocol (MCP): For the standard enabling our IDE extensibilities.
- Streamlit: For the rapid modern dashboard frontend pipeline.
- Kaggle: For catalyzing this Capstone design standard.
<div align="center"> <b>Architected for the Kaggle Capstone AI Agents competition.</b> </div>
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.