Saiten MCP Server
Automates the evaluation of GitHub hackathon submissions by providing tools to collect issue data, apply scoring rubrics, and generate ranking reports. It enables a multi-agent system to manage scoring consistency, statistical outlier detection, and feedback generation through the GitHub CLI.
Saiten – Agents League @ TechConnect Scoring Agent
Submission Track: Creative Apps – GitHub Copilot
Overview
A multi-agent system that automatically scores all Agents League @ TechConnect hackathon submissions and generates ranking reports – just type @saiten-orchestrator score all in VS Code.
Designed with Orchestrator-Workers + Prompt Chaining + Evaluator-Optimizer patterns, 6 Copilot custom agents autonomously collect GitHub Issue submissions, evaluate them against track-specific rubrics, validate scoring consistency, and generate reports via an MCP (Model Context Protocol) server.
Two-Phase Scoring
Scoring uses a mechanical baseline + AI qualitative review pipeline:
| Phase | What | How | Judges |
|---|---|---|---|
| Phase A: Baseline | scripts/score_all.py | Keyword matching, checklist ratios, README section counts, demo detection | "Does the README mention MCP?" "How many checklist items are checked?" |
| Phase B: AI Review | @saiten-scorer agent | Copilot reads each submission, assesses quality holistically, adjusts scores via adjust_scores() | "Is this genuinely novel or a tutorial wrapper?" "Does implementation depth match the score?" |
The baseline is fast and deterministic but shallow. The AI review adds the qualitative depth that only comes from actually reading and understanding each project.
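To make the division of labour concrete, the snippet below sketches the kind of deterministic signal extraction Phase A performs. It is illustrative only: the function name and the exact signal set are assumptions, and the real scripts/score_all.py may compute different signals.

```python
# Illustrative Phase A signal extraction -- function name and signal set are
# assumptions; the real scripts/score_all.py may compute different signals.
import re


def extract_baseline_signals(issue_body: str) -> dict:
    """Derive deterministic baseline signals from a submission's issue body."""
    checked = len(re.findall(r"- \[x\]", issue_body, flags=re.IGNORECASE))
    unchecked = len(re.findall(r"- \[ \]", issue_body))
    total = checked + unchecked
    return {
        # Keyword matching: does the submission mention MCP at all?
        "mentions_mcp": "mcp" in issue_body.lower(),
        # Checklist ratio: fraction of submission checklist items ticked.
        "checklist_ratio": checked / total if total else 0.0,
        # README structure: rough count of markdown section headings.
        "section_count": len(re.findall(r"^#{1,3} ", issue_body, flags=re.MULTILINE)),
        # Demo detection: any link or word that looks like a demo/video reference.
        "has_demo": bool(re.search(r"demo|youtu\.?be|video", issue_body, flags=re.IGNORECASE)),
    }
```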
Agent Workflow
Design Patterns
- Orchestrator-Workers: @saiten-orchestrator delegates to 5 specialized sub-agents
- Prompt Chaining: Collect → Score → Review → Report, with Gates at each step
- Evaluator-Optimizer: Reviewer validates scores, triggers re-scoring on FLAG
- Handoff: Commenter posts feedback only after explicit user confirmation
- SRP (Single Responsibility Principle): 1 agent = 1 responsibility
Reasoning Patterns
- Chain-of-Thought (CoT): Scorer evaluates each criterion sequentially, building evidence chain before calculating weighted total
- Evaluator-Optimizer Loop: Reviewer detects 5 bias types (central tendency, halo effect, leniency, range restriction, anchoring) → FLAGs → Scorer re-evaluates with specific guidance → max 2 cycles
- Gate-based Error Recovery: Each workflow step has a validation gate; failures trigger graceful degradation (skip + warn) rather than hard stops
- Evidence-Anchored Scoring: Rubrics define explicit evidence_signals (positive/negative) per criterion; scorers must cite signals from actual submission content (see the rubric sketch after this list)
- Two-Phase Scoring: Mechanical baseline extracts signals deterministically; Copilot agent then reviews qualitatively and adjusts scores with rationale via adjust_scores()
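As an illustration of evidence-anchored scoring, a rubric criterion can carry explicit positive and negative signals like the hypothetical structure below. The field names are assumptions; the authoritative schema lives in the YAML files under data/rubrics/.

```python
# Hypothetical evidence-anchored criterion -- field names are illustrative and
# not the exact schema of the YAML rubrics under data/rubrics/.
criterion = {
    "name": "Reasoning",
    "weight": 20,  # percentage contribution to the weighted total
    "evidence_signals": {
        "positive": [
            "self-correction or retry loop described",
            "multi-step planning with intermediate validation",
        ],
        "negative": [
            "single linear pipeline with no feedback",
            "rationale quotes nothing from the submission",
        ],
    },
}
# A scorer citing this criterion must quote at least one matching signal from
# the submission itself before assigning its 1-10 score.
```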
Reliability Features
- Exponential Backoff Retry: gh CLI calls retry up to 3 times on rate limits (429) and server errors (5xx) with exponential delay
- Rate Limiting: Sliding-window rate limiter (30 calls/60s per tool) prevents GitHub API abuse
- Input Validation: All MCP tool inputs validated at boundaries (Fail Fast) – scores 1-10, weighted_total 0-100, required fields checked
- Corrupted Data Recovery: scores.json is auto-backed up on parse failure; the server continues with an empty store
- Idempotent Operations: Re-scoring safely overwrites existing entries by issue_number key
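The sketch below shows one way the retry and rate-limiting behaviour described above can be implemented. It is a simplified approximation for illustration, not the project's actual server.py code; the class and function names and the error-detection heuristic are invented here.

```python
# Minimal sketch of the retry and rate-limiting behaviour described above.
# This is an approximation for illustration -- the real server.py implementation
# and its error-detection logic may differ.
import re
import subprocess
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` within the trailing `window_s` seconds."""

    def __init__(self, max_calls: int = 30, window_s: float = 60.0) -> None:
        self.max_calls, self.window_s = max_calls, window_s
        self.calls = deque()

    def acquire(self) -> None:
        now = time.monotonic()
        while self.calls and now - self.calls[0] > self.window_s:
            self.calls.popleft()  # drop calls that fell outside the window
        if len(self.calls) >= self.max_calls:
            time.sleep(self.window_s - (now - self.calls[0]))
        self.calls.append(time.monotonic())


def run_gh(args: list[str], limiter: SlidingWindowLimiter, attempts: int = 3) -> str:
    """Run a gh CLI command, retrying on rate limits (429) and server errors (5xx)."""
    for attempt in range(attempts):
        limiter.acquire()
        result = subprocess.run(["gh", *args], capture_output=True, text=True)
        if result.returncode == 0:
            return result.stdout
        transient = re.search(r"HTTP (429|5\d\d)", result.stderr) is not None
        if transient and attempt < attempts - 1:
            time.sleep(2 ** attempt)  # exponential backoff: 1s, then 2s
            continue
        raise RuntimeError(f"gh failed: {result.stderr.strip()}")
    raise RuntimeError("unreachable")
```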
Workflow Diagram
flowchart TD
User["š¤ User\n@saiten-orchestrator score all"]
subgraph Orchestrator["š @saiten-orchestrator"]
Route["Intent Routing\nUC-01~06"]
Gate1{"Gate: MCP\nConnectivity"}
Gate2{"Gate: Data\nCompleteness"}
Gate3{"Gate: Score\nValidity"}
Gate4{"Gate: Review\nPASS/FLAG?"}
Integrate["Result Integration\n& User Report"]
Handoff["[Handoff]\nš¬ Post Feedback"]
end
subgraph Collector["š„ @saiten-collector"]
C1["list_submissions()"]
C2["get_submission_detail()"]
C3["Data Validation"]
end
subgraph Scorer["š @saiten-scorer"]
S1["get_scoring_rubric()"]
S2["Rubric-based Evaluation\n1-10 score per criterion"]
S3["Quality Self-Check"]
S4["save_scores()"]
end
subgraph Reviewer["š @saiten-reviewer"]
V1["Load scores.json"]
V2["Statistical Outlier\nDetection (2Ļ)"]
V3["Rubric Consistency\nCheck"]
V4["Bias Detection"]
end
subgraph Reporter["š @saiten-reporter"]
R1["generate_ranking_report()"]
R2["Trend Analysis"]
R3["Report Validation"]
end
subgraph Commenter["š¬ @saiten-commenter"]
CM1["Generate Comment\nper Top N"]
CM2["User Confirmation\n(Human-in-the-Loop)"]
CM3["gh issue comment"]
end
subgraph MCP["ā” saiten-mcp (FastMCP Server)"]
T1["list_submissions"]
T2["get_submission_detail"]
T3["get_scoring_rubric"]
T4["save_scores"]
T5["generate_ranking_report"]
end
subgraph External["External"]
GH["GitHub API\n(gh CLI)"]
FS["Local Storage\ndata/ & reports/"]
end
User --> Route
Route --> Gate1
Gate1 -->|OK| Collector
Gate1 -->|FAIL| User
C1 --> C2 --> C3
C3 --> Gate2
Gate2 -->|OK| Baseline
Gate2 -->|"ā ļø Skip"| Integrate
subgraph Baseline["āļø Mechanical Baseline"]
B1["score_all.py\nKeyword matching\nChecklist ratios"]
end
B1 --> Scorer
S1 --> S2 --> S3
S3 -->|PASS| S4
S3 -->|"FAIL: Re-evaluate"| S2
S4 --> S5["adjust_scores()\nAI Qualitative Review"]
S5 --> Gate3
Gate3 -->|OK| Reviewer
V1 --> V2 --> V3 --> V4
V4 --> Gate4
Gate4 -->|PASS| Reporter
Gate4 -->|"FLAG: Re-score"| Scorer
R1 --> R2 --> R3
R3 --> Integrate --> User
Integrate --> Handoff
Handoff -->|"User clicks"| Commenter
CM1 --> CM2 --> CM3
Collector -.->|MCP| T1 & T2
Scorer -.->|MCP| T3 & T4
Reporter -.->|MCP| T5
T1 & T2 -.-> GH
T4 & T5 -.-> FS
CM3 -.-> GH
style Orchestrator fill:#1a1a2e,stroke:#e94560,color:#fff
style Collector fill:#16213e,stroke:#0f3460,color:#fff
style Scorer fill:#16213e,stroke:#0f3460,color:#fff
style Reviewer fill:#1a1a2e,stroke:#e94560,color:#fff
style Reporter fill:#16213e,stroke:#0f3460,color:#fff
style Commenter fill:#0f3460,stroke:#533483,color:#fff
style MCP fill:#0f3460,stroke:#533483,color:#fff
Agent Roster
| Agent | Role | SRP Responsibility | MCP Tools |
|---|---|---|---|
| @saiten-orchestrator | Orchestrator | Intent routing, delegation, result integration | – (delegates all) |
| @saiten-collector | Worker | GitHub Issue data collection & validation | list_submissions, get_submission_detail |
| @saiten-scorer | Worker | Two-phase scoring: baseline signals + AI qualitative review | get_scoring_rubric, save_scores, adjust_scores |
| @saiten-reviewer | Evaluator | Score consistency review & bias detection | get_scoring_rubric, read scores |
| @saiten-reporter | Worker | Ranking report generation & trend analysis | generate_ranking_report |
| @saiten-commenter | Handoff | GitHub Issue feedback comments (user-confirmed) | gh issue comment |
Design Principles Applied
| Principle | How Applied |
|---|---|
| SRP | Each agent handles exactly 1 responsibility (6 agents × 1 duty) |
| Fail Fast | Gates at every step; anomalies reported immediately |
| SSOT | All score data centralized in data/scores.json |
| Feedback Loop | Scorer → Reviewer → Re-score loop (Evaluator-Optimizer pattern) |
| Human-in-the-Loop | Commenter runs only after explicit user confirmation via Handoff |
| Transparency | Todo list shows progress; each Gate reports status |
| Idempotency | Re-scoring overwrites; safe to run multiple times |
| ISP | Each sub-agent receives only the tools and data it needs |
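As a small illustration of how SSOT and Idempotency combine in practice, re-scoring can be implemented as an upsert into the single scores.json store keyed by issue_number. The sketch below is hypothetical; the real save_scores tool may structure the store differently.

```python
# Hypothetical sketch of SSOT + Idempotency: all scores live in one JSON file,
# and re-scoring the same issue replaces the previous entry rather than
# appending a duplicate. The real save_scores tool may differ.
import json
from pathlib import Path

SCORES_PATH = Path("data/scores.json")


def save_score(entry: dict) -> None:
    """Upsert one score entry, keyed by issue_number, into the single score store."""
    store = json.loads(SCORES_PATH.read_text(encoding="utf-8")) if SCORES_PATH.exists() else []
    store = [s for s in store if s["issue_number"] != entry["issue_number"]]  # overwrite, not duplicate
    store.append(entry)
    SCORES_PATH.parent.mkdir(parents=True, exist_ok=True)
    SCORES_PATH.write_text(json.dumps(store, indent=2, ensure_ascii=False), encoding="utf-8")
```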
System Architecture
┌──────────────────────────────────────────────────────────┐
│                         VS Code                           │
│                                                            │
│  ┌────────────────────────────────────────────────────┐  │
│  │  @saiten-orchestrator                              │  │
│  │   ├── @saiten-collector  (Worker)                  │  │
│  │   ├── @saiten-scorer     (Worker)                  │  │
│  │   ├── @saiten-reviewer   (Evaluator)               │  │
│  │   ├── @saiten-reporter   (Worker)                  │  │
│  │   └── @saiten-commenter  (Handoff)                 │  │
│  └────────────────┬───────────────────────────────────┘  │
│                   │ MCP (stdio)                           │
│  ┌────────────────▼───────────────────────────────────┐  │
│  │  saiten-mcp (FastMCP Server / Python)              │  │
│  │   • list_submissions()         → gh CLI → GitHub   │  │
│  │   • get_submission_detail()    → gh CLI → GitHub   │  │
│  │   • get_scoring_rubric()       → YAML files        │  │
│  │   • save_scores()              → data/scores.json  │  │
│  │   • adjust_scores()            → data/scores.json  │  │
│  │   • generate_ranking_report()  → reports/*.md      │  │
│  └────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────┘
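For orientation, the MCP layer can be sketched as a FastMCP server that registers each tool. The snippet below is a simplified illustration: the tool bodies, the one-YAML-file-per-track naming, and the exact FastMCP import path are assumptions, and the real server.py adds rate limiting, retries, validation, and structured logging around each tool.

```python
# Simplified sketch of the MCP layer -- illustrative, not the project's server.py.
# Assumptions: the official MCP Python SDK's FastMCP class (a standalone fastmcp
# package would import differently) and one YAML rubric file per track.
from pathlib import Path

import yaml
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("saiten-mcp")


@mcp.tool()
def get_scoring_rubric(track: str) -> dict:
    """Return the scoring rubric for one track (e.g. creative-apps)."""
    return yaml.safe_load((Path("data/rubrics") / f"{track}.yaml").read_text(encoding="utf-8"))


@mcp.tool()
def list_submissions() -> list[dict]:
    """List hackathon submission issues (the real tool shells out to the gh CLI)."""
    ...  # e.g. run `gh issue list --json number,title,body` and parse the output


if __name__ == "__main__":
    mcp.run()  # stdio transport, matching the .vscode/mcp.json entry
```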
Setup
Prerequisites
- Python 3.10+ and the uv package manager
- GitHub CLI (gh) installed and authenticated (gh auth login)
- VS Code with GitHub Copilot (custom agents and MCP support)
Installation
# Clone the repository
git clone <repo-url>
cd FY26_techconnect_saiten
# Create Python virtual environment
uv venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
# Install dependencies (production)
uv pip install -e .
# Install development dependencies (includes pytest + coverage)
uv pip install -e ".[dev]"
# Verify gh CLI authentication
gh auth status
Environment Variables
No secrets are required for normal operation.
# Copy the template (optional – only needed for CI or non-VS Code environments)
cp .env.example .env
| Variable | Required | Description |
|---|---|---|
| GITHUB_TOKEN | No | gh CLI manages its own auth. Only set for CI environments |
Security: This project uses gh CLI authentication and VS Code Copilot's built-in Azure OpenAI credentials. No API keys are stored in code or config files.
VS Code Configuration
.vscode/mcp.json automatically configures the MCP server. No additional setup required.
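For reference, a stdio server entry in .vscode/mcp.json typically looks something like the sketch below. The command, arguments, and module path are illustrative assumptions, not the project's actual configuration.

```json
{
  "servers": {
    "saiten-mcp": {
      "type": "stdio",
      "command": "uv",
      "args": ["run", "python", "-m", "saiten_mcp.server"]
    }
  }
}
```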
Usage
Type the following in the VS Code chat panel:
| Command | Description | Agents Used |
|---|---|---|
| @saiten-orchestrator score all | Score all submissions | collector → baseline → scorer (AI) → reviewer → reporter |
| @saiten-orchestrator score #48 | Score a single submission | collector → scorer → reviewer → reporter |
| @saiten-orchestrator ranking | Generate ranking report | reporter only |
| @saiten-orchestrator rescore #48 | Re-score a submission | collector → scorer → reviewer → reporter |
| @saiten-orchestrator show rubric for Creative | Display scoring rubric | Direct response (MCP) |
| @saiten-orchestrator review scores | Review score consistency | reviewer only |
Project Structure
FY26_techconnect_saiten/
├── .github/agents/
│   ├── saiten-orchestrator.agent.md   # Orchestrator
│   ├── saiten-collector.agent.md      # Data Collection Worker
│   ├── saiten-scorer.agent.md         # Scoring Worker
│   ├── saiten-reviewer.agent.md       # Score Reviewer (Evaluator)
│   ├── saiten-reporter.agent.md       # Report Worker
│   └── saiten-commenter.agent.md      # Feedback Commenter (Handoff)
├── src/saiten_mcp/
│   ├── server.py                      # MCP Server + rate limiter + structured logging
│   ├── models.py                      # Pydantic data models with boundary validation
│   └── tools/
│       ├── submissions.py             # list_submissions, get_submission_detail
│       ├── rubrics.py                 # get_scoring_rubric
│       ├── scores.py                  # save_scores, adjust_scores
│       └── reports.py                 # generate_ranking_report
├── data/
│   ├── rubrics/                       # Track-specific scoring rubrics (YAML)
│   └── scores.json                    # Scoring results (SSOT)
├── reports/
│   └── ranking.md                     # Auto-generated ranking report
├── scripts/
│   ├── score_all.py                   # Phase A: Mechanical baseline scoring
│   └── run_scoring.py                 # CLI scoring pipeline (legacy)
├── tests/
│   ├── conftest.py                    # Shared test fixtures
│   ├── test_models.py                 # Pydantic model validation tests
│   ├── test_parsers.py                # Issue body parser tests
│   ├── test_rubrics.py                # Rubric YAML integrity tests
│   ├── test_scores.py                 # Score persistence & validation tests
│   ├── test_reports.py                # Report generation tests
│   ├── test_reliability.py            # Retry, rate limiting, error handling tests
│   └── test_e2e.py                    # E2E integration tests
├── .vscode/mcp.json                   # MCP server config
├── AGENTS.md                          # Agent registry
└── pyproject.toml
Testing
The project has a comprehensive test suite with 110 tests covering models, parsers, tools, reliability, and reports.
# Run all tests
python -m pytest tests/ -v
# Run with coverage report
python -m pytest tests/ --cov=saiten_mcp --cov-report=term-missing
# Run only unit tests (no network calls)
python -m pytest tests/ -m "not e2e" -v
# Run integration tests (requires gh CLI auth)
python -m pytest tests/ -m e2e -v
Test Structure
| Test File | Tests | What It Covers |
|---|---|---|
| test_models.py | 17 | Pydantic models, validation boundaries, evidence-anchored fields |
| test_parsers.py | 28 | Issue body parsing, track detection, URL extraction, checklists |
| test_rubrics.py | 20 | Rubric YAML integrity, weights, scoring policy, evidence signals |
| test_scores.py | 9 | Score persistence, idempotency, input validation, sorting |
| test_reports.py | 8 | Markdown report generation, empty/missing data edge cases |
| test_reliability.py | 10 | Retry logic, rate limiting, error handling, gh CLI resilience |
| test_e2e.py | 5 | End-to-end MCP tool calls with live GitHub data |
| Total | 110 | 88% code coverage |
Scoring Tracks
| Track | Criteria | Notes |
|---|---|---|
| Creative Apps | 5 criteria | Community Vote (10%) excluded; remaining 90% prorated to 100% |
| Reasoning Agents | 5 criteria | Uses common overall criteria |
| Enterprise Agents | 3 criteria | Custom 3-axis evaluation |
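For the Creative Apps track, excluding the 10% Community Vote means each remaining weight is divided by 0.9 so the track still totals 100%. A minimal illustration (criterion names and raw weights here are hypothetical):

```python
# Community Vote (10%) excluded: rescale the remaining 90% of weight to 100%.
raw_weights = {"innovation": 25, "implementation": 25, "impact": 20, "demo": 10, "docs": 10}  # hypothetical, sums to 90
prorated = {name: round(w / 0.9, 1) for name, w in raw_weights.items()}
# e.g. a 25% criterion becomes 25 / 0.9 ≈ 27.8% of the final weighted total
```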
Demo
The multi-agent workflow can be invoked directly from VS Code's chat panel:
Scoring a Single Submission
User: @saiten-orchestrator score #49
@saiten-orchestrator → Routes to collector → scorer → reviewer → reporter
@saiten-collector: Fetched Issue #49 (EasyExpenseAI)
├─ Track: Creative Apps
├─ Repo: github.com/chakras/Easy-Expense-AI
├─ README: 10,036 chars extracted
└─ Gate: ✅ Data complete
@saiten-scorer: Evidence-anchored evaluation
├─ Accuracy & Relevance: 8/10
│    Evidence: "5-agent Semantic Kernel pipeline with Azure Document Intelligence"
├─ Reasoning: 7/10
│    Evidence: "Linear pipeline, no self-correction loop"
├─ Total: 73.9/100
└─ Gate: ✅ All criteria scored with evidence
@saiten-reviewer: Bias check passed
├─ Outlier check: PASS (within 2σ)
├─ Evidence quality: PASS (no generic phrases)
└─ Gate: ✅ PASS
@saiten-reporter: Report saved → reports/ranking.md
Scoring All Submissions
User: @saiten-orchestrator score all
@saiten-orchestrator: Processing 43 submissions across 3 tracks...
├─ Collecting → Scoring → Reviewing → Reporting
├─ Progress tracked via Todo list
└─ Final report: reports/ranking.md
Key Differentiators
- Evidence-anchored scoring: Each criterion requires specific evidence from the submission, not generic phrases
- Self-correction loop: Reviewer FLAGs biased scores → Scorer re-evaluates → until PASS (a sketch of the reviewer's outlier check follows this list)
- Real-time progress: Todo list updates visible in VS Code during multi-submission scoring
- Human-in-the-loop: Feedback comments only posted after explicit user confirmation via Handoff
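The reviewer's statistical outlier check (the "2σ" gate in the workflow diagram) can be pictured as flagging any weighted total that sits more than two standard deviations from the mean of its peers. The sketch below is an approximation of that idea, not the actual @saiten-reviewer logic.

```python
# Approximation of the reviewer's 2-sigma outlier check on weighted totals;
# the actual @saiten-reviewer logic may differ.
from statistics import mean, stdev


def flag_outliers(totals: dict[int, float], sigmas: float = 2.0) -> list[int]:
    """Return issue numbers whose weighted total is more than `sigmas` std devs from the mean."""
    if len(totals) < 3:
        return []  # too few scores for a meaningful spread
    mu, sd = mean(totals.values()), stdev(totals.values())
    if sd == 0:
        return []
    return [issue for issue, total in totals.items() if abs(total - mu) > sigmas * sd]
```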
Troubleshooting
| Issue | Cause | Solution |
|---|---|---|
| gh command failed | gh CLI not authenticated | Run gh auth login |
| scores.json corrupted | Interrupted write | Backed up automatically to scores.json.bak; restore manually if needed |
| ValueError: issue_number must be positive | Bad input to save_scores | Check score data format matches schema |
| Invalid track name | Typo in track parameter | Use: creative-apps, reasoning-agents, or enterprise-agents |
| MCP server not starting | Python env mismatch | Ensure uv pip install -e . in the .venv |
| No submissions returned | Network or auth issue | Run gh api repos/microsoft/agentsleague-techconnect/issues --jq '.[0].number' to test |
Corrupted Data Recovery
If data/scores.json becomes corrupted, the server automatically:
- Logs a warning with the parse error
- Creates a backup at data/scores.json.bak
- Continues with an empty score store
To restore manually:
cp data/scores.json.bak data/scores.json
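A sketch of the load path that produces this behaviour (warning, backup, empty store) is shown below. Paths and the behaviour match the description above, but the real server code may back up and log differently.

```python
# Sketch of the parse-failure recovery path described above; the real
# implementation may back up and log differently.
import json
import logging
import shutil
from pathlib import Path

logger = logging.getLogger("saiten_mcp")
SCORES_PATH = Path("data/scores.json")


def load_scores() -> list[dict]:
    """Load the score store, backing it up and starting empty if it is corrupted."""
    if not SCORES_PATH.exists():
        return []
    try:
        return json.loads(SCORES_PATH.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        logger.warning("scores.json is corrupted (%s); backing up and starting empty", exc)
        shutil.copy2(SCORES_PATH, SCORES_PATH.with_suffix(".json.bak"))
        return []
```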
Tech Stack
| Layer | Technology |
|---|---|
| Agent Framework | VS Code Copilot Custom Agent (.agent.md) – Orchestrator-Workers pattern |
| MCP Server | Python 3.10+ / FastMCP (stdio transport) |
| Package Manager | uv |
| GitHub Integration | gh CLI / GitHub REST API with exponential backoff retry and rate limiting |
| Data Models | Pydantic v2 with boundary validation (scores 1-10, weighted_total 0-100) |
| Data Storage | JSON (scores) / YAML (rubrics) / Markdown (reports) with backup & recovery |
| Testing | pytest + pytest-cov – 110 tests, 88% coverage |
| Error Handling | Retry with backoff, rate limiting, input validation, corrupted file recovery |
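The boundary validation noted in the Data Models row can be expressed with Pydantic v2 field constraints roughly as follows. Field names beyond those documented in this README (scores 1-10, weighted_total 0-100, issue_number > 0) are illustrative, not the project's actual models.py schema.

```python
# Illustrative Pydantic v2 models showing the boundary validation described above;
# the real models.py defines the authoritative schema.
from pydantic import BaseModel, Field


class CriterionScore(BaseModel):
    name: str
    score: int = Field(ge=1, le=10)       # per-criterion score must be 1-10
    evidence: str = Field(min_length=1)   # evidence-anchored: a citation is required


class SubmissionScore(BaseModel):
    issue_number: int = Field(gt=0)       # "issue_number must be positive"
    track: str
    criteria: list[CriterionScore]
    weighted_total: float = Field(ge=0, le=100)

# Invalid data fails fast at the tool boundary: constructing a SubmissionScore
# with issue_number=-1 raises pydantic.ValidationError before anything is
# written to scores.json.
```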
Key Technologies Built by the Team
This project was enabled by several open-source tools created by the same team:
| Tool | Description | Repo |
|---|---|---|
| Agent Skill Ninja | VS Code extension for searching, installing, and managing Agent Skills (SKILL.md) for GitHub Copilot, Claude Code, and other AI coding assistants. Used to install the agentic-workflow-guide skill into this project. | vscode-agent-skill-ninja |
| Agentic Workflow Guide | A comprehensive Agent Skill covering 5 workflow patterns, agent delegation, Handoffs, and Context Engineering. The design principles (SSOT, SRP, Fail Fast) and workflow patterns (Orchestrator-Workers, Evaluator-Optimizer) used in Saiten all come from this skill. | Agent-Skills |
How Agent Skill Ninja Powers This Project
1. Install Agent Skill Ninja extension in VS Code
2. Search for "agentic-workflow-guide" skill
3. One-click install → SKILL.md + references/ + templates/ added to .github/skills/
4. Copilot now has domain knowledge about workflow patterns & agent design
5. Use that knowledge to design, review, and build the 6 Saiten agents
The agentic-workflow-guide skill provides:
- 5 workflow pattern references (Prompt Chaining, Routing, Parallelization, Orchestrator-Workers, Evaluator-Optimizer)
- Agent templates with SRP, Gate, and Handoff patterns
- Design principles (SSOT, Fail Fast, Feedback Loop, Human-in-the-Loop)
- Review checklist for validating agent architecture
- Scaffold script for generating new agent files
License
MIT