Agent Helper
Enables AI agents to process files locally — OCR images, extract text from PDFs and DOCX, and describe images using local vision models, all without sending data to external services.
A local MCP server that gives AI agents file processing, web search, data analysis, contact lookup, local persistent memory, and system monitoring capabilities — all on your machine. OCR images, extract text from PDFs/DOCX/XLSX, detect objects with YOLOv8, describe scenes via Ollama, search the web, fetch web pages, look up people's contact info, save/recall/search memories, compress/extract archives, generate PDFs, run git operations, make HTTP requests, encrypt/decrypt data, take screenshots, query external databases, send emails, manage Docker, and more.
Architecture
                   ┌─────────────────────────────────┐
AI Agent (MCP) ───▶│        MCP Server :5021         │
(opencode, etc.)   │          FastMCP / SSE          │
                   └────────────────┬────────────────┘
                                    │
                   ┌────────────────▼────────────────┐
                   │          Orchestrator           │
                   │       Routes + 50+ tools        │
                   └──┬────────┬────────┬────────┬───┘
                      │        │        │        │
                   ┌──▼───┐ ┌──▼───┐ ┌──▼───┐ ┌──▼───┐
                   │ OCR  │ │ PDF  │ │ DOCX │ │ YOLO │
                   │Tesser│ │MuPDF │ │ py-  │ │ Obj  │
                   │ act  │ │      │ │ docx │ │Detect│
                   └──────┘ └──────┘ └──────┘ └──────┘

Browser ─────▶ Management UI :5020
               (FastAPI dashboard)
Features
| Feature | Description |
|---|---|
| OCR | Extract text from images via Tesseract |
| Object detection | YOLOv8 for detecting objects in images (CPU, ~6MB model) |
| Scene description | Optional Ollama LLaVA for image descriptions |
| PDF extraction | Text extraction from PDFs via PyMuPDF |
| DOCX extraction | Paragraph extraction from Word files |
| XLSX extraction | Cell values from Excel spreadsheets via openpyxl |
| Web search | DuckDuckGo search (free, no API key) |
| Web page fetch | Fetch URLs, extract readable text, detect iframe content |
| Batch fetch | Fetch multiple URLs in parallel in one call |
| Format converter | Auto-detect & convert JSON/YAML/CSV/XML with JMESPath queries |
| Diff | Compare two text blocks or URLs, return unified diff |
| RSS reader | Parse RSS/Atom feeds into structured entries |
| Summarization | Summarize text or URLs (Ollama or extractive fallback) |
| Date parsing | Natural language dates with timezone conversion |
| File system | List/search/read files within a configurable root path (read-only) |
| SQLite queries | Read-only SQL queries on .db files |
| Archive viewer | List .zip/.tar.gz contents (no extraction) |
| Chart generation | CSV/JSON data → bar/line/pie/scatter/histogram charts (base64 PNG) |
| System monitoring | Disk usage, memory info, running processes via psutil |
| Translation | Translate text via Ollama |
| Contact search | Look up people by name (phones, emails, profiles) or reverse phone lookup via DuckDuckGo + OSINT |
| Local memory | Persistent key-value store with SQLite + FTS5 full-text search. Save, recall, search, list, delete memories |
| Encode / Decode | Base64, UUID generation, MD5/SHA1/SHA256/SHA512 hashing |
| Compress / Extract | Create and extract zip/tar.gz archives with zip-slip protection |
| PDF generation | Generate PDF documents from text content via fpdf2 |
| WHOIS / DNS | Domain WHOIS lookup and DNS record queries (A, AAAA, MX, NS, CNAME, TXT, etc.) |
| HTTP requests | Full HTTP client with SSRF protection (blocks private IPs) |
| Git operations | Whitelisted git commands (status, log, diff, commit, push, pull, clone, etc.) |
| File read/write | Read, write, append, delete, list files (jailed to app root directory) |
| Database queries | Read-only PostgreSQL/MySQL/MSSQL/SQLite queries via SQLAlchemy |
| Send email | SMTP client for sending emails |
| Docker management | Docker ps, images, pull, run, exec, logs, inspect, stop, rm, etc. (dangerous flags blocked) |
| Encryption | Fernet (AES) encrypt/decrypt with password-derived keys |
| Screenshots | Capture webpage screenshots via Playwright (SSRF-guarded) |
| Video search | Search for videos via DuckDuckGo (YouTube, Vimeo, etc.) with duration, view counts, thumbnails |
| Social media lookup | Scrape Twitter/X profiles and tweets via Nitter — no API keys or login required |
| API key auth | Bearer token authentication for MCP clients, managed via web UI |
| Management dashboard | Web UI at port 5020 for settings, keys, tool toggles, job history, live logs |
| Tool management | Enable/disable individual MCP tools from the dashboard |
| Job history | Results cached to disk, viewable in dashboard |
| Live logs | Poll-based log stream (no WebSocket needed) |
Requirements
- Python 3.10+
- Tesseract OCR (system package):
  sudo apt install tesseract-ocr   # Debian/Ubuntu
  brew install tesseract           # macOS
- Ollama (optional, for vision & translation):
  curl -fsSL https://ollama.com/install.sh | sh
  ollama pull llava
Quick start
git clone https://github.com/wajirasls/agent_helper.git
cd agent_helper
# One-shot setup:
chmod +x start.sh
./start.sh
# Or manually:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python main.py
Open http://127.0.0.1:5020 in your browser.
Systemd service (auto-start on boot)
sudo cp agent-helper.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable agent-helper
sudo systemctl start agent-helper
Ports
| Port | Service | Access |
|---|---|---|
| 5020 | Management UI (FastAPI) | http://127.0.0.1:5020 |
| 5021 | MCP Server (SSE) | http://0.0.0.0:5021/sse |
Management dashboard
Visit http://127.0.0.1:5020:
- MCP Server — Start, stop, restart the MCP server
- Vision Backend — Toggle between OCR only / Ollama LLaVA
- Object Detection — Enable/disable YOLOv8 with confidence threshold
- API Keys — Create and revoke keys for MCP clients
- MCP Tools — Enable/disable individual tools
- File System — Configure FS root path (read-only, relative to app dir)
- Processing Folders — Browse `Processing/` subfolders
- Job History — View past processing jobs
- Health Panel — Check Tesseract, Ollama, YOLOv8 status
- Live Logs — Scrollable log stream
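The live log panel is poll-based rather than pushed over a WebSocket. A minimal sketch of how such a log could work, assuming a hypothetical `RingLog` class where each line carries a sequence number and the client polls with the last number it saw. The 500-line capacity matches the figure given for `logger.py` in the project structure; the class and method names are illustrative and not taken from the project:

```python
from collections import deque
from itertools import count


class RingLog:
    """Hypothetical in-memory log that keeps only the newest `capacity` lines."""

    def __init__(self, capacity: int = 500):
        self._lines = deque(maxlen=capacity)  # oldest entries fall off the left
        self._seq = count(1)                  # monotonically increasing line id

    def write(self, message: str) -> int:
        seq = next(self._seq)
        self._lines.append((seq, message))
        return seq

    def since(self, last_seen: int = 0) -> list[tuple[int, str]]:
        """Return lines newer than `last_seen`; the UI polls with its cursor."""
        return [(seq, msg) for seq, msg in self._lines if seq > last_seen]


log = RingLog(capacity=3)
for text in ["boot", "mcp started", "ocr ok", "job 1 done"]:
    log.write(text)

first_poll = log.since(0)          # "boot" was already evicted
cursor = first_poll[-1][0]         # remember the newest sequence number
log.write("job 2 done")
second_poll = log.since(cursor)    # only the line written after the cursor
```

Because the cursor is a sequence number rather than a list index, a client that polls late simply misses evicted lines instead of receiving duplicates.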
All MCP tools (50+ total)
File Processing (Processing/ folder)
| Tool | Parameters | Description |
|---|---|---|
| `process_folder` | `folder_name` | Process all files in Processing/{folder}/. Creates folder if not found. |
| `process_file` | `folder_name`, `filename` | Process a single file in an existing folder. |
| `list_folders` | — | List subfolders in Processing/. |
| `list_files` | `folder_name` | List files in a subfolder. |
| `detect_objects_in_image` | `folder_name`, `filename` | Run YOLOv8 detection on one image. |
Direct Analysis (pass data inline, no staging needed)
| Tool | Parameters | Description |
|---|---|---|
| `analyze_image` | `image_data` (base64), `filename` | OCR + object detection on base64 image. |
| `analyze_image_url` | `url` | Download image from URL → OCR + detection. |
| `analyze_file` | `file_data` (base64), `filename` | Analyze any supported file (PDF, DOCX, XLSX, image, text). |
| `analyze_file_url` | `url` | Download file from URL → analyze. |
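The inline tools expect file contents as base64 text. A small sketch of preparing the arguments for `analyze_image` on the client side; the parameter names `image_data` and `filename` come from the table above, while the helper name `build_analyze_image_args` is hypothetical and how the arguments are sent depends on your MCP client:

```python
import base64
import json


def build_analyze_image_args(image_bytes: bytes, filename: str) -> dict:
    """Encode raw image bytes as base64 text so they can travel inside JSON."""
    return {
        "image_data": base64.b64encode(image_bytes).decode("ascii"),
        "filename": filename,
    }


# In practice image_bytes would come from Path("scan.png").read_bytes();
# the PNG signature is used here only so the example runs without a file.
png_signature = b"\x89PNG\r\n\x1a\n"
args = build_analyze_image_args(png_signature, "scan.png")
payload = json.dumps(args)  # JSON-safe because base64 output is plain ASCII
```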
Web & Search
| Tool | Parameters | Description |
|---|---|---|
| `fetch_webpage` | `url` | Fetch URL, extract readable text, detect iframes. |
| `web_search` | `query`, `max_results` | DuckDuckGo search (free, no API key). |
| `batch_fetch` | `urls` (list) | Fetch multiple URLs in parallel, one call. |
| `read_feed` | `url`, `max_entries` | Parse RSS/Atom feed into structured entries. |
| `http_request` | `url`, `method`, `headers`, `body`, `timeout` | Full HTTP client (GET/POST/PUT/DELETE). SSRF-guarded. |
| `video_search` | `query`, `max_results` | Search for videos via DuckDuckGo (YouTube, Vimeo, etc.). Returns title, URL, duration, view count, thumbnail. |
Data & Text
| Tool | Parameters | Description |
|---|---|---|
| `convert_format` | `data`, `from_format`, `to_format`, `query` | Auto-detect JSON/YAML/CSV/XML, convert, JMESPath query. |
| `diff_text` | `text_a`, `text_b`, `context_lines` | Unified diff of two text blocks. |
| `diff_urls` | `url_a`, `url_b`, `context_lines` | Fetch two URLs and diff their content. |
| `summarize_text` | `text`, `max_sentences` | Summarize text (Ollama or extractive TextRank). |
| `summarize_url` | `url`, `max_sentences` | Fetch URL + summarize in one step. |
| `parse_date` | `text`, `from_tz`, `to_tz` | Parse natural language dates, timezone conversion. |
| `translate_text` | `text`, `source_lang`, `target_lang` | Translate text via Ollama. |
| `encode_decode` | `operation`, `data`, `text`, `encoding` | Base64, UUID, MD5/SHA1/SHA256/SHA512 hashing. |
| `generate_pdf` | `text`, `filename`, `title` | Generate PDF from text. |
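To illustrate what a unified diff with a configurable number of context lines looks like, here is a sketch built on the standard library's `difflib`. The function name and parameters mirror the `diff_text` row above, but the body is an assumption about one reasonable implementation, not the project's code:

```python
import difflib


def diff_text(text_a: str, text_b: str, context_lines: int = 3) -> str:
    """Return a unified diff of two text blocks (empty string if identical)."""
    a = text_a.splitlines(keepends=True)
    b = text_b.splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(a, b, fromfile="a", tofile="b", n=context_lines)
    )


before = "port: 5020\nmode: ocr\n"
after = "port: 5020\nmode: ollama\n"
result = diff_text(before, after)
print(result)
```

Removed lines are prefixed with `-`, added lines with `+`, and unchanged context lines with a single space.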
Contact Info & OSINT
| Tool | Parameters | Description |
|---|---|---|
| `contact_search` | `query`, `domain`, `country`, `max_results` | Look up people by name (phones, emails, social profiles) or reverse phone lookup. |
| `whois_lookup` | `domain` | WHOIS lookup for domain registration data. |
| `dns_lookup` | `hostname`, `record_type` | DNS queries (A, AAAA, MX, NS, CNAME, TXT, SOA, SRV). |
Social Media (no API keys required)
| Tool | Parameters | Description |
|---|---|---|
| `social_lookup` | `platform`, `username`, `query`, `nitter_instance`, `limit` | Look up Twitter/X profiles or search tweets via Nitter public instances. Returns name, bio, follower stats, recent tweets. No API key or login needed. |
Local Memory (persistent, SQLite + FTS5)
| Tool | Parameters | Description |
|---|---|---|
| `memory_save` | `key`, `content`, `tags` | Save/update a memory with key and tags (upsert). |
| `memory_recall` | `key` | Retrieve by exact key. |
| `memory_search` | `query`, `tags`, `limit` | Full-text search across all memories (FTS5 ranked). |
| `memory_list` | `tags`, `limit` | Browse memories, filterable by comma-separated tags. |
| `memory_delete` | `key` | Delete a memory. |
| `memory_stats` | — | Total count, unique tags, newest/oldest entries. |
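The combination of upsert-by-key and FTS5 ranked search can be sketched with the standard `sqlite3` module. The schema below (a `memories` table plus a `memories_fts` index) is an assumption made for illustration; the project's actual tables are not documented here. It also assumes the SQLite build bundled with Python has FTS5 enabled, which is common but not guaranteed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE memories (
        key TEXT PRIMARY KEY,
        content TEXT NOT NULL,
        tags TEXT NOT NULL DEFAULT ''
    );
    CREATE VIRTUAL TABLE memories_fts USING fts5(key, content, tags);
    """
)


def memory_save(key: str, content: str, tags: str = "") -> None:
    """Insert or update a memory and keep the full-text index in sync."""
    with conn:  # one transaction for both tables
        conn.execute(
            "INSERT INTO memories (key, content, tags) VALUES (?, ?, ?) "
            "ON CONFLICT(key) DO UPDATE SET "
            "content = excluded.content, tags = excluded.tags",
            (key, content, tags),
        )
        conn.execute("DELETE FROM memories_fts WHERE key = ?", (key,))
        conn.execute(
            "INSERT INTO memories_fts (key, content, tags) VALUES (?, ?, ?)",
            (key, content, tags),
        )


def memory_search(query: str, limit: int = 10) -> list[str]:
    """Return matching keys, best match first (FTS5 exposes `rank` for this)."""
    rows = conn.execute(
        "SELECT key FROM memories_fts WHERE memories_fts MATCH ? "
        "ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [row[0] for row in rows]


memory_save("ocr-tip", "Tesseract works best on 300 DPI scans", "ocr")
memory_save("ocr-tip", "Use grayscale images for Tesseract", "ocr,images")
memory_save("db-note", "External database queries are read-only", "db")
```

All values are passed as bound parameters, which is the same approach the Security section describes for memory queries.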
File System
| Tool | Parameters | Description |
|---|---|---|
| `fs_list_directory` | `path`, `pattern` (glob) | List directory contents with metadata. |
| `fs_find_files` | `pattern`, `path` | Recursive glob search. |
| `fs_read_text_file` | `path`, `offset`, `limit` | Read text file with line range (max 500). |
| `fs_query_sqlite` | `db_path`, `query`, `params` | Read-only SQLite query (1000 row limit). |
| `fs_list_archive` | `path` | List .zip/.tar.gz contents. |
| `read_write_file` | `path`, `content`, `mode`, `encoding` | Read/write/append/delete/list files (jailed to app root). |
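"Jailed to app root" generally means every user-supplied path is resolved against a fixed root and rejected if the result lands outside it. A minimal sketch of that check, assuming a hypothetical helper named `resolve_inside` (the project's real function names are not shown in this README):

```python
import tempfile
from pathlib import Path


def resolve_inside(root: Path, user_path: str) -> Path:
    """Resolve `user_path` under `root`, refusing anything that escapes it."""
    root = root.resolve()
    candidate = (root / user_path).resolve()   # collapses ".." and symlinks
    if not candidate.is_relative_to(root):     # available since Python 3.9
        raise PermissionError(f"path escapes root: {user_path}")
    return candidate


root = Path(tempfile.mkdtemp())                # stand-in for the app root
inside = resolve_inside(root, "notes/todo.txt")

try:
    resolve_inside(root, "../../etc/passwd")
    escaped = True
except PermissionError:
    escaped = False
```

Resolving before comparing matters: a plain string prefix check would accept `root/../secret`, and an absolute `user_path` replaces the root entirely when joined, so both cases are caught only after `resolve()`.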
Compression
| Tool | Parameters | Description |
|---|---|---|
| `compress_files` | `paths`, `archive_name`, `format` | Create zip/tar.gz archives. Path-traversal protected. |
| `extract_archive` | `archive_path`, `output_dir`, `password` | Extract zip/tar.gz with zip-slip protection. |
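Zip-slip refers to archive members whose names (for example `../evil.txt`) would be written outside the extraction directory. One common defence is to validate every member path before extracting anything. The sketch below shows that pattern for zip files; `safe_extract` and `make_zip` are hypothetical names used only for this example:

```python
import io
import tempfile
import zipfile
from pathlib import Path


def safe_extract(archive: zipfile.ZipFile, output_dir: Path) -> list[str]:
    """Extract only if every member resolves to a path inside `output_dir`."""
    output_dir = output_dir.resolve()
    names = archive.namelist()
    for name in names:
        target = (output_dir / name).resolve()
        if not target.is_relative_to(output_dir):
            raise ValueError(f"unsafe path in archive: {name}")
    archive.extractall(output_dir)  # reached only when all members are safe
    return names


def make_zip(members: dict[str, str]) -> zipfile.ZipFile:
    """Build an in-memory zip so the example needs no files on disk."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        for name, text in members.items():
            zf.writestr(name, text)
    buffer.seek(0)
    return zipfile.ZipFile(buffer)


out = Path(tempfile.mkdtemp())
extracted = safe_extract(make_zip({"docs/readme.txt": "hello"}), out)
```

Checking all members first means a malicious archive is rejected as a whole rather than partially unpacked.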
Git
| Tool | Parameters | Description |
|---|---|---|
| `git_operation` | `operation`, `args`, `repo_path` | Whitelisted git commands: status, log, diff, commit, push, pull, clone, branch, etc. |
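A command whitelist usually means the subcommand is checked against a fixed set and the final command is passed to the OS as an argument list, never as a shell string. The sketch below shows that idea; the `ALLOWED_OPERATIONS` set contains only the commands named above (the project's full list is elided with "etc."), and `build_git_command` is a hypothetical helper:

```python
ALLOWED_OPERATIONS = {
    "status", "log", "diff", "commit", "push", "pull", "clone", "branch",
}


def build_git_command(
    operation: str, args: tuple[str, ...] = (), repo_path: str = "."
) -> list[str]:
    """Return an argv list for git, or raise if the subcommand is not allowed."""
    if operation not in ALLOWED_OPERATIONS:
        raise ValueError(f"git operation not allowed: {operation!r}")
    # An argv list (rather than one shell string) means characters such as
    # ";" or "&&" in args are passed to git literally and never interpreted.
    return ["git", "-C", repo_path, operation, *args]


command = build_git_command("log", ("--oneline", "-n", "5"), "/srv/repo")
# A caller would then run it with: subprocess.run(command, shell=False, ...)
```

A production whitelist would likely also restrict individual arguments, since some git options can themselves execute programs; that is outside the scope of this sketch.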
Docker
| Tool | Parameters | Description |
|---|---|---|
| `docker_exec` | `action`, `image`, `command`, `args`, `timeout` | Docker ps, images, pull, run, exec, logs, inspect, stop, rm, stats, etc. Dangerous flags blocked. |
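Flag blocking can be sketched as a scan over the argument list before anything is handed to Docker. The four flags below are the ones named in the Security section; the function name `find_blocked_flags` and the handling of the space-separated form (`--pid host`) are assumptions about how such a check might be written:

```python
BLOCKED_FLAGS = ("--privileged", "--pid=host", "--network=host", "--cap-add=ALL")


def find_blocked_flags(args: list[str]) -> list[str]:
    """Return every blocked flag present in `args` (empty list means clean)."""
    hits = []
    for i, arg in enumerate(args):
        candidates = {arg}
        if i + 1 < len(args):
            # Docker accepts "--pid host" as well as "--pid=host",
            # so also test the flag joined with the following value.
            candidates.add(f"{arg}={args[i + 1]}")
        hits.extend(flag for flag in BLOCKED_FLAGS if flag in candidates)
    return hits


clean = find_blocked_flags(["--rm", "-d", "nginx"])
risky = find_blocked_flags(["--rm", "--pid", "host", "--privileged", "alpine"])
```

A request would be refused whenever the returned list is non-empty.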
Email & Database
| Tool | Parameters | Description |
|---|---|---|
| `send_email` | `recipient`, `subject`, `body`, `smtp_*` | Send email via SMTP. Requires dashboard configuration. |
| `database_query` | `connection_string`, `query`, `params`, `max_rows` | Read-only SQL on PostgreSQL/MySQL/MSSQL/SQLite. |
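The read-only rule (SELECT/WITH/PRAGMA only, with bound parameters) can be illustrated with a small gate in front of the query. This sketch uses the standard `sqlite3` module so it runs on its own, whereas the tool itself is described as going through SQLAlchemy; `run_read_only` is a hypothetical name, and the semicolon check is a deliberately crude way to refuse stacked statements. A prefix check alone is not a complete defence, so a read-only database account is a sensible second layer:

```python
import re
import sqlite3

READ_ONLY_PREFIX = re.compile(r"^\s*(SELECT|WITH|PRAGMA)\b", re.IGNORECASE)


def run_read_only(conn, query: str, params: tuple = (), max_rows: int = 1000):
    """Run `query` only if it looks read-only; values go in as bound params."""
    body = query.strip().rstrip(";")
    if not READ_ONLY_PREFIX.match(body) or ";" in body:
        raise PermissionError("only single SELECT/WITH/PRAGMA statements allowed")
    return conn.execute(body, params).fetchmany(max_rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO jobs (name) VALUES (?)", [("ocr",), ("pdf",)])

rows = run_read_only(conn, "SELECT name FROM jobs WHERE id = ?", (2,))
```

Because `id` is supplied as a parameter rather than formatted into the string, a value such as `"2 OR 1=1"` is compared as data and cannot change the query's structure.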
Encryption
| Tool | Parameters | Description |
|---|---|---|
| `crypto_encrypt` | `text`, `password`, `algorithm` | Encrypt text with Fernet (AES) and password-derived key. |
| `crypto_decrypt` | `encrypted_data`, `password`, `algorithm`, `is_base64` | Decrypt Fernet-encrypted data. |
Screenshot
| Tool | Parameters | Description |
|---|---|---|
| `screenshot` | `url`, `full_page`, `width`, `height` | Capture webpage screenshot (base64 PNG). SSRF-guarded. |
Charts & System
| Tool | Parameters | Description |
|---|---|---|
| `create_chart` | `data`, `chart_type`, `x_column`, `y_column`, `title` | CSV/JSON → bar/line/pie/scatter/histogram chart (base64 PNG). |
| `system_disk_usage` | — | Disk usage per mount point. |
| `system_memory_info` | — | RAM + swap usage. |
| `system_processes` | `limit` | Top processes by CPU usage. |
opencode configuration
Add to your opencode.json or ~/.config/opencode/opencode.json:
{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "agent_helper": {
      "type": "remote",
      "url": "http://localhost:5021/sse",
      "headers": {
        "Authorization": "Bearer <your-api-key>"
      },
      "enabled": true
    }
  }
}
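If you script your setup, the same entry can be generated rather than typed by hand. A small sketch, assuming a hypothetical helper `opencode_mcp_entry`; the keys and values are exactly those shown in the configuration above, and the API key placeholder should be replaced with a key created in the dashboard:

```python
import json


def opencode_mcp_entry(api_key: str, host: str = "localhost", port: int = 5021) -> dict:
    """Build the remote MCP entry that points opencode at the SSE endpoint."""
    return {
        "type": "remote",
        "url": f"http://{host}:{port}/sse",
        "headers": {"Authorization": f"Bearer {api_key}"},
        "enabled": True,
    }


config = {
    "$schema": "https://opencode.ai/config.json",
    "mcp": {"agent_helper": opencode_mcp_entry("<your-api-key>")},
}
rendered = json.dumps(config, indent=2)
print(rendered)
```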
File processing support
| Extension | Processor | Output |
|---|---|---|
| `.jpg`, `.png`, `.webp`, `.bmp`, `.tiff` | Tesseract OCR + YOLOv8 + optional vision | OCR text + detected objects + scene description |
| `.pdf` | PyMuPDF | Extracted text per page |
| `.docx` | python-docx | Extracted paragraphs |
| `.xlsx`, `.xls` | openpyxl | Cell values per sheet |
| `.txt`, `.md`, `.csv`, `.json`, `.xml` | Direct read | Raw file content |
Vision backends
| Mode | Backend | Notes |
|---|---|---|
| `ocr` (default) | Tesseract + YOLOv8 | No external service needed |
| `ollama` | LLaVA via Ollama | Adds scene descriptions; requires `ollama serve` + `ollama pull llava` |
Project structure
agent_helper/
├── config.py # Settings management (persisted to JSON)
├── logger.py # Ring buffer logger (500 lines, polled by UI)
├── auth.py # API key management (SHA-256 hashed)
├── main.py # Entry point
├── mcp_server.py # FastMCP server on port 5021 (50+ tools)
├── processor_orchestrator.py # All tool implementations
├── processors/
│ ├── image.py # Tesseract OCR
│ ├── vision.py # Ollama LLaVA image description
│ ├── pdf.py # PyMuPDF text extraction
│ ├── docx.py # python-docx parsing
│ ├── excel.py # openpyxl XLSX parsing
│ ├── detection.py # YOLOv8 object detection
│ ├── fs_tools.py # Read-only FS operations (path-safe)
│ ├── contact_search.py # Name/phone lookup via DDG + OSINT
│ ├── local_memory.py # SQLite + FTS5 persistent memory
│ └── social_lookup.py # Twitter/X profile scraping via Nitter
├── management_ui/
│ ├── app.py # FastAPI dashboard on port 5020
│ └── templates/
│ └── dashboard.html # HTMX dark-theme dashboard
├── Processing/ # Watch folder (created on first run)
├── local_memory/ # SQLite memory database (auto-created)
├── data/ # Settings & API keys (persisted)
├── logs/ # Log output
├── requirements.txt
├── start.sh
└── agent-helper.service # systemd user service
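The note on `auth.py` says API keys are SHA-256 hashed. The usual pattern is to show the key once at creation, store only its digest, and compare digests when a Bearer token arrives. The sketch below illustrates that pattern with hypothetical function names; it is not the project's actual `auth.py`:

```python
import hashlib
import hmac
import secrets


def create_api_key() -> tuple[str, str]:
    """Return (plaintext key to show once, SHA-256 digest to persist)."""
    key = secrets.token_urlsafe(32)
    return key, hashlib.sha256(key.encode()).hexdigest()


def verify_bearer(header_value: str, stored_digests: set[str]) -> bool:
    """Check an Authorization header value against the stored digests."""
    scheme, _, token = header_value.partition(" ")
    if scheme != "Bearer" or not token:
        return False
    digest = hashlib.sha256(token.encode()).hexdigest()
    # compare_digest avoids leaking match position through timing
    return any(hmac.compare_digest(digest, known) for known in stored_digests)


key, digest = create_api_key()
stored = {digest}                      # only the digest is written to disk
accepted = verify_bearer(f"Bearer {key}", stored)
rejected = verify_bearer("Bearer not-a-real-key", stored)
```

Storing digests means a leaked settings file does not reveal usable keys.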
Security
- API key auth: All MCP tool calls require a Bearer token for `/messages/` endpoints. Keys managed via dashboard.
- SSRF protection: `http_request` and `screenshot` tools block private/internal IP addresses (127.0.0.1, 10.x, 172.x, 192.168.x, 169.254.x, localhost).
- Path traversal: All file operations resolve paths against the allowed root and reject `..` traversal.
- Git whitelist: Only pre-approved git commands allowed (status, log, diff, commit, push, pull, etc.). No generic shell execution.
- Docker restrictions: Dangerous flags (`--privileged`, `--pid=host`, `--network=host`, `--cap-add=ALL`) are blocked.
- Database read-only: External DB queries enforce SELECT/WITH/PRAGMA only. Parameterized queries prevent injection.
- Zip-slip protection: Archive extraction validates all paths against the output directory.
- Memory validation: Keys and tags are validated with strict regex patterns. All SQL parameterized.
- Local-only UI: Management dashboard binds to `127.0.0.1`.
- Tool management: Any tool can be disabled via the dashboard; risky ones ship disabled by default.
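As an illustration of the SSRF rule, the sketch below rejects URLs whose host is `localhost` or a literal private, loopback or link-local address, using only the standard library. `is_blocked_host` is a hypothetical name, and the sketch deliberately does not resolve DNS names; a real guard would also need to resolve the hostname and check every returned address, since a public-looking name can point at an internal IP:

```python
import ipaddress
from urllib.parse import urlsplit


def is_blocked_host(url: str) -> bool:
    """Return True if the URL targets localhost or a private/internal IP literal."""
    host = urlsplit(url).hostname
    if host is None or host.lower() == "localhost":
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name: this sketch does not resolve it (see note above)
    return ip.is_private or ip.is_loopback or ip.is_link_local


checks = {
    url: is_blocked_host(url)
    for url in [
        "http://127.0.0.1:5020/",
        "http://10.0.0.5/admin",
        "http://192.168.1.1/",
        "http://169.254.169.254/latest/meta-data",
        "http://localhost:5021/sse",
        "https://93.184.216.34/",
    ]
}
```

Only the last URL, a public address, passes the check.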
License
MIT