Agent Helper

Enables AI agents to process files locally — OCR images, extract text from PDFs and DOCX, and describe images using local vision models, all without sending data to external services.

A local MCP server that gives AI agents file processing, web search, data analysis, contact lookup, persistent local memory, and system monitoring, all running on your machine. Its tools cover OCR, text extraction from PDF/DOCX/XLSX, YOLOv8 object detection, Ollama scene descriptions, web search and page fetching, contact lookup, memory save/recall/search, archive compression and extraction, PDF generation, git operations, HTTP requests, encryption and decryption, screenshots, external database queries, email, Docker management, and more.

Architecture

                     ┌──────────────────────┐
  AI Agent (MCP) ───▶│  MCP Server :5021    │
  (opencode, etc.)   │  FastMCP / SSE       │
                     └──────────┬───────────┘
                                │
                     ┌──────────▼───────────┐
                     │  Orchestrator        │
                     │  Routes + 50+ tools  │
                     └──────────┬───────────┘
                                │
           ┌─────────────┬──────┴──────┬─────────────┐
           │             │             │             │
     ┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐
     │    OCR    │ │    PDF    │ │   DOCX    │ │  YOLOv8   │
     │ Tesseract │ │  PyMuPDF  │ │python-docx│ │Obj. detect│
     └───────────┘ └───────────┘ └───────────┘ └───────────┘

  Browser ─────▶ Management UI :5020
                 (FastAPI dashboard)

Features

| Feature | Description |
| --- | --- |
| OCR | Extract text from images via Tesseract |
| Object detection | YOLOv8 for detecting objects in images (CPU, ~6MB model) |
| Scene description | Optional Ollama LLaVA for image descriptions |
| PDF extraction | Text extraction from PDFs via PyMuPDF |
| DOCX extraction | Paragraph extraction from Word files |
| XLSX extraction | Cell values from Excel spreadsheets via openpyxl |
| Web search | DuckDuckGo search (free, no API key) |
| Web page fetch | Fetch URLs, extract readable text, detect iframe content |
| Batch fetch | Fetch multiple URLs in parallel in one call |
| Format converter | Auto-detect and convert JSON/YAML/CSV/XML with JMESPath queries |
| Diff | Compare two text blocks or URLs, return unified diff |
| RSS reader | Parse RSS/Atom feeds into structured entries |
| Summarization | Summarize text or URLs (Ollama or extractive fallback) |
| Date parsing | Natural language dates with timezone conversion |
| File system | List/search/read files within a configurable root path (read-only) |
| SQLite queries | Read-only SQL queries on .db files |
| Archive viewer | List .zip/.tar.gz contents (no extraction) |
| Chart generation | CSV/JSON data → bar/line/pie/scatter/histogram charts (base64 PNG) |
| System monitoring | Disk usage, memory info, running processes via psutil |
| Translation | Translate text via Ollama |
| Contact search | Look up people by name (phones, emails, profiles) or reverse phone lookup via DuckDuckGo + OSINT |
| Local memory | Persistent key-value store with SQLite + FTS5 full-text search. Save, recall, search, list, delete memories |
| Encode / Decode | Base64, UUID generation, MD5/SHA1/SHA256/SHA512 hashing |
| Compress / Extract | Create and extract zip/tar.gz archives with zip-slip protection |
| PDF generation | Generate PDF documents from text content via fpdf2 |
| WHOIS / DNS | Domain WHOIS lookup and DNS record queries (A, AAAA, MX, NS, CNAME, TXT, etc.) |
| HTTP requests | Full HTTP client with SSRF protection (blocks private IPs) |
| Git operations | Whitelisted git commands (status, log, diff, commit, push, pull, clone, etc.) |
| File read/write | Read, write, append, delete, list files (jailed to app root directory) |
| Database queries | Read-only PostgreSQL/MySQL/MSSQL/SQLite queries via SQLAlchemy |
| Send email | SMTP client for sending emails |
| Docker management | Docker ps, images, pull, run, exec, logs, inspect, stop, rm, etc. (dangerous flags blocked) |
| Encryption | Fernet (AES) encrypt/decrypt with password-derived keys |
| Screenshots | Capture webpage screenshots via Playwright (SSRF-guarded) |
| Video search | Search for videos via DuckDuckGo (YouTube, Vimeo, etc.) with duration, view counts, thumbnails |
| Social media lookup | Scrape Twitter/X profiles and tweets via Nitter; no API keys or login required |
| API key auth | Bearer token authentication for MCP clients, managed via web UI |
| Management dashboard | Web UI at port 5020 for settings, keys, tool toggles, job history, live logs |
| Tool management | Enable/disable individual MCP tools from the dashboard |
| Job history | Results cached to disk, viewable in dashboard |
| Live logs | Poll-based log stream (no WebSocket needed) |

Requirements

  • Python 3.10+
  • Tesseract OCR (system package):
    sudo apt install tesseract-ocr   # Debian/Ubuntu
    brew install tesseract           # macOS
    
  • Ollama (optional, for vision & translation):
    curl -fsSL https://ollama.com/install.sh | sh
    ollama pull llava
    

Quick start

git clone https://github.com/wajirasls/agent_helper.git
cd agent_helper

# One-shot setup:
chmod +x start.sh
./start.sh

# Or manually:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python main.py

Open http://127.0.0.1:5020 in your browser.

Systemd service (auto-start on boot)

sudo cp agent-helper.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable agent-helper
sudo systemctl start agent-helper

Ports

| Port | Service | Access |
| --- | --- | --- |
| 5020 | Management UI (FastAPI) | http://127.0.0.1:5020 |
| 5021 | MCP Server (SSE) | http://localhost:5021/sse (listens on 0.0.0.0) |

Management dashboard

Visit http://127.0.0.1:5020:

  • MCP Server — Start, stop, restart the MCP server
  • Vision Backend — Toggle between OCR only / Ollama LLaVA
  • Object Detection — Enable/disable YOLOv8 with confidence threshold
  • API Keys — Create and revoke keys for MCP clients
  • MCP Tools — Enable/disable individual tools
  • File System — Configure FS root path (read-only, relative to app dir)
  • Processing Folders — Browse Processing/ subfolders
  • Job History — View past processing jobs
  • Health Panel — Check Tesseract, Ollama, YOLOv8 status
  • Live Logs — Scrollable log stream
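
The Live Logs panel polls for new lines, and the project structure describes logger.py as a 500-line ring buffer. A minimal sketch of that pattern using only the standard library; the class name `RingBufferHandler` and the cursor-based `since` method are assumptions for illustration, not the project's actual code.

```python
import logging
from collections import deque


class RingBufferHandler(logging.Handler):
    """Keep only the most recent log lines in memory (hypothetical helper)."""

    def __init__(self, capacity: int = 500):
        super().__init__()
        self.lines = deque(maxlen=capacity)  # old entries fall off automatically
        self.total = 0  # number of records ever emitted; doubles as a poll cursor

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append((self.total, self.format(record)))
        self.total += 1

    def since(self, cursor: int):
        """Return lines emitted at or after `cursor`, plus the next cursor."""
        new_lines = [line for seq, line in self.lines if seq >= cursor]
        return new_lines, self.total
```

A polling client would remember the returned cursor and pass it back on the next request, so each poll receives only lines it has not seen yet.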

All MCP tools (50+ total)

File Processing (Processing/ folder)

| Tool | Parameters | Description |
| --- | --- | --- |
| process_folder | folder_name | Process all files in Processing/{folder}/. Creates folder if not found. |
| process_file | folder_name, filename | Process a single file in an existing folder. |
| list_folders | (none) | List subfolders in Processing/. |
| list_files | folder_name | List files in a subfolder. |
| detect_objects_in_image | folder_name, filename | Run YOLOv8 detection on one image. |

Direct Analysis (pass data inline, no staging needed)

| Tool | Parameters | Description |
| --- | --- | --- |
| analyze_image | image_data (base64), filename | OCR + object detection on base64 image. |
| analyze_image_url | url | Download image from URL → OCR + detection. |
| analyze_file | file_data (base64), filename | Analyze any supported file (PDF, DOCX, XLSX, image, text). |
| analyze_file_url | url | Download file from URL → analyze. |

Web & Search

| Tool | Parameters | Description |
| --- | --- | --- |
| fetch_webpage | url | Fetch URL, extract readable text, detect iframes. |
| web_search | query, max_results | DuckDuckGo search (free, no API key). |
| batch_fetch | urls (list) | Fetch multiple URLs in parallel, one call. |
| read_feed | url, max_entries | Parse RSS/Atom feed into structured entries. |
| http_request | url, method, headers, body, timeout | Full HTTP client (GET/POST/PUT/DELETE). SSRF-guarded. |
| video_search | query, max_results | Search for videos via DuckDuckGo (YouTube, Vimeo, etc.). Returns title, URL, duration, view count, thumbnail. |

Data & Text

| Tool | Parameters | Description |
| --- | --- | --- |
| convert_format | data, from_format, to_format, query | Auto-detect JSON/YAML/CSV/XML, convert, JMESPath query. |
| diff_text | text_a, text_b, context_lines | Unified diff of two text blocks. |
| diff_urls | url_a, url_b, context_lines | Fetch two URLs and diff their content. |
| summarize_text | text, max_sentences | Summarize text (Ollama or extractive TextRank). |
| summarize_url | url, max_sentences | Fetch URL + summarize in one step. |
| parse_date | text, from_tz, to_tz | Parse natural language dates, timezone conversion. |
| translate_text | text, source_lang, target_lang | Translate text via Ollama. |
| encode_decode | operation, data, text, encoding | Base64, UUID, MD5/SHA1/SHA256/SHA512 hashing. |
| generate_pdf | text, filename, title | Generate PDF from text. |
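
diff_text is described as returning a unified diff of two text blocks with a context_lines parameter. A minimal sketch of that behavior with Python's standard difflib; the function name `unified_diff_text` and the "a"/"b" file labels are illustrative assumptions, not the project's implementation.

```python
import difflib


def unified_diff_text(text_a: str, text_b: str, context_lines: int = 3) -> str:
    """Return a unified diff of two text blocks (hypothetical helper)."""
    diff = difflib.unified_diff(
        text_a.splitlines(),
        text_b.splitlines(),
        fromfile="a",
        tofile="b",
        n=context_lines,  # unchanged lines shown around each change
        lineterm="",      # inputs have no trailing newlines, so emit none
    )
    return "\n".join(diff)
```

Identical inputs produce an empty string, which a caller can treat as "no changes".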

Contact Info & OSINT

| Tool | Parameters | Description |
| --- | --- | --- |
| contact_search | query, domain, country, max_results | Look up people by name (phones, emails, social profiles) or reverse phone lookup. |
| whois_lookup | domain | WHOIS lookup for domain registration data. |
| dns_lookup | hostname, record_type | DNS queries (A, AAAA, MX, NS, CNAME, TXT, SOA, SRV). |

Social Media (no API keys required)

| Tool | Parameters | Description |
| --- | --- | --- |
| social_lookup | platform, username, query, nitter_instance, limit | Look up Twitter/X profiles or search tweets via Nitter public instances. Returns name, bio, follower stats, recent tweets. No API key or login needed. |

Local Memory (persistent, SQLite + FTS5)

| Tool | Parameters | Description |
| --- | --- | --- |
| memory_save | key, content, tags | Save/update a memory with key and tags (upsert). |
| memory_recall | key | Retrieve by exact key. |
| memory_search | query, tags, limit | Full-text search across all memories (FTS5 ranked). |
| memory_list | tags, limit | Browse memories, filterable by comma-separated tags. |
| memory_delete | key | Delete a memory. |
| memory_stats | (none) | Total count, unique tags, newest/oldest entries. |
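
The memory tools describe an upsert keyed store with FTS5-ranked search. A minimal sketch of how that could look with the standard sqlite3 module, assuming an SQLite build with FTS5 enabled; the table names, column layout, and function signatures below are assumptions, not the project's actual schema.

```python
import sqlite3


def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Create the key-value table and its full-text index (assumed schema)."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS memories (
            key TEXT PRIMARY KEY,
            content TEXT NOT NULL,
            tags TEXT NOT NULL DEFAULT ''
        );
        CREATE VIRTUAL TABLE IF NOT EXISTS memories_fts
            USING fts5(key UNINDEXED, content, tags);
        """
    )
    return conn


def memory_save(conn: sqlite3.Connection, key: str, content: str, tags: str = "") -> None:
    """Insert or update a memory and refresh its full-text row."""
    with conn:  # one transaction for both tables
        conn.execute(
            "INSERT INTO memories(key, content, tags) VALUES (?, ?, ?) "
            "ON CONFLICT(key) DO UPDATE SET "
            "content = excluded.content, tags = excluded.tags",
            (key, content, tags),
        )
        conn.execute("DELETE FROM memories_fts WHERE key = ?", (key,))
        conn.execute(
            "INSERT INTO memories_fts(key, content, tags) VALUES (?, ?, ?)",
            (key, content, tags),
        )


def memory_search(conn: sqlite3.Connection, query: str, limit: int = 10) -> list:
    """Return (key, content) rows matching the FTS5 query, best match first."""
    rows = conn.execute(
        "SELECT key, content FROM memories_fts "
        "WHERE memories_fts MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()
```

Keeping the plain table as the source of truth and the FTS5 table as a derived index means exact-key recall stays a primary-key lookup, while search goes through the index.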

File System

| Tool | Parameters | Description |
| --- | --- | --- |
| fs_list_directory | path, pattern (glob) | List directory contents with metadata. |
| fs_find_files | pattern, path | Recursive glob search. |
| fs_read_text_file | path, offset, limit | Read text file with line range (max 500). |
| fs_query_sqlite | db_path, query, params | Read-only SQLite query (1000 row limit). |
| fs_list_archive | path | List .zip/.tar.gz contents. |
| read_write_file | path, content, mode, encoding | Read/write/append/delete/list files (jailed to app root). |
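
The file tools are described as confined to a root directory. One common way to enforce such a jail is to resolve the requested path and check that it still sits under the root; the sketch below shows that idea. The function name `resolve_inside_root` is hypothetical, and the project's own check may differ.

```python
from pathlib import Path


def resolve_inside_root(root: str, user_path: str) -> Path:
    """Resolve user_path under root, rejecting anything that escapes it."""
    root_path = Path(root).resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (root_path / user_path).resolve()
    if not candidate.is_relative_to(root_path):
        raise PermissionError(f"path escapes root: {user_path}")
    return candidate
```

Because an absolute user_path replaces the root when joined, inputs such as /etc/passwd are rejected by the same check as ../ sequences.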

Compression

| Tool | Parameters | Description |
| --- | --- | --- |
| compress_files | paths, archive_name, format | Create zip/tar.gz archives. Path-traversal protected. |
| extract_archive | archive_path, output_dir, password | Extract zip/tar.gz with zip-slip protection. |
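
Zip-slip protection means refusing archive members whose names would land outside the output directory. A minimal sketch for zip files using the standard library; `safe_extract_zip` is a hypothetical name, and password and tar.gz handling are left out.

```python
import zipfile
from pathlib import Path


def safe_extract_zip(archive_path: str, output_dir: str) -> list[str]:
    """Extract a zip only if every member stays inside output_dir."""
    out_root = Path(output_dir).resolve()
    with zipfile.ZipFile(archive_path) as zf:
        names = zf.namelist()
        # Validate all members first so a bad archive writes nothing at all.
        for name in names:
            target = (out_root / name).resolve()
            if not target.is_relative_to(out_root):
                raise ValueError(f"unsafe path in archive: {name}")
        for name in names:
            zf.extract(name, out_root)
    return names
```

Validating before extracting keeps the operation all-or-nothing: a single hostile entry rejects the whole archive instead of leaving a partial extraction behind.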

Git

| Tool | Parameters | Description |
| --- | --- | --- |
| git_operation | operation, args, repo_path | Whitelisted git commands: status, log, diff, commit, push, pull, clone, branch, etc. |
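
A whitelist like this is typically enforced by checking the subcommand against a fixed set and building an argument list that is never passed through a shell. The sketch below assumes args arrives as a single string; the set contains only the commands named above (the real list is longer, per the "etc."), and `build_git_command` is a hypothetical name.

```python
import shlex

# Only the operations named in this README; the actual whitelist may be larger.
ALLOWED_GIT_OPERATIONS = {"status", "log", "diff", "commit", "push", "pull", "clone", "branch"}


def build_git_command(operation: str, args: str = "") -> list[str]:
    """Return an argv list for an allowed git operation, or raise ValueError."""
    if operation not in ALLOWED_GIT_OPERATIONS:
        raise ValueError(f"git operation not allowed: {operation}")
    # shlex.split tokenizes like a shell but executes nothing.
    return ["git", operation, *shlex.split(args)]
```

The resulting list would be handed to something like subprocess.run(argv, cwd=repo_path) without shell=True, so characters such as ; or && reach git as literal arguments instead of being interpreted.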

Docker

| Tool | Parameters | Description |
| --- | --- | --- |
| docker_exec | action, image, command, args, timeout | Docker ps, images, pull, run, exec, logs, inspect, stop, rm, stats, etc. Dangerous flags blocked. |
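
Flag blocking can be sketched as a scan over the argument list before anything is executed. The flags below are the four named in the Security section; the function name and the handling of the space-separated spelling are assumptions, and a production check would likely need to cover more variants.

```python
BLOCKED_DOCKER_FLAGS = ("--privileged", "--pid=host", "--network=host", "--cap-add=ALL")


def find_blocked_flags(args: list[str]) -> list[str]:
    """Return blocked flags found in args, in '--k=v' or '--k v' spelling."""
    hits = []
    for i, arg in enumerate(args):
        spellings = {arg}
        if i + 1 < len(args):
            # Treat "--network host" the same as "--network=host".
            spellings.add(f"{arg}={args[i + 1]}")
        hits.extend(flag for flag in BLOCKED_DOCKER_FLAGS if flag in spellings)
    return hits
```

A caller would refuse the request whenever the returned list is non-empty.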

Email & Database

| Tool | Parameters | Description |
| --- | --- | --- |
| send_email | recipient, subject, body, smtp_* | Send email via SMTP. Requires dashboard configuration. |
| database_query | connection_string, query, params, max_rows | Read-only SQL on PostgreSQL/MySQL/MSSQL/SQLite. |
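
The Security section says database queries are limited to SELECT/WITH/PRAGMA and use bound parameters. A minimal sketch of both ideas, shown against the standard sqlite3 module for self-containment (the project itself uses SQLAlchemy); `run_read_only` and its regex are illustrative assumptions.

```python
import re
import sqlite3

READ_ONLY_PREFIX = re.compile(r"^\s*(SELECT|WITH|PRAGMA)\b", re.IGNORECASE)


def run_read_only(conn: sqlite3.Connection, query: str, params=(), max_rows: int = 1000) -> list:
    """Run a query only if it starts with a read-only keyword; cap the rows."""
    if not READ_ONLY_PREFIX.match(query):
        raise PermissionError("only SELECT/WITH/PRAGMA statements are allowed")
    # Values are bound by the driver, never interpolated into the SQL text.
    cursor = conn.execute(query, params)
    return cursor.fetchmany(max_rows)
```

A keyword prefix check is a coarse filter on its own, so it is usually paired with a read-only connection or a database account without write privileges.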

Encryption

| Tool | Parameters | Description |
| --- | --- | --- |
| crypto_encrypt | text, password, algorithm | Encrypt text with Fernet (AES) and password-derived key. |
| crypto_decrypt | encrypted_data, password, algorithm, is_base64 | Decrypt Fernet-encrypted data. |
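
Fernet expects a 32-byte key encoded as URL-safe base64, so a password has to be stretched into that shape first. The sketch below does this with PBKDF2-HMAC-SHA256 from the standard library; the choice of KDF, the iteration count, and the function name are assumptions, since the README does not say how the project derives its keys.

```python
import base64
import hashlib


def derive_fernet_key(password: str, salt: bytes, iterations: int = 480_000) -> bytes:
    """Stretch a password into a Fernet-compatible key (assumed KDF settings)."""
    raw = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )
    return base64.urlsafe_b64encode(raw)  # 44 ASCII bytes, as Fernet expects
```

The result can be passed to cryptography.fernet.Fernet(key). The salt should be random per message (for example os.urandom(16)) and stored alongside the ciphertext, because the same salt is required to re-derive the key for decryption.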

Screenshot

| Tool | Parameters | Description |
| --- | --- | --- |
| screenshot | url, full_page, width, height | Capture webpage screenshot (base64 PNG). SSRF-guarded. |

Charts & System

| Tool | Parameters | Description |
| --- | --- | --- |
| create_chart | data, chart_type, x_column, y_column, title | CSV/JSON → bar/line/pie/scatter/histogram chart (base64 PNG). |
| system_disk_usage | (none) | Disk usage per mount point. |
| system_memory_info | (none) | RAM + swap usage. |
| system_processes | limit | Top processes by CPU usage. |

opencode configuration

Add to your opencode.json or ~/.config/opencode/opencode.json:

{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "agent_helper": {
      "type": "remote",
      "url": "http://localhost:5021/sse",
      "headers": {
        "Authorization": "Bearer <your-api-key>"
      },
      "enabled": true
    }
  }
}

File processing support

| Extension | Processor | Output |
| --- | --- | --- |
| .jpg, .png, .webp, .bmp, .tiff | Tesseract OCR + YOLOv8 + optional vision | OCR text + detected objects + scene description |
| .pdf | PyMuPDF | Extracted text per page |
| .docx | python-docx | Extracted paragraphs |
| .xlsx, .xls | openpyxl | Cell values per sheet |
| .txt, .md, .csv, .json, .xml | Direct read | Raw file content |

Vision backends

| Mode | Backend | Notes |
| --- | --- | --- |
| ocr (default) | Tesseract + YOLOv8 | No external service needed |
| ollama | LLaVA via Ollama | Adds scene descriptions; requires ollama serve + ollama pull llava |

Project structure

agent_helper/
├── config.py                 # Settings management (persisted to JSON)
├── logger.py                 # Ring buffer logger (500 lines, polled by UI)
├── auth.py                   # API key management (SHA-256 hashed)
├── main.py                   # Entry point
├── mcp_server.py             # FastMCP server on port 5021 (50+ tools)
├── processor_orchestrator.py # All tool implementations
├── processors/
│   ├── image.py              # Tesseract OCR
│   ├── vision.py             # Ollama LLaVA image description
│   ├── pdf.py                # PyMuPDF text extraction
│   ├── docx.py               # python-docx parsing
│   ├── excel.py              # openpyxl XLSX parsing
│   ├── detection.py          # YOLOv8 object detection
│   ├── fs_tools.py           # Read-only FS operations (path-safe)
│   ├── contact_search.py     # Name/phone lookup via DDG + OSINT
│   ├── local_memory.py       # SQLite + FTS5 persistent memory
│   └── social_lookup.py      # Twitter/X profile scraping via Nitter
├── management_ui/
│   ├── app.py                # FastAPI dashboard on port 5020
│   └── templates/
│       └── dashboard.html    # HTMX dark-theme dashboard
├── Processing/               # Watch folder (created on first run)
├── local_memory/             # SQLite memory database (auto-created)
├── data/                     # Settings & API keys (persisted)
├── logs/                     # Log output
├── requirements.txt
├── start.sh
└── agent-helper.service      # systemd service unit (installed to /etc/systemd/system/)
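
auth.py is described as storing API keys as SHA-256 hashes. A minimal sketch of that scheme, in which only the digest is persisted and the incoming Bearer token is hashed and compared in constant time; the function names and the key format are assumptions, not the project's code.

```python
import hashlib
import hmac
import secrets


def create_api_key() -> tuple[str, str]:
    """Return (plaintext key to show once, SHA-256 hex digest to store)."""
    key = secrets.token_urlsafe(32)
    return key, hashlib.sha256(key.encode()).hexdigest()


def verify_bearer(authorization_header: str, stored_hashes: set[str]) -> bool:
    """Check an 'Authorization: Bearer <key>' value against stored digests."""
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    digest = hashlib.sha256(token.strip().encode()).hexdigest()
    # compare_digest avoids leaking match position through timing.
    return any(hmac.compare_digest(digest, h) for h in stored_hashes)
```

Storing only digests means a leaked settings file does not reveal usable keys, at the cost that a lost key cannot be recovered and must be reissued.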

Security

  • API key auth: MCP tool calls to the /messages/ endpoints require a Bearer token. Keys are managed via the dashboard.
  • SSRF protection: http_request and screenshot tools block private/internal IP addresses (127.0.0.1, 10.x, 172.x, 192.168.x, 169.254.x, localhost).
  • Path traversal: All file operations resolve paths against the allowed root and reject .. traversal.
  • Git whitelist: Only pre-approved git commands allowed (status, log, diff, commit, push, pull, etc.). No generic shell execution.
  • Docker restrictions: Dangerous flags (--privileged, --pid=host, --network=host, --cap-add=ALL) are blocked.
  • Database read-only: External DB queries enforce SELECT/WITH/PRAGMA only. Parameterized queries prevent injection.
  • Zip-slip protection: Archive extraction validates all paths against the output directory.
  • Memory validation: Keys and tags are validated with strict regex patterns. All SQL parameterized.
  • Local-only UI: Management dashboard binds to 127.0.0.1.
  • Tool management: Any tool can be disabled via the dashboard — risky ones ship disabled by default.
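
The SSRF rule above can be sketched as: resolve the URL's host, then refuse the request if any resolved address is private, loopback, or link-local. The example relies on the classification flags in Python's ipaddress module, which may not match the project's exact ranges (the README lists 172.x, while the module uses the standard 172.16.0.0/12 block); both function names are hypothetical.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_blocked_address(address: str) -> bool:
    """True for private, loopback, link-local, and other non-public addresses."""
    ip = ipaddress.ip_address(address)
    return (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_multicast or ip.is_reserved or ip.is_unspecified
    )


def url_is_allowed(url: str) -> bool:
    """Allow only http(s) URLs whose host resolves exclusively to public IPs."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False  # unresolvable hosts are refused
    return all(not is_blocked_address(info[4][0]) for info in infos)
```

Checking the resolved addresses instead of the hostname string catches names such as localhost or DNS records that point at internal ranges. A stricter implementation would also connect to the exact address it validated, so a second DNS lookup cannot return something different.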

License

MIT
