dstools
Augments DeepSeek-V4 with image understanding and deep research capabilities via MCP, enabling vision and web research tools.
README
dstools — DeepSeek-V4 MCP Toolkit
Give DeepSeek-V4 models eyes and a research desk.
An MCP (Model Context Protocol) server that augments DeepSeek's text models with two capabilities they don't have natively:
- Image content understanding — DeepSeek-V4 is a text-only model.
dstoolsadds a vision tool that turns any image into rich, structured text the V4 model can reason over (leveraging its 1M-token context and world-class reasoning).- Deep Research — a multi-step, citation-backed research pipeline that uses V4 as the planning + synthesis brain over live web search and page extraction.
dstools is a productizable, installable Python package. It speaks MCP over stdio and
Streamable HTTP, so any MCP-capable host (Claude Code, Claude Desktop, Cherry Studio, a
custom agent, …) can connect a DeepSeek-V4 backend to it and immediately call these tools.
Why this exists
DeepSeek-V4 (deepseek-v4-flash / deepseek-v4-pro, released 2026-04-24) is an outstanding
text model with 1M context, strong agentic/tool-calling ability, and an automatic context
cache — but the official chat API is text-only (no multimodal vision). dstools closes
exactly that gap:
| DeepSeek-V4 strength | What's missing | What dstools adds |
|---|---|---|
| 1M context, top reasoning | Can't see images | analyze_image → vision-to-text |
| Agentic, tool-calling | No live web access | web_search, fetch_page, deep_research |
| Automatic prompt caching | — | Stable-prefix prompts to maximise cache hits |
Thinking mode (thinking={"type":"enabled"}) |
— | Used selectively for hard synthesis steps |
The toolkit is deeply adapted to V4: it defaults to deepseek-v4-pro for synthesis and
deepseek-v4-flash for cheap sub-steps, toggles V4's native thinking mode per call, structures
prompts for cache hits, and uses V4's JSON-output mode for structured extraction.
Tools exposed
| Tool | Description | Needs a key? |
|---|---|---|
analyze_image |
Describe/understand an image (path, URL, or base64). Returns structured text. | Vision provider key (or local model) |
ocr_image |
Extract text from an image (OCR). | Optional pytesseract |
web_search |
Run a web search, return ranked results (title, url, snippet). | No (DuckDuckGo, keyless) |
fetch_page |
Fetch a URL and return clean, readable Markdown. | No |
deep_research |
Full pipeline: plan → search → fetch → select → synthesize, with citations. | DeepSeek API key |
Granular tools (web_search, fetch_page, analyze_image) let the host agent run its own
agentic loop; deep_research is a one-shot orchestrator for when you just want a cited report.
Quick start
# 1. Install (Python 3.10+)
uv sync # or: pip install -e .
# 2. Configure
cp .env.example .env # then edit: set DEEPSEEK_API_KEY and a vision provider
# 3. Run the MCP server (stdio — for local hosts like Claude Code/Desktop)
uv run dstools serve
# …or over Streamable HTTP (for remote hosts)
uv run dstools serve --transport http --port 8000
Connect from Claude Code:
claude mcp add --transport stdio dstools -- uv run --directory /path/to/dstools dstools serve
A ready-made examples/claude_desktop_config.json is included for Claude Desktop.
Configuration
All settings are environment variables (.env supported). Sensible defaults mean the
keyless parts (search + fetch) work out of the box.
| Variable | Default | Purpose |
|---|---|---|
DEEPSEEK_API_KEY |
— | DeepSeek API key (required for deep_research) |
DEEPSEEK_BASE_URL |
https://api.deepseek.com |
OpenAI-compatible endpoint |
DEEPSEEK_MODEL |
deepseek-v4-pro |
Synthesis / heavy model |
DEEPSEEK_FAST_MODEL |
deepseek-v4-flash |
Cheap sub-step model |
DEEPSEEK_THINKING |
auto |
auto/on/off — V4 thinking mode for hard steps |
DEEPSEEK_REASONING_EFFORT |
high |
low/medium/high |
VISION_BASE_URL |
— | OpenAI-compatible vision endpoint (any multimodal model) |
VISION_API_KEY |
— | Key for the vision endpoint |
VISION_MODEL |
— | e.g. gpt-4o, qwen-vl-max, glm-4v, a local qwen2.5-vl via Ollama |
SEARCH_PROVIDER |
duckduckgo |
duckduckgo (keyless) / brave / tavily |
TAVILY_API_KEY |
— | Required if SEARCH_PROVIDER=tavily |
BRAVE_API_KEY |
— | Required if SEARCH_PROVIDER=brave (free 2k/mo, more reliable) |
SEARCH_RETRY_ATTEMPTS |
3 |
Retries with backoff when keyless DDG rate-limits |
RESEARCH_BREADTH |
3 |
Sub-queries generated per round |
RESEARCH_DEPTH |
2 |
Research rounds (rounds >1 trigger query refinement) |
RESEARCH_MAX_SOURCES |
8 |
Pages fetched, reranked & synthesised |
RESEARCH_{PLAN,REFINE,RERANK,SYNTH}_MODEL |
"" |
Per-step model override (empty = flash for light steps, pro for synth) |
LOG_LEVEL |
INFO |
Logging verbosity |
deep_research pipeline (v0.2)
deep_research is a smart, multi-round pipeline (DeepSeek-V4 as the brain):
- Plan (V4-flash, JSON) →
breadthsearch queries. - Round loop (
depthrounds): search → fetch → refine — V4-flash reads findings-so-far and generates next-round queries for uncovered facets. - Rerank — V4-flash extracts the passages most relevant to the question from each page (always-on; quality over raw stuffing).
- Synthesize (V4-pro + thinking) → cited markdown report.
Per-step models are tunable; set all RESEARCH_*_MODEL to deepseek-v4-flash
for the cheapest runs. dstools doctor prints a per-research cost estimate.
Vision providers (for analyze_image)
Since DeepSeek-V4 can't see images, point VISION_* at any OpenAI-compatible multimodal model:
- OpenAI:
VISION_BASE_URL=https://api.openai.com/v1,VISION_MODEL=gpt-4o/gpt-4o-mini - Alibaba Qwen-VL (DashScope, OpenAI-compat):
VISION_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1,VISION_MODEL=qwen-vl-max - Zhipu GLM-4V:
VISION_BASE_URL=https://open.bigmodel.cn/api/paas/v4,VISION_MODEL=glm-4v - Local (Ollama):
VISION_BASE_URL=http://localhost:11434/v1,VISION_MODEL=qwen2.5-vl(no key needed)
Without a vision provider, analyze_image degrades to image metadata + OCR (if pytesseract
is installed) and returns a clear note — it never crashes.
Development
uv sync --extra dev
make lint # ruff
make typecheck # mypy
make test # pytest
make serve # run the server (stdio)
Project layout
src/dstools/
server.py # FastMCP server + tool registration
cli.py # `dstools` CLI (serve / inspect / doctor)
config.py # pydantic-settings config
llm/ # DeepSeek (OpenAI-compat) + vision clients, V4 thinking-aware
search/ # pluggable search providers (DuckDuckGo default, Tavily optional)
web/ # async page fetcher + HTML→Markdown extraction
tools/ # image / search / fetch / research tools
utils/ # image I/O & encoding, text chunking
tests/ # pytest suite (network & LLM mocked)
examples/ # claude_desktop_config.json, mcp client demo
License
MIT.
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.