dstools

dstools

Augments DeepSeek-V4 with image understanding and deep research capabilities via MCP, enabling vision and web research tools.

Category
Visit Server

README

dstools — DeepSeek-V4 MCP Toolkit

Give DeepSeek-V4 models eyes and a research desk.

An MCP (Model Context Protocol) server that augments DeepSeek's text models with two capabilities they don't have natively:

  1. Image content understanding — DeepSeek-V4 is a text-only model. dstools adds a vision tool that turns any image into rich, structured text the V4 model can reason over (leveraging its 1M-token context and world-class reasoning).
  2. Deep Research — a multi-step, citation-backed research pipeline that uses V4 as the planning + synthesis brain over live web search and page extraction.

dstools is a productizable, installable Python package. It speaks MCP over stdio and Streamable HTTP, so any MCP-capable host (Claude Code, Claude Desktop, Cherry Studio, a custom agent, …) can connect a DeepSeek-V4 backend to it and immediately call these tools.


Why this exists

DeepSeek-V4 (deepseek-v4-flash / deepseek-v4-pro, released 2026-04-24) is an outstanding text model with 1M context, strong agentic/tool-calling ability, and an automatic context cache — but the official chat API is text-only (no multimodal vision). dstools closes exactly that gap:

DeepSeek-V4 strength What's missing What dstools adds
1M context, top reasoning Can't see images analyze_image → vision-to-text
Agentic, tool-calling No live web access web_search, fetch_page, deep_research
Automatic prompt caching Stable-prefix prompts to maximise cache hits
Thinking mode (thinking={"type":"enabled"}) Used selectively for hard synthesis steps

The toolkit is deeply adapted to V4: it defaults to deepseek-v4-pro for synthesis and deepseek-v4-flash for cheap sub-steps, toggles V4's native thinking mode per call, structures prompts for cache hits, and uses V4's JSON-output mode for structured extraction.

Tools exposed

Tool Description Needs a key?
analyze_image Describe/understand an image (path, URL, or base64). Returns structured text. Vision provider key (or local model)
ocr_image Extract text from an image (OCR). Optional pytesseract
web_search Run a web search, return ranked results (title, url, snippet). No (DuckDuckGo, keyless)
fetch_page Fetch a URL and return clean, readable Markdown. No
deep_research Full pipeline: plan → search → fetch → select → synthesize, with citations. DeepSeek API key

Granular tools (web_search, fetch_page, analyze_image) let the host agent run its own agentic loop; deep_research is a one-shot orchestrator for when you just want a cited report.

Quick start

# 1. Install (Python 3.10+)
uv sync                # or: pip install -e .

# 2. Configure
cp .env.example .env   # then edit: set DEEPSEEK_API_KEY and a vision provider

# 3. Run the MCP server (stdio — for local hosts like Claude Code/Desktop)
uv run dstools serve

# …or over Streamable HTTP (for remote hosts)
uv run dstools serve --transport http --port 8000

Connect from Claude Code:

claude mcp add --transport stdio dstools -- uv run --directory /path/to/dstools dstools serve

A ready-made examples/claude_desktop_config.json is included for Claude Desktop.

Configuration

All settings are environment variables (.env supported). Sensible defaults mean the keyless parts (search + fetch) work out of the box.

Variable Default Purpose
DEEPSEEK_API_KEY DeepSeek API key (required for deep_research)
DEEPSEEK_BASE_URL https://api.deepseek.com OpenAI-compatible endpoint
DEEPSEEK_MODEL deepseek-v4-pro Synthesis / heavy model
DEEPSEEK_FAST_MODEL deepseek-v4-flash Cheap sub-step model
DEEPSEEK_THINKING auto auto/on/off — V4 thinking mode for hard steps
DEEPSEEK_REASONING_EFFORT high low/medium/high
VISION_BASE_URL OpenAI-compatible vision endpoint (any multimodal model)
VISION_API_KEY Key for the vision endpoint
VISION_MODEL e.g. gpt-4o, qwen-vl-max, glm-4v, a local qwen2.5-vl via Ollama
SEARCH_PROVIDER duckduckgo duckduckgo (keyless) / brave / tavily
TAVILY_API_KEY Required if SEARCH_PROVIDER=tavily
BRAVE_API_KEY Required if SEARCH_PROVIDER=brave (free 2k/mo, more reliable)
SEARCH_RETRY_ATTEMPTS 3 Retries with backoff when keyless DDG rate-limits
RESEARCH_BREADTH 3 Sub-queries generated per round
RESEARCH_DEPTH 2 Research rounds (rounds >1 trigger query refinement)
RESEARCH_MAX_SOURCES 8 Pages fetched, reranked & synthesised
RESEARCH_{PLAN,REFINE,RERANK,SYNTH}_MODEL "" Per-step model override (empty = flash for light steps, pro for synth)
LOG_LEVEL INFO Logging verbosity

deep_research pipeline (v0.2)

deep_research is a smart, multi-round pipeline (DeepSeek-V4 as the brain):

  1. Plan (V4-flash, JSON) → breadth search queries.
  2. Round loop (depth rounds): search → fetch → refine — V4-flash reads findings-so-far and generates next-round queries for uncovered facets.
  3. Rerank — V4-flash extracts the passages most relevant to the question from each page (always-on; quality over raw stuffing).
  4. Synthesize (V4-pro + thinking) → cited markdown report.

Per-step models are tunable; set all RESEARCH_*_MODEL to deepseek-v4-flash for the cheapest runs. dstools doctor prints a per-research cost estimate.

Vision providers (for analyze_image)

Since DeepSeek-V4 can't see images, point VISION_* at any OpenAI-compatible multimodal model:

  • OpenAI: VISION_BASE_URL=https://api.openai.com/v1, VISION_MODEL=gpt-4o / gpt-4o-mini
  • Alibaba Qwen-VL (DashScope, OpenAI-compat): VISION_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1, VISION_MODEL=qwen-vl-max
  • Zhipu GLM-4V: VISION_BASE_URL=https://open.bigmodel.cn/api/paas/v4, VISION_MODEL=glm-4v
  • Local (Ollama): VISION_BASE_URL=http://localhost:11434/v1, VISION_MODEL=qwen2.5-vl (no key needed)

Without a vision provider, analyze_image degrades to image metadata + OCR (if pytesseract is installed) and returns a clear note — it never crashes.

Development

uv sync --extra dev
make lint        # ruff
make typecheck   # mypy
make test        # pytest
make serve       # run the server (stdio)

Project layout

src/dstools/
  server.py          # FastMCP server + tool registration
  cli.py             # `dstools` CLI (serve / inspect / doctor)
  config.py          # pydantic-settings config
  llm/               # DeepSeek (OpenAI-compat) + vision clients, V4 thinking-aware
  search/            # pluggable search providers (DuckDuckGo default, Tavily optional)
  web/               # async page fetcher + HTML→Markdown extraction
  tools/             # image / search / fetch / research tools
  utils/             # image I/O & encoding, text chunking
tests/               # pytest suite (network & LLM mocked)
examples/            # claude_desktop_config.json, mcp client demo

License

MIT.

Recommended Servers

playwright-mcp

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official
Featured
TypeScript
Magic Component Platform (MCP)

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Official
Featured
Local
TypeScript
Audiense Insights MCP Server

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

Official
Featured
Local
TypeScript
VeyraX MCP

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official
Featured
Local
graphlit-mcp-server

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official
Featured
TypeScript
Kagi MCP Server

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official
Featured
Python
E2B

E2B

Using MCP to run code via e2b.

Official
Featured
Neon Database

Neon Database

MCP server for interacting with Neon Management API and databases

Official
Featured
Exa Search

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official
Featured
Qdrant Server

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official
Featured