MSPaint MCP Server with AI-based Planning Algorithms

Uses advanced AI prompting to improve LLM planning for solving complex math problems and drawing the answer on the MSPaint canvas

shettysaish20

This project demonstrates how advanced AI prompting can make LLMs more robust when handling complex, multi-step math problems. It uses the Model Context Protocol (MCP) to allow an AI agent, powered by Google's Gemini model, to interact with a legacy Windows application (MSPaint). The agent uses tools defined with fastmcp and implemented with pywinauto to solve math problems and then draw the solution on the Paint canvas.

Introduction

This project showcases the use of advanced AI prompting to make an LLM more robust when solving complex, multi-step math problems.

Structured Prompting for This Problem Statement

You are a math agent with painting skills, solving complex math expressions step-by-step.
You have access to various mathematical tools for calculations and verifications, as well as an MSPaint application to draw and present your solution on a canvas.

Available Tools:
{tools_description}

MSPaint Application Information:
- Rectangle coordinates: x1 = 763, y1 = 595, x2 = 1788, y2 = 1123

You must respond with EXACTLY ONE LINE in one of these formats (no additional text):

1. For function calls:
FUNCTION_CALL: {{"name": function_name, "arguments": {{"param1": value1, "param2": value2}}}}

2. For final answers:
FINAL_ANSWER: <NUMBER>

3. For completing the task:
COMPLETE_RUN

Instructions:
- Start by calling the show_reasoning tool ONLY ONCE with a list of all step-by-step reasoning steps explaining how you will solve the problem. Once called, NEVER CALL IT AGAIN UNDER ANY CIRCUMSTANCES.
- When reasoning, tag each step with the reasoning type (e.g., [Arithmetic], [Logical Check]).
- Use all available math tools to solve the problem step-by-step.
- When a function returns multiple values, process all of them.
- Apply BODMAS rules: start with the innermost parentheses and work outward.
- Do not skip steps — perform all calculations sequentially.
- Respond only with one line at a time.
- Call only one tool per response.
- After calculating a number, verify it by calling:
FUNCTION_CALL: {{"name": "verify_calculation", "arguments": {{"expression": <MATH_EXPRESSION>, "expected": <NUMBER>}}}}
- If verify_calculation returns False, re-evaluate your previous steps.
- Once you reach a final answer, check for consistency of all steps and calculations by calling:
FUNCTION_CALL: {{"name": "verify_consistency", "arguments": {{"steps": [[<MATH_EXPRESSION1>, <ANSWER1>], [<MATH_EXPRESSION2>, <ANSWER2>], ...]}}}} 
- If verify_consistency returns False, re-evaluate your previous steps.
- Once verify_consistency returns True, submit your final result as:
FINAL_ANSWER: <NUMBER>

Paint Instructions:
- To draw in Paint, follow this sequence strictly:
1. Call open_paint to start the Paint application.
2. Verify Paint is open using verify_paint_open.
3. If verify_paint_open returns False, retry opening Paint until it succeeds.
4. After Paint is open, draw a rectangle using draw_rectangle with correct parameters.
5. Add text using add_text_in_paint, inserting your FINAL_ANSWER: <NUMBER>.

Final Step:
- After completing all calculations, verifications, and drawings, call:
COMPLETE_RUN

Strictly follow the above guidelines.
Your entire response should always be a single line starting with either FUNCTION_CALL:, FINAL_ANSWER: or COMPLETE_RUN.
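
The one-line response format above lends itself to a small parser on the client side. The following is a minimal sketch, not the project's actual code: the function name parse_response_line is hypothetical, and it assumes the doubled braces in the template are str.format escapes, so the model's real output carries single braces and valid JSON.

```python
import json

def parse_response_line(line: str):
    """Classify one model response line as (kind, payload).

    Hypothetical helper: kinds are "function_call", "final_answer",
    and "complete_run", mirroring the three formats in the prompt.
    """
    text = line.strip()
    if text.startswith("FUNCTION_CALL:"):
        # Everything after the prefix is expected to be a JSON object.
        payload = json.loads(text[len("FUNCTION_CALL:"):].strip())
        return "function_call", (payload["name"], payload.get("arguments", {}))
    if text.startswith("FINAL_ANSWER:"):
        return "final_answer", text[len("FINAL_ANSWER:"):].strip()
    if text == "COMPLETE_RUN":
        return "complete_run", None
    # Anything else violates the single-line contract in the prompt.
    raise ValueError(f"Unrecognized response line: {text!r}")
```

Raising on unrecognized lines, rather than guessing, keeps format violations visible so the client can re-prompt the model.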

ChatGPT Structured Prompting Evaluation Result

{
  "explicit_reasoning": true,
  "structured_output": true,
  "tool_separation": true,
  "conversation_loop": true,
  "instructional_framing": true,
  "internal_self_checks": true,
  "reasoning_type_awareness": true,
  "fallbacks": true,
  "overall_clarity": "Extremely strong prompt — it carefully enforces step-by-step reasoning, structured outputs, error handling, and tool use separation. Very minor improvements could be to give a short worked-out example, but even without it, the robustness is excellent."
}

Project Structure

├── MSPaint-MCP-Server/
│ ├── mcp_server.py # Defines the MCP server with tools for Paint automation 
│ ├── mcp_client.py # Defines the MCP client that interacts with the server and AI model 
│ ├── requirements.txt # Lists the project dependencies 
│ └── .env # Stores the Gemini API key 
├── README.md # This file

Requirements

  • Python 3.11+
  • Conda (recommended for environment management)
  • Google Gemini API key
  • pywin32
  • pywinauto
  • fastmcp
  • python-dotenv
  • google-genai
  • rich

Setup

  1. Create a Conda environment:

    conda create -n eagenv python=3.11
    conda activate eagenv
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Set up the Gemini API key:

    • Create a .env file in the project directory (alongside mcp_server.py and mcp_client.py).

    • Add your Gemini API key to the .env file:

      GEMINI_API_KEY=YOUR_API_KEY
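
One way the client might read this key at startup is sketched below. The helper name load_gemini_api_key is an assumption, not taken from the project; the sketch uses python-dotenv's load_dotenv when the package is installed and otherwise falls back to the process environment.

```python
import os

def load_gemini_api_key(env_path: str = ".env") -> str:
    """Return GEMINI_API_KEY, loading it from a .env file if possible.

    Hypothetical helper; the project's own loading code may differ.
    """
    try:
        # python-dotenv is listed in the requirements; a missing file is ignored.
        from dotenv import load_dotenv
        load_dotenv(env_path)
    except ImportError:
        pass  # Fall back to variables already set in the environment.
    key = os.environ.get("GEMINI_API_KEY", "")
    if not key or key == "YOUR_API_KEY":
        raise RuntimeError("GEMINI_API_KEY is missing or still set to the placeholder")
    return key
```

Failing early on the placeholder value avoids a less obvious authentication error later in the run.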
      

Usage

  1. Run the MCP client:

    python mcp_paint_app/mcp_client.py
    

    This will start the MCP client, which connects to the MCP server, initializes the AI agent, and begins the automation process.

How It Works

  1. MCP Server (mcp_server.py):

    • Defines the tools for interacting with MSPaint (e.g., open_paint, draw_rectangle, add_text_in_paint) and various mathematical operations (e.g., add, subtract, multiply, divide, verify_calculation, verify_consistency).
    • Uses pywinauto to control the MSPaint application.
    • Exposes these tools via the fastmcp library.
  2. MCP Client (mcp_client.py):

    • Connects to the MCP server.
    • Uses the Google Gemini model to generate instructions and solve the given math expression.
    • Parses the model's output to determine which tool to call.
    • Calls the appropriate tool on the MCP server with the required parameters.
    • Handles the response from the tool and feeds it back to the model for the next step.
    • Orchestrates the drawing of the final answer in MSPaint.
  3. AI Agent (Google Gemini):

    • Receives a complex math expression (e.g., ((3000 - (400 + 552)) / 2 + 1024)).
    • Uses the available tools (defined in the system prompt) to solve the problem step by step.
    • Generates function calls (e.g., FUNCTION_CALL: {"name": "add", "arguments": {"a": 400, "b": 552}}) to use the tools.
    • Verifies each calculation using the verify_calculation tool.
    • Ensures the consistency of all steps using the verify_consistency tool.
    • Once the final answer is obtained and verified, it uses Paint to display the result by opening Paint, drawing a rectangle, and adding the final answer as text.
    • Completes the run by calling the COMPLETE_RUN command.
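
The client/agent loop described above can be illustrated without Gemini or Paint. In this sketch the model is replaced by a scripted list of responses for the example expression, and the tools are plain local functions. The names run_agent, TOOLS, and SCRIPTED are hypothetical, and the parameter names for subtract and divide are assumed to match those of add.

```python
import json

# Local stand-ins for tools that would normally live on the MCP server.
TOOLS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "divide": lambda a, b: a / b,
}

# Scripted model output for ((3000 - (400 + 552)) / 2 + 1024).
SCRIPTED = [
    'FUNCTION_CALL: {"name": "add", "arguments": {"a": 400, "b": 552}}',
    'FUNCTION_CALL: {"name": "subtract", "arguments": {"a": 3000, "b": 952}}',
    'FUNCTION_CALL: {"name": "divide", "arguments": {"a": 2048, "b": 2}}',
    'FUNCTION_CALL: {"name": "add", "arguments": {"a": 1024, "b": 1024}}',
    "FINAL_ANSWER: 2048",
    "COMPLETE_RUN",
]

def run_agent(model, tools, max_steps=20):
    """Ask the model for one line at a time until COMPLETE_RUN."""
    history = []          # (tool name, arguments, result) fed back each turn
    final_answer = None
    for _ in range(max_steps):
        line = model(history).strip()
        if line == "COMPLETE_RUN":
            break
        if line.startswith("FINAL_ANSWER:"):
            final_answer = line.split(":", 1)[1].strip()
        elif line.startswith("FUNCTION_CALL:"):
            call = json.loads(line.split(":", 1)[1])
            result = tools[call["name"]](**call["arguments"])
            history.append((call["name"], call["arguments"], result))
    return final_answer, history

# A scripted "model" that ignores history and replays the fixed lines.
_script = iter(SCRIPTED)
final_answer, history = run_agent(lambda _history: next(_script), TOOLS)
```

The max_steps cap is a design choice for the sketch: it bounds the loop if the model never emits COMPLETE_RUN.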

Key Components

  • mcp_server.py: Contains the core logic for automating MSPaint. The open_paint, draw_rectangle, and add_text_in_paint functions are the key tools used by the AI agent.
  • mcp_client.py: Manages the interaction between the AI agent and the MCP server. It sets up the system prompt, calls the tools, and handles the responses.
  • requirements.txt: Lists all the necessary Python packages for the project.
  • .env: Stores the Google Gemini API key.
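
The README does not show how the verify_calculation and verify_consistency tools in mcp_server.py are implemented. The sketch below is one possible shape, using the argument names from the system prompt (expression, expected, steps). The restricted AST evaluator and the tolerance parameter are assumptions, and in the real server these functions would presumably be registered as fastmcp tools.

```python
import ast
import operator

# Only basic arithmetic is allowed; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def _eval(node):
    """Evaluate a parsed arithmetic expression without using eval()."""
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand)
    raise ValueError("unsupported expression")

def verify_calculation(expression: str, expected: float, tol: float = 1e-9) -> bool:
    """Return True if the expression evaluates to the expected number."""
    try:
        actual = _eval(ast.parse(expression, mode="eval"))
    except (SyntaxError, ValueError, ZeroDivisionError):
        return False
    return abs(actual - expected) <= tol

def verify_consistency(steps) -> bool:
    """Return True if every [expression, answer] pair checks out."""
    return all(verify_calculation(expr, ans) for expr, ans in steps)
```

Walking the AST instead of calling eval() means a malformed or malicious expression from the model simply fails verification.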

Troubleshooting

  • Permission Issues: If you encounter permission issues, try running the scripts as an administrator.
  • Coordinate Issues: The coordinates used for clicking in MSPaint may need to be adjusted based on your screen resolution and window size. Use the debugging print statements in the code to identify the correct coordinates.
  • Tool Selection Issues: If the AI agent is not selecting the correct tools, review the system prompt and ensure that the tool descriptions are accurate.
  • API Key Issues: Ensure that your Gemini API key is correctly set in the .env file.
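
For the coordinate issue, a first approximation is to scale the rectangle from the system prompt proportionally to the current screen size. The sketch below assumes a hypothetical reference resolution of 2560x1440; the README does not state the resolution the original coordinates were recorded at, and proportional scaling will not account for fixed-size toolbars, so manual adjustment is still likely to be needed.

```python
def scale_rect(rect, ref_size, cur_size):
    """Scale (x1, y1, x2, y2) from a reference resolution to the current one.

    Hypothetical helper using integer arithmetic so results are whole pixels.
    """
    x1, y1, x2, y2 = rect
    ref_w, ref_h = ref_size
    cur_w, cur_h = cur_size
    return (
        x1 * cur_w // ref_w,
        y1 * cur_h // ref_h,
        x2 * cur_w // ref_w,
        y2 * cur_h // ref_h,
    )

# Rectangle from the system prompt; the reference resolution is an assumption.
PROMPT_RECT = (763, 595, 1788, 1123)
scaled = scale_rect(PROMPT_RECT, ref_size=(2560, 1440), cur_size=(1280, 720))
```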

Contributing

Contributions are welcome! Please submit a pull request with your changes.

License

MIT License
