OScribe

Vision-based desktop automation MCP server that controls any application via screenshot and AI vision, enabling UI automation through natural language commands.

Vision-based desktop automation MCP server. Control any application via screenshot + AI vision.


Supported Platforms & Applications

<div align="center">

Operating Systems

<table> <tr> <td align="center" width="150"> <img src="img/macos-logo.png" width="48" height="48" alt="macOS"/><br/> <b>macOS</b> </td> <td align="center" width="150"> <img src="img/windows-logo.png" width="48" height="48" alt="Windows"/><br/> <b>Windows</b> </td> </tr> </table>

Native Applications

<table> <tr> <td align="center" width="150"> <img src="img/finder-icon.png" width="48" height="48" alt="Finder"/><br/> <b>Finder</b><br/> <sub>File management</sub> </td> <td align="center" width="150"> <img src="img/windows-folder-icon.png" width="48" height="48" alt="Windows Explorer"/><br/> <b>Explorer</b><br/> <sub>File operations</sub> </td> <td align="center" width="150"> <img src="img/macos-settings-icon.png" width="48" height="48" alt="Settings"/><br/> <b>System Settings</b><br/> <sub>macOS & Windows</sub> </td> </tr> </table>

Web Browsers (CDP-enhanced)

<table> <tr> <td align="center" width="150"> <img src="img/chrome-logo.png" width="48" height="48" alt="Chrome"/><br/> <b>Chrome</b><br/> <sub>200-300+ elements</sub> </td> <td align="center" width="150"> <img src="img/brave-logo.png" width="48" height="48" alt="Brave"/><br/> <b>Brave</b><br/> <sub>Full CDP support</sub> </td> <td align="center" width="150"> <b>Edge, Arc, Opera</b><br/> <sub>Chromium-based</sub> </td> </tr> </table>

Note: Chrome 136+ requires automatic profile sync (~20-30s) due to CDP security changes.

</div>

Why OScribe?

"If you can see it, OScribe can click it."

OScribe is your fallback when traditional automation tools fail:

  • Legacy apps without APIs
  • Games and canvas apps without a DOM
  • Third-party software you can't modify
  • Ad-hoc automation without infrastructure setup

Demo

Helltaker - Full Chapter 1 Automated

<div align="center"> <img src="demo/helltaker-chapter1.gif" alt="OScribe automating Helltaker Chapter 1" width="800"/> </div>

Claude plays through the entire first chapter of Helltaker using OScribe MCP tools: navigating menus, solving puzzles, and progressing through dialogue, all via screenshot + vision.

Features

  • 🎯 Vision-based - Locate UI elements by description using Claude vision
  • 🔍 UI Automation - Get element coordinates via the OS accessibility tree (Windows UI Automation, macOS AXUIElement)
  • 🔧 MCP Server - Integrates with Claude Desktop, Claude Code, Cursor, Windsurf
  • ⚡ Native Input - Uses robotjs for reliable mouse/keyboard control
  • 📸 Multi-monitor - Supports multiple screens with DPI awareness
  • 🪟 Windows & macOS - Fully supported on Windows, supported on macOS (Linux not yet tested)
  • ⚛️ Electron Support - Full UI element detection in Electron apps on Windows (via NVDA)

Quick Start

Guided Installation (Recommended)

Run our interactive installer that checks and installs all prerequisites for you:

# macOS/Linux
curl -fsSL https://raw.githubusercontent.com/mikealkeal/oscribe/main/scripts/install.mjs | node

# Windows (PowerShell as Administrator)
irm https://raw.githubusercontent.com/mikealkeal/oscribe/main/scripts/install.mjs -OutFile install.mjs; node install.mjs

The installer will:

  1. ✅ Check Node.js version (22+ required)
  2. ✅ Check/install Python
  3. ✅ Check/install build tools (VS Build Tools or Xcode CLI)
  4. ✅ Install OScribe

Manual Installation

If you prefer manual installation or already have prerequisites:

npm install -g oscribe

Then configure your MCP client (see MCP Integration below).

Installation

System Prerequisites

OScribe uses robotjs for native mouse/keyboard control, which requires compilation tools:

Windows

  1. Node.js 22+ - Download

  2. Python 3.x - Download (check "Add to PATH" during install)

  3. Visual Studio Build Tools - Install with C++ workload:

    # Option 1: Manual install (recommended)
    # Download from https://visualstudio.microsoft.com/visual-cpp-build-tools/
    # Select "Desktop development with C++" workload
    
    # Option 2: Via npm (the windows-build-tools package is deprecated and may fail on recent Node.js versions)
    npm install -g windows-build-tools
    

macOS

  1. Node.js 22+ - Download or brew install node

  2. Xcode Command Line Tools:

    xcode-select --install
    
  3. Python 3.x - Usually pre-installed, verify with python3 --version

Verify Prerequisites

Before installing, run the diagnostic script to check all prerequisites:

# macOS/Linux - Run directly without installation
curl -fsSL https://raw.githubusercontent.com/mikealkeal/oscribe/main/scripts/doctor.mjs | node

# Windows (PowerShell)
irm https://raw.githubusercontent.com/mikealkeal/oscribe/main/scripts/doctor.mjs -OutFile doctor.mjs; node doctor.mjs

The doctor script checks:

  • Node.js version (22+)
  • Python installation
  • Build tools (VS Build Tools on Windows, Xcode CLI on macOS)

It provides step-by-step fix instructions for any missing prerequisites.
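The Node.js check comes down to comparing the major version of the running runtime against 22. A minimal sketch of such a check in TypeScript follows; `checkNodeVersion` and `majorVersion` are illustrative names, not the contents of `scripts/doctor.mjs`. It relies only on `process.versions.node`, which Node.js sets to the running version without a leading `v`.

```typescript
// Minimum major version, per the prerequisites above.
const REQUIRED_MAJOR = 22;

// Parse the major component of "22.11.0" or "v22.11.0".
function majorVersion(version: string): number {
  const major = Number.parseInt(version.replace(/^v/, "").split(".")[0] ?? "", 10);
  if (Number.isNaN(major)) {
    throw new Error(`Unrecognized version string: ${version}`);
  }
  return major;
}

// Illustrative check in the spirit of the doctor script (not its actual code).
function checkNodeVersion(version: string): { ok: boolean; message: string } {
  const major = majorVersion(version);
  return major >= REQUIRED_MAJOR
    ? { ok: true, message: `Node.js ${version} OK` }
    : {
        ok: false,
        message: `Node.js ${version} found; ${REQUIRED_MAJOR}+ required. Install a newer release from nodejs.org.`,
      };
}

// process.versions.node is the running version, e.g. "22.11.0".
console.log(checkNodeVersion(process.versions.node).message);
```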

After OScribe is installed, you can also run:

oscribe doctor

Additional Requirements

  • Claude Desktop, Claude Code, or any MCP client (provides OAuth authentication)

From npm (Recommended)

# Global installation
npm install -g oscribe

# Verify installation
oscribe --version

From Source

git clone https://github.com/mikealkeal/oscribe.git
cd oscribe
npm install
npm run build
npm link  # Makes 'oscribe' command available globally

Platform Support

| Platform | Status |
|---|---|
| Windows | ✅ Fully supported |
| macOS | ✅ Supported |
| Linux | 🚧 Not tested yet |

Windows Details

  • PowerShell (included)
  • UI Automation via PowerShell + .NET
  • NVDA support for Electron apps

macOS Details

  • Native screencapture command
  • UI Automation via AXUIElement API (ax-reader binary)
  • Requires: Accessibility permissions (System Settings → Privacy & Security → Accessibility)
    • Add Terminal or your IDE to allowed apps
    • IMPORTANT for VSCode users: You must also authorize VSCode in "App Management" (Login Items & Extensions)
      1. Open System Settings → General → Login Items & Extensions
      2. Find "Visual Studio Code"
      3. Toggle ON the switch
      4. Enter your password or use Touch ID to confirm
      5. This is required for OScribe MCP to control your system from Claude Code
  • Native apps (Chrome, Safari, Finder) work well
  • Electron apps (VS Code, etc.) have limited element detection (same as Windows without NVDA)

Usage

CLI Commands

Vision-Based Clicking (The Core of OScribe!)

oscribe click "Submit button"              # Click by description - the magic!
oscribe click "File menu"                  # Works on any visible element
oscribe click "Export as PNG" --screen 1   # Target specific monitor
oscribe click "Close" --dry-run            # Preview without clicking

Input & Automation

oscribe type "hello world"                 # Type text
oscribe hotkey "ctrl+c"                    # Press keyboard shortcut
oscribe hotkey "ctrl+shift+esc"            # Multiple modifiers
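Hotkey strings such as "ctrl+shift+esc" combine zero or more modifiers with one final key. The sketch below shows one way to split them into the `(key, modifiers)` shape that robotjs's `keyTap(key, modifiers)` accepts (robotjs names its modifiers `alt`, `command`, `control`, and `shift`). The alias tables and the `parseHotkey` name are assumptions for illustration; OScribe's own parser may accept different spellings.

```typescript
// Modifier names as accepted by robotjs keyTap(key, modifiers).
type Modifier = "alt" | "command" | "control" | "shift";

// Assumed aliases; OScribe's real parser may differ.
const MODIFIER_ALIASES: Record<string, Modifier> = {
  ctrl: "control",
  control: "control",
  shift: "shift",
  alt: "alt",
  cmd: "command",
  command: "command",
  win: "command", // robotjs uses "command" for the Windows key on Windows
};

const KEY_ALIASES: Record<string, string> = { esc: "escape", return: "enter", del: "delete" };

interface ParsedHotkey {
  key: string;
  modifiers: Modifier[];
}

// Split "ctrl+shift+esc" into { key: "escape", modifiers: ["control", "shift"] }.
function parseHotkey(combo: string): ParsedHotkey {
  const parts = combo
    .toLowerCase()
    .split("+")
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
  const last = parts.pop();
  if (last === undefined) {
    throw new Error(`Empty hotkey: "${combo}"`);
  }
  const modifiers = parts.map((p) => {
    const mod = MODIFIER_ALIASES[p];
    if (mod === undefined) throw new Error(`Unknown modifier "${p}" in "${combo}"`);
    return mod;
  });
  return { key: KEY_ALIASES[last] ?? last, modifiers };
}

console.log(parseHotkey("ctrl+shift+esc")); // { key: 'escape', modifiers: [ 'control', 'shift' ] }
```

With robotjs installed, the result could be passed on as `robot.keyTap(parsed.key, parsed.modifiers)`.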

Screenshots

oscribe screenshot                      # Capture primary screen
oscribe screenshot -o capture.png       # Save to file
oscribe screenshot --screen 1           # Capture second monitor
oscribe screenshot --list               # List available screens
oscribe screenshot --describe           # Describe screen content with AI

Window Management

oscribe windows                         # List open windows
oscribe focus "Chrome"                  # Focus window by name
oscribe focus "Calculator"              # Works with partial matches

MCP Server

oscribe serve                          # Start MCP server (stdio transport)

Global Options

--verbose, -v          # Detailed output
--dry-run              # Simulate without executing
--quiet, -q            # Minimal output
--screen N             # Target specific screen (default: 0)

Examples

# Take screenshot and save
oscribe screenshot -o desktop.png

# Type with delay between keystrokes
oscribe type "slow typing" --delay 100

# Use second monitor
oscribe screenshot --screen 1 --describe

# Dry run to see what would happen
oscribe type "test" --dry-run
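The `--delay` and `--dry-run` examples suggest a simple model: send one keystroke at a time with a pause in between, or only report what would be sent. A hedged sketch of that model follows. It assumes the delay is in milliseconds (the README does not state the unit), and `typeWithDelay` with its injected `typeChar` callback is hypothetical, standing in for whatever native input call OScribe actually makes.

```typescript
// Hypothetical sink for one character; a real tool would call its native input layer here.
type TypeChar = (ch: string) => void;

const sleep = (ms: number): Promise<void> => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Type `text` one character at a time, pausing `delayMs` milliseconds between keystrokes.
// With dryRun = true nothing is sent; the planned keystrokes are only returned.
async function typeWithDelay(
  text: string,
  delayMs: number,
  typeChar: TypeChar,
  dryRun = false,
): Promise<string[]> {
  const planned = Array.from(text); // iterates by code point, so surrogate pairs stay intact
  if (dryRun) return planned;
  for (let i = 0; i < planned.length; i++) {
    typeChar(planned[i]!);
    if (delayMs > 0 && i < planned.length - 1) await sleep(delayMs);
  }
  return planned;
}

// Usage: echo to stdout with 100 ms between keys, mirroring `--delay 100`.
void typeWithDelay("slow typing", 100, (ch) => process.stdout.write(ch)).then(() => console.log());
```

Injecting the output function keeps the timing and dry-run logic testable without touching the real keyboard.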

MCP Integration

OScribe exposes tools via Model Context Protocol for AI agents. Works with Claude Desktop, Claude Code, Cursor, Windsurf, and any MCP-compatible client.

Quick Setup

Claude Desktop

Edit your config file:

| OS | Config Path |
|---|---|
| Windows | %APPDATA%\Claude\claude_desktop_config.json |
| macOS | ~/Library/Application Support/Claude/claude_desktop_config.json |

Add OScribe to mcpServers:

{
  "mcpServers": {
    "oscribe": {
      "command": "npx",
      "args": ["-y", "oscribe", "serve"]
    }
  }
}

Or if installed globally (npm install -g oscribe):

{
  "mcpServers": {
    "oscribe": {
      "command": "oscribe",
      "args": ["serve"]
    }
  }
}

Then restart Claude Desktop. You'll see a 🔌 icon indicating MCP tools are available.
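If you prefer to script the config edit rather than make it by hand, the important detail is to merge into `mcpServers` instead of overwriting it, so other servers you have registered survive. A minimal sketch, assuming a hypothetical `addOscribe` helper that takes the file's current text; reading and writing the file (for example with `fs.readFileSync` / `fs.writeFileSync` on the path from the table above) is left out.

```typescript
interface McpServerEntry {
  command: string;
  args: string[];
}

interface ClientConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [other: string]: unknown; // any other client settings are preserved as-is
}

// Return the config text with an "oscribe" entry added (or replaced),
// keeping every other server and setting untouched.
function addOscribe(configText: string, globalInstall: boolean): string {
  const config: ClientConfig =
    configText.trim() === "" ? {} : (JSON.parse(configText) as ClientConfig);
  const entry: McpServerEntry = globalInstall
    ? { command: "oscribe", args: ["serve"] }
    : { command: "npx", args: ["-y", "oscribe", "serve"] };
  const updated: ClientConfig = {
    ...config,
    mcpServers: { ...(config.mcpServers ?? {}), oscribe: entry },
  };
  return JSON.stringify(updated, null, 2) + "\n";
}

// Usage: an existing config that already registers another server.
const existing = '{ "mcpServers": { "other": { "command": "node", "args": ["server.js"] } } }';
console.log(addOscribe(existing, false));
```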

Claude Code / Cursor / Windsurf

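Add a .mcp.json file in your project root. Cursor and Windsurf use the same mcpServers format but read it from their own config files (for example, .cursor/mcp.json in Cursor):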

{
  "mcpServers": {
    "oscribe": {
      "command": "npx",
      "args": ["-y", "oscribe", "serve"]
    }
  }
}

Or if installed globally:

{
  "mcpServers": {
    "oscribe": {
      "command": "oscribe",
      "args": ["serve"]
    }
  }
}

Available MCP Tools

| Tool | Description | Parameters |
|---|---|---|
| os_screenshot | 📸 Capture screenshot + cursor position | screen? (default: 0) |
| os_inspect | 🔍 Get UI elements via Windows UI Automation | window? |
| os_inspect_at | 🎯 Get element info at coordinates | x, y |
| os_move | Move mouse cursor | x, y |
| os_click | Click at current cursor position | window?, button? |
| os_click_at | Move + click in one action | x, y, window?, button? |
| os_type | Type text | text |
| os_hotkey | Press keyboard shortcut | keys (e.g., "ctrl+c") |
| os_scroll | Scroll in direction | direction, amount? |
| os_windows | List open windows + screens | - |
| os_focus | Focus window by name | window |
| os_wait | Wait for duration (UI loading) | ms (max 30000) |
| os_nvda_status | Check NVDA screen reader status (Electron support) | - |
| os_nvda_install | Download NVDA portable for Electron apps | - |
| os_nvda_start | Start NVDA in silent mode | - |
| os_nvda_stop | Stop NVDA screen reader | - |
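MCP clients call these tools for you, but when debugging it can help to see what goes over the wire. Under the MCP specification, a tool invocation is a JSON-RPC 2.0 request with method `tools/call` and `params: { name, arguments }`, and the stdio transport used by `oscribe serve` carries one JSON message per line. The sketch below only builds and frames such requests; a real session must first complete the MCP `initialize` handshake, which is omitted here, and `toolCall` / `frame` are illustrative helpers.

```typescript
// A JSON-RPC 2.0 request for MCP's tools/call method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

// Build a tools/call request; each request gets a fresh id so responses can be matched.
function toolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/call", params: { name, arguments: args } };
}

// The stdio transport exchanges one JSON message per line, with no embedded newlines.
function frame(message: ToolCallRequest): string {
  return JSON.stringify(message) + "\n";
}

// Parameter names follow the tool table; the "left" button value is an assumption.
const clickRequest = toolCall("os_click_at", { x: 640, y: 360, button: "left" });
const waitRequest = toolCall("os_wait", { ms: 500 });
process.stdout.write(frame(clickRequest) + frame(waitRequest));
```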

MCP Usage Example

Once configured, Claude can automate your desktop:

"Take a screenshot and describe what you see"

"Inspect the UI elements and click the Submit button"

"List all windows and focus on Chrome"

"Type 'hello world' and press Ctrl+Enter"

Workflow: Claude uses os_screenshot to see the screen, os_inspect to get element coordinates, then os_move + os_click for precise interaction.
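The last step of that workflow, going from inspected elements to a click point, is typically a matter of picking the matching element and aiming at the center of its bounding box. The sketch below assumes a hypothetical `UiElement` shape (name, type, and a screen-space rectangle) and a hypothetical `findClickTarget` helper; the real `os_inspect` output format is not documented here and may differ.

```typescript
// Hypothetical element record; the real os_inspect output may differ.
interface UiElement {
  name: string;
  type: string;
  x: number; // left edge, screen coordinates
  y: number; // top edge, screen coordinates
  width: number;
  height: number;
}

// Prefer an exact (case-insensitive) name match, fall back to a substring match,
// and return the center of the element's bounding box.
function findClickTarget(elements: UiElement[], query: string): { x: number; y: number } | undefined {
  const q = query.toLowerCase();
  const exact = elements.find((e) => e.name.toLowerCase() === q);
  const match = exact ?? elements.find((e) => e.name.toLowerCase().includes(q));
  if (!match) return undefined;
  return {
    x: Math.round(match.x + match.width / 2),
    y: Math.round(match.y + match.height / 2),
  };
}

const elements: UiElement[] = [
  { name: "Cancel", type: "Button", x: 400, y: 500, width: 80, height: 30 },
  { name: "Submit", type: "Button", x: 500, y: 500, width: 80, height: 30 },
];
console.log(findClickTarget(elements, "submit")); // { x: 540, y: 515 }
```

The returned point is what an `os_move` + `os_click` pair, or a single `os_click_at`, would aim at. Clicking the center rather than the top-left corner avoids landing on a border or a neighboring control.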

Configuration

Config directory: ~/.oscribe/

Files

  • config.json - Application settings

config.json

{
  "defaultScreen": 0,
  "dryRun": false,
  "logLevel": "info",
  "cursorSize": 128
}

Configuration Options

| Option | Type | Default | Description |
|---|---|---|---|
| defaultScreen | number | 0 | Default monitor to capture |
| dryRun | boolean | false | Simulate actions without executing |
| logLevel | string | "info" | Log level: debug, info, warn, error |
| cursorSize | number | 128 | Cursor size in screenshots (32-256) |
| nvda.autoDownload | boolean | false | Auto-download NVDA when needed |
| nvda.autoStart | boolean | true | Auto-start NVDA for Electron apps |
| nvda.customPath | string | - | Custom NVDA installation path |
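These options combine naturally into "defaults plus user overrides, then validate". OScribe's tech stack lists Zod for config management; to stay dependency-free, this sketch uses plain TypeScript instead. Defaults and the 32-256 `cursorSize` range come from the table; `loadConfig`, the choice to throw on invalid values, and the reading of `logLevel` as a minimum-severity threshold (`shouldLog`) are assumptions.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

interface OscribeConfig {
  defaultScreen: number;
  dryRun: boolean;
  logLevel: LogLevel;
  cursorSize: number;
  nvda: { autoDownload: boolean; autoStart: boolean; customPath?: string };
}

type UserConfig = Partial<Omit<OscribeConfig, "nvda">> & { nvda?: Partial<OscribeConfig["nvda"]> };

// Defaults taken from the Configuration Options table.
const DEFAULTS: OscribeConfig = {
  defaultScreen: 0,
  dryRun: false,
  logLevel: "info",
  cursorSize: 128,
  nvda: { autoDownload: false, autoStart: true },
};

const LOG_LEVELS: readonly LogLevel[] = ["debug", "info", "warn", "error"];

// Merge user settings over the defaults (nested nvda block included) and reject bad values.
function loadConfig(user: UserConfig): OscribeConfig {
  const config: OscribeConfig = { ...DEFAULTS, ...user, nvda: { ...DEFAULTS.nvda, ...user.nvda } };
  if (!Number.isInteger(config.defaultScreen) || config.defaultScreen < 0) {
    throw new Error("defaultScreen must be a non-negative integer");
  }
  if (config.cursorSize < 32 || config.cursorSize > 256) {
    throw new Error("cursorSize must be between 32 and 256");
  }
  if (!LOG_LEVELS.includes(config.logLevel)) {
    throw new Error(`logLevel must be one of ${LOG_LEVELS.join(", ")}`);
  }
  return config;
}

// Assumed threshold semantics: "warn" lets warn and error messages through.
function shouldLog(configured: LogLevel, message: LogLevel): boolean {
  return LOG_LEVELS.indexOf(message) >= LOG_LEVELS.indexOf(configured);
}

console.log(loadConfig({ logLevel: "debug", nvda: { customPath: "C:/Program Files/NVDA" } }));
```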

How It Works

OScribe uses a multi-layer approach for desktop automation (Windows):

  1. Screenshot Layer - Captures screen using PowerShell + .NET System.Drawing

  2. UI Automation Layer - Gets element coordinates via Windows accessibility tree:

    • Uses Windows UI Automation API via PowerShell
    • Returns interactive elements with screen coordinates
    • Works like a DOM for desktop apps
  3. Input Layer - Uses robotjs for:

    • Mouse movement and clicks
    • Keyboard input and hotkeys
    • Adapts to Windows mouse button swap settings
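Adapting to swapped buttons comes down to one mapping. On Windows, the swap setting can be read with the Win32 call `GetSystemMetrics(SM_SWAPBUTTON)`; how OScribe itself detects it is not described here, so the sketch takes the flag as input. It assumes, as the note about swap settings implies, that a synthesized left-button event is interpreted through the swap setting; `physicalButton` is a hypothetical name.

```typescript
type MouseButton = "left" | "right" | "middle";

// If the OS has primary/secondary buttons swapped (common for left-handed setups),
// a synthesized "left" event would act as a secondary click. Swapping left and right
// before sending keeps "left" meaning "primary click" from the caller's point of view.
function physicalButton(requested: MouseButton, buttonsSwapped: boolean): MouseButton {
  if (!buttonsSwapped || requested === "middle") return requested;
  return requested === "left" ? "right" : "left";
}

console.log(physicalButton("left", true)); // right
console.log(physicalButton("left", false)); // left
```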

Best strategy: Use os_screenshot which returns UI elements with coordinates, then os_move + os_click for precise interaction.

Development

Setup

git clone https://github.com/mikealkeal/oscribe.git
cd oscribe
npm install

Scripts

npm run build       # Build TypeScript
npm run dev         # Development mode (watch)
npm run typecheck   # Type check only
npm run lint        # Run ESLint
npm run lint:fix    # Fix linting issues
npm run format      # Format with Prettier
npm run clean       # Remove dist folder

Project Structure

oscribe/
├── bin/
│   └── oscribe.ts              # CLI entry point
├── src/
│   ├── core/
│   │   ├── screenshot.ts     # Multi-platform screen capture
│   │   ├── input.ts          # Mouse/keyboard control (robotjs)
│   │   ├── windows.ts        # Window management
│   │   └── uiautomation.ts   # Windows UI Automation (accessibility)
│   ├── cli/
│   │   ├── commands/         # CLI command implementations
│   │   └── index.ts          # Command registration
│   ├── mcp/
│   │   └── server.ts         # MCP server (tool definitions)
│   ├── config/
│   │   └── index.ts          # Config management with Zod
│   └── index.ts              # Main exports
├── package.json
├── tsconfig.json
├── .env.example
└── LICENSE

Tech Stack

  • Runtime: Node.js 22+ (ESM)
  • Language: TypeScript 5.7+ (strict mode)
  • Validation: Zod
  • CLI: Commander + Chalk + Ora
  • Vision: Anthropic SDK (Claude Sonnet 4)
  • Input: robotjs (native automation)
  • Screenshot: screenshot-desktop + platform-specific tools
  • MCP: @modelcontextprotocol/sdk

Troubleshooting

Installation Issues

npm install fails with node-gyp errors:

First, run the diagnostic script (no installation required):

# macOS/Linux
curl -fsSL https://raw.githubusercontent.com/mikealkeal/oscribe/main/scripts/doctor.mjs | node

# Windows (PowerShell)
irm https://raw.githubusercontent.com/mikealkeal/oscribe/main/scripts/doctor.mjs -OutFile doctor.mjs; node doctor.mjs

This is usually due to missing build tools. robotjs requires native compilation.

# Error examples:
# - "gyp ERR! find Python"
# - "gyp ERR! find VS"
# - "node-pre-gyp ERR! build error"

Windows fix:

# 1. Install Python (if missing)
# Download from https://www.python.org/downloads/
# IMPORTANT: Check "Add Python to PATH" during installation

# 2. Install Visual Studio Build Tools
# Download from https://visualstudio.microsoft.com/visual-cpp-build-tools/
# Select "Desktop development with C++" workload

# Or via npm (windows-build-tools is deprecated and may fail on recent Node.js versions):
npm install -g windows-build-tools

# 3. Retry installation
npm install -g oscribe

macOS fix:

# 1. Install Xcode Command Line Tools
xcode-select --install

# 2. Retry installation
npm install -g oscribe

Still failing? Try clearing npm cache:

npm cache clean --force
npm install -g oscribe

MCP Server Issues

Server not starting:

  • Check Node.js version: node --version (requires 22+)
  • Rebuild if needed: npm run build
  • Check path in your MCP config file

Tools not appearing in Claude Desktop:

  • Restart Claude Desktop after config changes
  • Check claude_desktop_config.json syntax (valid JSON)
  • Look for the 🔌 icon in the Claude Desktop interface

Windows Issues

Clicks not working:

  • OScribe auto-detects swapped mouse buttons
  • No manual configuration needed

UI elements not detected:

  • Some apps don't expose UI Automation elements
  • Use os_screenshot to see what's visible
  • Coordinates are returned in the screenshot response

Electron apps showing few UI elements:

Electron/Chromium apps require NVDA screen reader to expose their full accessibility tree:

# Install NVDA portable (one-time)
oscribe nvda install

# Start NVDA silently (no audio)
oscribe nvda start

Or via MCP tools: os_nvda_install → os_nvda_start

NVDA runs in silent mode (no speech, no sounds). The agent will prompt to install NVDA when needed.

Manual NVDA installation:

If you prefer to install NVDA yourself, download from nvaccess.org and set the path in config:

{
  "nvda": {
    "customPath": "C:/Program Files/NVDA"
  }
}

License

BSL 1.1 (Business Source License 1.1)

  • ✅ Free for personal use
  • ✅ Free for open-source projects
  • ⚠️ Commercial use requires a paid license (until 2029)
  • 🔄 Converts to MIT on 2029-01-30 (then free for everyone)

See LICENSE for full terms.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Guidelines

  1. Follow the existing code style (ESLint + Prettier configured)
  2. Add tests for new features
  3. Update documentation as needed
  4. Ensure npm run build succeeds
  5. Check types with npm run typecheck

Areas for Contribution

  • [ ] Additional platform support (BSD, other Unix variants)
  • [ ] More sophisticated element location strategies
  • [ ] Performance optimizations
  • [ ] Additional MCP tools
  • [ ] Better error messages
  • [ ] Documentation improvements

Support

Roadmap

  • [x] npm package distribution
  • [ ] Web interface for remote control
  • [ ] Recording and playback of automation sequences
  • [ ] Multi-provider vision support (GPT-4V, Gemini)
  • [ ] Plugin system for custom tools
  • [ ] Docker container distribution

Acknowledgements

OScribe is built on top of these great open-source projects:


Maintained by Mickaël Bellun
