MCP Servers

Unstructured API MCP Server for Research Paper Data Processing

GitHub repository for Unstructured MCP Hackathon.

HeetVekariya

Research & Data

README

Unstructured API MCP Server for Research Paper Data Processing

By leveraging the Unstructured API, this server facilitates easy access to a set of powerful tools that extract meaningful information from research papers, which can then be used for fine-tuning a language model (LLM) to reduce the literature review time for researchers.

Check out the Blog here:

For a detailed explanation of the hackathon, the project, and to follow along, check out my blog post on Dev.to: Unstructured Model Context Protocol Hackathon.

Setup

Install dependencies:

uv add "mcp[cli]"
uv pip install --upgrade unstructured-client python-dotenv

or use uv sync.

Requirements

Before you can begin working with the UNS_MCP project, make sure you have the following setup:

UNSTRUCTURED_API_KEY
- Get your API key from the Unstructured platform to access their API for document processing.
GOOGLEDRIVE_SERVICE_ACCOUNT_KEY
- Set up a Google Cloud project and create a service account to enable access to Google Drive for reading PDFs. Check the set up process here.
- Save the JSON credentials for your service account and use it to set up the GOOGLEDRIVE_SERVICE_ACCOUNT_KEY.
MONGO_DB_CONNECTION_STRING
- Set up a MongoDB database (cloud) and get the connection string for connecting to the database. Check out set up process here.
.env.template
- The .env.template file includes all the required environment variables. Copy this file to .env and set the necessary values for the keys mentioned above.
Example .env file:
```
UNSTRUCTURED_API_KEY="<key-here>"
MONGO_DB_CONNECTION_STRING="<CONNECTION_STRING>"
GOOGLEDRIVE_SERVICE_ACCOUNT_KEY="<converted string>"
```

Project Flow

User Query to MCP Client
Claude Interacts with UNS_MCP Server
- Claude forwards the user's query to the custom MCP server named UNS_MCP.
MCP Tool Executes Unstructured API
- UNS_MCP interacts with the Unstructured API to process the research paper PDF, extract relevant information, and convert it into structured JSON data.
Structured Data (JSON) Output is stored in the destination source
- The result from the Unstructured API is transformed into JSON format, which can then be further utilized to fine-tune LLMs, helping researchers quickly find the relevant information without manually reading the entire paper.

Available Tools

Tool	Description
`list_sources`	Lists available sources from the Unstructured API.
`get_source_info`	Get detailed information about a specific source connector.
`create_gdrive_source`	Create a google drive source connector.
`update_gdrive_source`	Update an existing google source connector by params.
`delete_gdrive_source`	Delete a source connector by source id.
`list_destinations`	Lists available destinations from the Unstructured API.
`get_destination_info`	Get detailed info about a specific destination connector. Currently, we have s3/weaviate/astra/neo4j/mongo DB (more to come!)
`create_mongodb_destination`	Create a mongodb destination connector by params.
`update_mongodb_destination`	Update an existing mongodb destination connector by destination id.
`delete_mongodb_destination`	Delete a mongodb destination connector by destination id.
`list_workflows`	Lists workflows from the Unstructured API.
`get_workflow_info`	Get detailed information about a specific workflow.
`create_workflow`	Create a new workflow with source, destination id, etc.
`run_workflow`	Run a specific workflow with workflow id
`update_workflow`	Update an existing workflow by params.
`delete_workflow`	Delete a specific workflow by id.
`list_jobs`	Lists jobs for a specific workflow from the Unstructured API.
`get_job_info`	Get detailed information about a specific job by job id.
`cancel_job`	Delete a specific job by id.

Follow Along

1. Set Up Required Connectors

Google Drive Source Connector:

Create a Google Drive Source Connector to connect your service account with Google Drive and retrieve PDFs.
Test the connection to ensure accessibility.

MongoDB Destination Connector:

Set up the MongoDB Destination Connector to store processed data.
Test the connection to ensure accessibility.

2. Develop the Workflow

Define Connectors: Set up the Google Drive source and MongoDB destination connectors.
Partitioning: Use Auto partitioning for optimal document splitting.
Chunking: Apply by-page chunking for manageable text segments.
Enrichment: Use NER to extract entities and table enrichment for any tables.
Embedding: Convert text into embeddings for querying or analysis.

Note: Tweak the Flow: Adjust any step (partitioning, chunking, enrichment, embedding) as needed.

3. Set Up Claude Desktop

Install Claude Desktop and integrate it with the UNS_MCP server by following steps given below.
Restart Claude to link with the MCP server and ensure workflow functionality.

4. Query and Run the Workflow

Use Claude to interact with the system and execute queries to list, create, edit, delete and run the workflow. You can perform many such tasks, go through Available Tools given above.

5. Results

Claude Desktop Integration

To install in Claude Desktop:

Go to claude_desktop_config.json by running the below command.

# For macOS or Linux:
code ~/Library/Application\ Support/Claude/claude_desktop_config.json

# For Windows:
code $env:AppData\Claude\claude_desktop_config.json

In that file add:

{
    "mcpServers":
    {
        "UNS_MCP":
        {
            "command": "ABSOLUTE/PATH/TO/.local/bin/uv",
            "args":
            [
                "--directory",
                "ABSOLUTE/PATH/TO/YOUR-UNS-MCP-REPO/uns_mcp",
                "run",
                "server.py"
            ],
            "env":
            [
            "UNSTRUCTURED_API_KEY":"<your key>"
            ],
            "disabled": false
        }
    }
}

Restart Claude Desktop.
Example Issues seen from Claude Desktop.
- You will see No destinations found when you query for a list of destination connectors. Check your API key in .env or in your config json, it needs to be your personal key in https://platform.unstructured.io/app/account/api-keys.

Debugging tools

Anthropic provides MCP Inspector tool to debug/test your MCP server. Run the following command to spin up a debugging UI. From there, you will be able to add environment variables (pointing to your local env) on the left pane. Include your personal API key there as env var. Go to tools, you can test out the capabilities you add to the MCP server.

mcp dev uns_mcp/server.py

If you need to log request call parameters to UnstructuredClient, set the environment variable DEBUG_API_REQUESTS=false. The logs are stored in a file with the format unstructured-client-{date}.log, which can be examined to debug request call parameters to UnstructuredClient functions.

Running locally minimal client, accessing local the MCP server over HTTP + SSE

The main difference here is it becomes easier to set breakpoints on the server side during development -- the client and server are decoupled.

# in one terminal, run the server:
uv run python uns_mcp/server.py --host 127.0.0.1 --port 8080

or
make sse-server

# in another terminal, run the client:
uv run python minimal_client/client.py "http://127.0.0.1:8080/sse"
or
make sse-client

Hint: ctrl+c out of the client first, then the server. Otherwise the server appears to hang.

Recommended Servers

mixpanel

Connect to your Mixpanel data. Query events, retention, and funnel data from Mixpanel analytics.

Featured

TypeScript

Sequential Thinking MCP Server

This server facilitates structured problem-solving by breaking down complex issues into sequential steps, supporting revisions, and enabling multiple solution paths through full MCP integration.

Featured

Python

Crypto Price & Market Analysis MCP Server

A Model Context Protocol (MCP) server that provides comprehensive cryptocurrency analysis using the CoinCap API. This server offers real-time price data, market analysis, and historical trends through an easy-to-use interface.

Featured

TypeScript

MCP PubMed Search

Server to search PubMed (PubMed is a free, online database that allows users to search for biomedical and life sciences literature). I have created on a day MCP came out but was on vacation, I saw someone post similar server in your DB, but figured to post mine.

Featured

Python

dbt Semantic Layer MCP Server

A server that enables querying the dbt Semantic Layer through natural language conversations with Claude Desktop and other AI assistants, allowing users to discover metrics, create queries, analyze data, and visualize results.

Featured

TypeScript

Nefino MCP Server

Provides large language models with access to news and information about renewable energy projects in Germany, allowing filtering by location, topic (solar, wind, hydrogen), and date range.

Official

Python

Vectorize

Vectorize MCP server for advanced retrieval, Private Deep Research, Anything-to-Markdown file extraction and text chunking.

Official

JavaScript

MATLAB MCP Server

Integrates MATLAB with AI to execute code, generate scripts from natural language, and access MATLAB documentation seamlessly.

Local

JavaScript

Macrostrat MCP Server

Enables Claude to query comprehensive geologic data from the Macrostrat API, including geologic units, columns, minerals, and timescales through natural language.

Local

JavaScript

MCP Word Counter

A Model Context Protocol server that provides tools for analyzing text documents, including counting words and characters. This server helps LLMs perform text analysis tasks by exposing simple document statistics functionality.

Local

JavaScript

Unstructured API MCP Server for Research Paper Data Processing

README

Unstructured API MCP Server for Research Paper Data Processing

Check out the Blog here:

Table of Contents:

Setup

Requirements

Project Flow

Available Tools

Follow Along

1. Set Up Required Connectors

Google Drive Source Connector:

MongoDB Destination Connector:

2. Develop the Workflow

3. Set Up Claude Desktop

4. Query and Run the Workflow

5. Results

Claude Desktop Integration

Debugging tools

Running locally minimal client, accessing local the MCP server over HTTP + SSE

Recommended Servers