MCP Servers

barracuda-mcp

Enables searching OpenAleph entities and documents using natural language queries through the MCP protocol.

README

barracuda-mcp

A Model Context Protocol server that exposes OpenAleph entity and document search via the official HTTP API (GET /api/2/search). It is intended for use from LM Studio (recommended), Cursor, Claude Desktop, and other MCP clients over stdio.

After clicking Add to LM Studio, update ALEPH_BASE_URL and ALEPH_API_KEY in LM Studio's MCP settings.

Requirements

Node.js 20+
An OpenAleph instance with a valid API key (see OpenAleph Python guide for OPAL_HOST / OPAL_API_KEY conventions)

Install

As an npm package (recommended)

Run the latest published version directly with npx (no install needed):

npx -y barracuda-mcp

Or install globally:

npm install -g barracuda-mcp
barracuda-mcp

Or add it as a dependency to another project:

npm install barracuda-mcp

The package ships with a barracuda-mcp executable (defined in the bin field) that speaks MCP over stdio.

From source

git clone https://github.com/Sourcery-info/barracuda-mcp.git
cd barracuda-mcp
npm ci
npm run build

The built entrypoint is dist/index.js.

Configuration

The server reads configuration from environment variables. This package does not load a .env file automatically — export the variables in your shell, process manager, or MCP client's env block.

Variable	Description
`ALEPH_BASE_URL`	Origin of your OpenAleph instance (e.g. `https://aleph.example.org`). Preferred if set.
`OPAL_HOST`	Same role as `ALEPH_BASE_URL` if the latter is unset. Path components are ignored; only the origin is used.
`ALEPH_API_KEY`	API key. Preferred if set.
`OPAL_API_KEY`	Used if `ALEPH_API_KEY` is unset.
`ALEPH_REQUEST_TIMEOUT_MS`	Optional. Request timeout in milliseconds (default `60000`, clamped between `1000` and `600000`).
`ALEPH_SESSION_ID`	Optional. Sent as `X-Aleph-Session`; defaults to a random UUID per process.

Precedence: ALEPH_BASE_URL over OPAL_HOST; ALEPH_API_KEY over OPAL_API_KEY.

Authentication: Requests use Authorization: ApiKey <your_key>, consistent with the openaleph-client library.

Run locally

From an npm install:

export ALEPH_BASE_URL=https://your-instance.example.org
export ALEPH_API_KEY=your_key
npx -y barracuda-mcp

From source:

export ALEPH_BASE_URL=https://your-instance.example.org
export ALEPH_API_KEY=your_key
npm start

The server speaks MCP over stdio (stdin/stdout). Do not run it in a terminal you expect to use interactively for other output.

MCP client setup

LM Studio (recommended)

The easiest way to get started: click the Add to LM Studio button at the top of this README for one-click install.

After clicking, open LM Studio → Program → MCP and replace the placeholder ALEPH_BASE_URL and ALEPH_API_KEY values with your own.

To add it manually instead:

{
  "mcpServers": {
    "barracuda-mcp": {
      "command": "npx",
      "args": ["-y", "barracuda-mcp"],
      "env": {
        "ALEPH_BASE_URL": "https://your-instance.example.org",
        "ALEPH_API_KEY": "your_api_key_here"
      }
    }
  }
}

See the LM Studio MCP docs for more.

Cursor

Add a server entry in Cursor Settings → MCP:

{
  "mcpServers": {
    "openaleph": {
      "command": "npx",
      "args": ["-y", "barracuda-mcp"],
      "env": {
        "ALEPH_BASE_URL": "https://your-instance.example.org",
        "ALEPH_API_KEY": "your_api_key_here"
      }
    }
  }
}

Cursor from a local source build

If you've cloned and built from source, point Cursor at the built JS:

{
  "mcpServers": {
    "openaleph": {
      "command": "node",
      "args": ["/absolute/path/to/barracuda-mcp/dist/index.js"],
      "env": {
        "ALEPH_BASE_URL": "https://your-instance.example.org",
        "ALEPH_API_KEY": "your_api_key_here"
      }
    }
  }
}

For development without running npm run build, you can use tsx:

{
  "mcpServers": {
    "openaleph-dev": {
      "command": "npx",
      "args": ["tsx", "/absolute/path/to/barracuda-mcp/src/index.ts"],
      "env": {
        "ALEPH_BASE_URL": "https://your-instance.example.org",
        "ALEPH_API_KEY": "your_api_key_here"
      }
    }
  }
}

Claude Desktop / other stdio MCP clients

Any MCP client that supports stdio works — use the same command / args / env shape as above.

Tool: `aleph_search`

Searches OpenAleph using the same query model as the REST API (Elasticsearch-style q, pagination, facets, and filter:* parameters). The tool does not add default schemata filters unless you pass schema or schemata.

Argument	Type	Description
`q`	string (required)	Search query (e.g. plain text or `field:value`).
`limit`	number (optional)	Page size, max 10000 (server-enforced cap).
`offset`	number (optional)	Results to skip.
`collectionId`	string (optional)	Sets `filter:collection_id`.
`facets`	string[] (optional)	Repeated `facet` query params (e.g. `languages`, `countries`).
`extraFilters`	object (optional)	Map of filter name → value; each becomes `filter:{name}`.
`schema`	string \| string[] (optional)	Exact-schema filter. A single value becomes `filter:schema=X`. An array (`["Email","Pages"]`) or comma/space-separated string (`"Email,Pages"`) is OR-merged into `q` as `(schema:A OR schema:B)` because `filter:schema` only accepts one value.
`schemata`	string \| string[] (optional)	Schema-with-descendants filter. Single / array / comma-separated are all accepted. Appends `schemata:X` (or `(schemata:A OR schemata:B …)`) to `q` — never sent as `filter:schemata`, which errors on some servers.
`highlight`	boolean (optional)	Sends `highlight=true` / `false` to OpenAleph (Elasticsearch match snippets). Default: `true`.
`highlightCount`	number (optional)	`highlight_count` — max fragments per hit (server default often 3).
`highlightLength`	number (optional)	`highlight_length` — fragment size in characters.
`responseMode`	`"structured" \| "raw"` (optional)	Output mode. Default: `structured`.
`includeRaw`	boolean (optional)	In `structured` mode, wrap as `{ "results": [...], "raw": … }` (search) or `{ "result": {...}, "raw": … }` (get entity). Default: `false`.
`includeContentFields`	boolean (optional)	Include heavy content fields (`bodyHtml`, `bodyText`, `translatedText`). Default: `false`.
`contentPreviewChars`	number (optional)	When `> 0`, caps Email `bodyMarkdown` (after HTML→Markdown) or Pages `bodyText` (plain; overrides `bodyMarkdownMaxChars`). Default: `0`.
`bodyMarkdownMaxChars`	number (optional)	When `contentPreviewChars` is `0`, max length of Email `bodyMarkdown` or Pages `bodyText` (same boundary-aware truncation). Default: 200 (short for search context).
`maxArrayValuesPerField`	number (optional)	Max values returned per field in structured mode. Default: `20`.

Successful responses return JSON text (pretty-printed). In default structured mode, the tool returns a compact JSON array of hits (no Aleph status / total wrapper) to keep LLM context small.

Structured output shape (default)

The root value is a JSON array of slim hit objects:

[
  {
    "schema": "Email",
    "properties": {
      "subject": ["RE: Q4 planning notes"],
      "peopleMentioned": ["pat example", "quinn example"]
    },
    "dataset": "collection-id-or-null",
    "score": null,
    "id": "entity-id",
    "link": "https://your-instance.example.org/entities/entity-id"
  }
]

Each hit includes:

schema, properties, dataset (from dataset on the entity if present, else collection_id / collection.id), score (from score / _score when present, else null), id, link (only links.ui from Aleph; other link relations are omitted). Search responses may also include highlight (array of HTML snippet strings from Elasticsearch when highlight=true in the API)—passed through on each slim hit so you can see query terms in context when bodyText / bodyMarkdown is truncated.
properties drops processingAgent and processingStatus; compresses parent and ancestors to { schema, caption, id }; compresses recipients to { schema, id, name?, email? } with name / email lifted from nested FtM properties. For schema: "Email" (case-insensitive), when includeContentFields is false, bodyHtml is preprocessed (strip  comments, <style>, <script>, <head>, etc.—Outlook/Word often wraps CSS in comments so Turndown would otherwise emit that as “Markdown”), then converted with Turndown; the Markdown is shortened after conversion, preferring cuts at paragraph breaks (\n\n), then line breaks, then spaces. Very large HTML is trimmed at a safe tag boundary before Turndown only as a safety cap. If Aleph also sends bodyMarkdown, it is not passed through when bodyHtml is present (it often duplicates HTML). For schema: "Pages", when includeContentFields is false, plain bodyText is returned (normalized line endings) with the same boundary-aware shortening (not HTML/Markdown conversion). Use contentPreviewChars when > 0, otherwise bodyMarkdownMaxChars (search default 200, get-entity default 6000). When bodyMarkdown (Email) or truncated bodyText (Pages) is present, the slim hit also includes truncatedBody, bodyMarkdownFullChars (full length before the response cap—applies to Pages plain text too), and bodyMarkdownReturnedChars. Other heavy fields still follow the usual omit/preview rules. Remaining keys follow the same array-cap rules (maxArrayValuesPerField).
id is the canonical OpenAleph entity id. If the API omits top-level id, it is parsed from links.self (or links.ui).

Page / Pages body text (Aleph / FollowTheMoney): Page is a per-page shard (often filtered with properties.document + properties.index); Pages is the multi-page file. In Aleph’s entity index, extracted text is often stored as indexText and folded into Elasticsearch’s internal text field; that aggregate field is not stored in _source, and properties.bodyText may be empty even when search matches. This MCP treats Page and Pages the same for body text: it builds truncated properties.bodyText from, in order: bodyText, indexText, rawText, then (search only) joined highlight snippets. When the content came from highlight, the slim hit includes bodyTextFromSearchHighlight: true (preview snippets, not the full file). Use aleph_get_entity or aleph_get_entity_markdown when you need full text and the entity has bodyText / indexText in the API response.

With includeRaw: true, structured mode returns { "results": [ … ], "raw": <original Aleph search JSON> } so the slim array and the full API body can both be inspected.

Use responseMode: "raw" to receive the unmodified Aleph JSON payload (including status, total, and full links).

Errors use MCP isError with status and body when available.

Tool: `aleph_get_entity`

Fetches one entity by id using OpenAleph GET /api/2/entities/:id. Use this when you already have an entity id (from aleph_search, the web UI, or exports)—including documents, emails, people, and other FtM schemata.

Argument	Type	Description
`id`	string (required)	Entity id (URL-encoded on the wire if it contains special characters).
`responseMode`	`"structured" \| "raw"` (optional)	Default: `structured` (same shaping as search).
`includeRaw`	boolean (optional)	Include Aleph’s JSON under `raw` in structured mode. Default: `false`.
`includeContentFields`	boolean (optional)	Include heavy fields (`bodyHtml`, `bodyText`, …). Default: `false`.
`contentPreviewChars`	number (optional)	When `> 0`, caps Email `bodyMarkdown` or Pages `bodyText` (overrides `bodyMarkdownMaxChars`). Default: `0`.
`bodyMarkdownMaxChars`	number (optional)	When `contentPreviewChars` is `0`, max length of Email `bodyMarkdown` or Pages `bodyText`. Default: 6000 (longer than search for reading one entity).
`maxArrayValuesPerField`	number (optional)	Cap per field. Default: `20`.

Structured output shape (default)

The root value is a single slim entity object—the same shape as one element of the aleph_search array:

{
  "schema": "Pages",
  "properties": {
    "title": ["…"],
    "fileName": ["…"]
  },
  "dataset": "collection-id-or-null",
  "score": null,
  "id": "your-entity-id",
  "link": "https://your-instance.example.org/entities/your-entity-id"
}

With includeRaw: true, structured mode returns { "result": { … }, "raw": <original Aleph entity JSON> }.

Set includeContentFields: true when you need full document or email body text.

Tool: `aleph_get_entity_markdown`

Fetches one entity by id (GET /api/2/entities/:id) and returns full body text (same source as structured mode, but no length cap on the returned string except the internal safety limit on huge inputs). Email: bodyMarkdown (HTML→Markdown from bodyHtml). Pages: plain bodyText. Use this when truncatedBody is true on search/get-entity output or you need the complete text.

Argument	Type	Description
`id`	string (required)	Entity id (Email with usable `bodyHtml`, or Pages with `bodyText`).
`includeRaw`	boolean (optional)	Include Aleph’s JSON under `raw`. Default: `false`.

Successful responses are JSON with id, schema, and either Email fields (bodyMarkdown, bodyMarkdownFullChars, htmlSourceTruncated) or Pages fields (bodyText, bodyTextFullChars, htmlSourceTruncated). If there is no usable body, the tool returns MCP isError: true with an explanatory message that lists which property keys were present on the parent and the exact child-pages query that was tried — helpful for telling ingest gaps apart from access-scope problems.

Pages: automatic child-page aggregation

OpenAleph’s /api/2/entities/:id handler for a single entity sets excludes = ["text", "numeric.*"], so the parent of a paginated Pages document almost always has an empty properties.bodyText. FollowTheMoney keeps per-page text on child Page entities (Page:bodyText, Page:index, Page:document → <parent_id>).

When the parent has no own bodyText / indexText / rawText, this tool transparently issues:

GET /api/2/search?q=*&filter:schema=Page&filter:properties.document=<id>&limit=500

sorts the returned children by properties.index, and concatenates their bodyText (falling back to indexText / rawText per child). When that path is taken the response adds:

bodyTextFromChildren: true
childPageCount: number of pages concatenated

HTTP filters are used (not a Lucene q: clause) because properties.document is analyzed/tokenized — phrase-matching the dotted child id against it does not work; exact filter-term matching does.

API references

Security notes

Treat the API key like a password: use Cursor env or your OS secret store; avoid committing keys.
Stdio MCP assumes the client (Cursor) is trusted; do not expose this process to untrusted callers.

Troubleshooting

500 on search with schemata: Older setups used filter:schemata; this MCP now adds schemata:YourSchema inside q instead (same idea as Advanced Search). Ensure the schema name exists in your FtM model (e.g. Pages, Person, Email).
"No results" when searching multiple schemata: Don't pass "Email,Pages" expecting it to match as a single schema — there is no such schema. Use the supported forms: "schemata": ["Email","Pages"] or "schemata": "Email,Pages" / "Email Pages"; both are OR-merged into q as (schemata:Email OR schemata:Pages).
Empty bodyText on a Pages entity from aleph_get_entity: Expected — OpenAleph's /api/2/entities/:id handler uses excludes = ["text", "numeric.*"], so the indexed text is stripped from the single-entity response. Call aleph_get_entity_markdown instead; it automatically aggregates the child Page entities (filter:schema=Page&filter:properties.document=<id>) and sets bodyTextFromChildren: true + childPageCount on the response. If that also comes back empty, the error lists which property keys were present on the parent and the exact child query that was tried — useful for distinguishing an ingest/OCR gap from an access-scope problem on the child pages' collection.
401 / 403: Invalid or expired API key, or role cannot browse/search the requested data.
408 from tool: Request timed out; increase ALEPH_REQUEST_TIMEOUT_MS or narrow the query.
URL issues: Only the origin of ALEPH_BASE_URL / OPAL_HOST is used; trailing paths are stripped.

Development

npm test           # unit tests (vitest)
npm run lint       # eslint + typescript-eslint
npm run build      # type-check + emit to dist/

End-to-end (real OpenAleph)

All e2e commands load .env via test/e2e/setup-env.ts.

Command	What it does
`npm run test:e2e`	Runs all e2e tests — includes the full search flow (and the targeted entity test when `ALEPH_E2E_ENTITY_ID` is set).
`npm run test:e2e:entity`	Runs only the targeted entity test (`test/e2e/aleph-entity.e2e.test.ts`); skips cleanly when `ALEPH_E2E_ENTITY_ID` is unset.
`npm run e2e:entity -- <id>`	Standalone CLI: fetches one entity by id via `aleph_get_entity` and `aleph_get_entity_markdown`, printing JSON to stdout. Flags: `--raw`, `--markdown` / `--no-markdown`, `-h`. Id on the CLI wins over `ALEPH_E2E_ENTITY_ID`.

Search tuning — `ALEPH_E2E_SEARCH_*`

Control the HTTP query issued to /api/2/search:

ALEPH_E2E_SEARCH_Q (default test), ALEPH_E2E_SEARCH_LIMIT (default 5), ALEPH_E2E_SEARCH_OFFSET, ALEPH_E2E_SEARCH_COLLECTION_ID
ALEPH_E2E_SEARCH_SCHEMA / ALEPH_E2E_SEARCH_SCHEMATA (single name, or comma/space-separated list — OR-combined)
ALEPH_E2E_SEARCH_FACETS (comma list), ALEPH_E2E_SEARCH_EXTRA_FILTERS (JSON object for additional filter:* pairs)
ALEPH_E2E_SEARCH_HIGHLIGHT (true/false, default true), ALEPH_E2E_SEARCH_HIGHLIGHT_COUNT, ALEPH_E2E_SEARCH_HIGHLIGHT_LENGTH
ALEPH_E2E_FETCH_TOP_N (default 2, max 50) — after search, GET /api/2/entities/:id for this many hits so logs include both search and per-entity responses

Control the structured-response shaping of runAlephSearchTool (applied in addition to the raw client.search call, so the log has both):

ALEPH_E2E_SEARCH_RESPONSE_MODE (structured | raw)
ALEPH_E2E_SEARCH_INCLUDE_RAW, ALEPH_E2E_SEARCH_INCLUDE_CONTENT_FIELDS
ALEPH_E2E_SEARCH_CONTENT_PREVIEW_CHARS, ALEPH_E2E_SEARCH_BODY_MARKDOWN_MAX_CHARS, ALEPH_E2E_SEARCH_MAX_ARRAY_VALUES_PER_FIELD

The e2e log header prints both the resolved search parameters and the shaping args actually applied.

Targeted entity — `ALEPH_E2E_ENTITY_*`

For reproducing a problem against one specific document:

ALEPH_E2E_ENTITY_ID — required to enable npm run test:e2e:entity; ignored (overridden) if you pass an id to npm run e2e:entity --.
ALEPH_E2E_ENTITY_FETCH_MARKDOWN (default true) — also run aleph_get_entity_markdown after aleph_get_entity.
ALEPH_E2E_ENTITY_RESPONSE_MODE, ALEPH_E2E_ENTITY_INCLUDE_RAW, ALEPH_E2E_ENTITY_INCLUDE_CONTENT_FIELDS, ALEPH_E2E_ENTITY_CONTENT_PREVIEW_CHARS, ALEPH_E2E_ENTITY_BODY_MARKDOWN_MAX_CHARS, ALEPH_E2E_ENTITY_MAX_ARRAY_VALUES_PER_FIELD — same semantics as their SEARCH_ counterparts, applied to the entity tool.

Example:

# Reproduce an issue against one document, including its full body text:
npm run e2e:entity -- 0000000000000000000000000000000000000000.1111111111111111111111111111111111111111

# Or pin it in .env and run as a test:
ALEPH_E2E_ENTITY_ID=<id> npm run test:e2e:entity

Privacy: Log files under logs/ can contain API responses with sensitive content. They are listed in .cursorignore; do not commit them or paste them into shared chats.

Prompts for AI assistants

Ready-to-use system prompts (short, detailed, investigative, workflows, advanced search crib) live in prompts/. Start at prompts/README.md.

License

MIT

Recommended Servers

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official

Featured

TypeScript

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official

Featured

Local

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official

Featured

TypeScript

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official

Featured

Python

E2B

Using MCP to run code via e2b.

Official

Featured

Neon Database

MCP server for interacting with Neon Management API and databases

Official

Featured

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official

Featured

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official

Featured

barracuda-mcp

README

barracuda-mcp

Requirements

Install

As an npm package (recommended)

From source

Configuration

Run locally

MCP client setup

LM Studio (recommended)

Cursor

Cursor from a local source build

Claude Desktop / other stdio MCP clients

Tool: aleph_search

Structured output shape (default)

Tool: aleph_get_entity

Structured output shape (default)

Tool: aleph_get_entity_markdown

Pages: automatic child-page aggregation

API references

Security notes

Troubleshooting

Development

End-to-end (real OpenAleph)

Search tuning — ALEPH_E2E_SEARCH_*

Targeted entity — ALEPH_E2E_ENTITY_*

Prompts for AI assistants

License

Recommended Servers

Tool: `aleph_search`

Tool: `aleph_get_entity`

Tool: `aleph_get_entity_markdown`

Search tuning — `ALEPH_E2E_SEARCH_*`

Targeted entity — `ALEPH_E2E_ENTITY_*`