barracuda-mcp
Enables searching OpenAleph entities and documents using natural language queries through the MCP protocol.
A Model Context Protocol server that exposes OpenAleph entity and document search via the official HTTP API (GET /api/2/search). It is intended for use from LM Studio (recommended), Cursor, Claude Desktop, and other MCP clients over stdio.
After clicking Add to LM Studio, update ALEPH_BASE_URL and ALEPH_API_KEY in LM Studio's MCP settings.
Requirements
- Node.js 20+
- An OpenAleph instance with a valid API key (see the OpenAleph Python guide for OPAL_HOST / OPAL_API_KEY conventions)
Install
As an npm package (recommended)
Run the latest published version directly with npx (no install needed):
npx -y barracuda-mcp
Or install globally:
npm install -g barracuda-mcp
barracuda-mcp
Or add it as a dependency to another project:
npm install barracuda-mcp
The package ships with a barracuda-mcp executable (defined in the bin field) that speaks MCP over stdio.
From source
git clone https://github.com/Sourcery-info/barracuda-mcp.git
cd barracuda-mcp
npm ci
npm run build
The built entrypoint is dist/index.js.
Configuration
The server reads configuration from environment variables. This package does not load a .env file automatically — export the variables in your shell, process manager, or MCP client's env block.
| Variable | Description |
|---|---|
| ALEPH_BASE_URL | Origin of your OpenAleph instance (e.g. https://aleph.example.org). Preferred if set. |
| OPAL_HOST | Same role as ALEPH_BASE_URL if the latter is unset. Path components are ignored; only the origin is used. |
| ALEPH_API_KEY | API key. Preferred if set. |
| OPAL_API_KEY | Used if ALEPH_API_KEY is unset. |
| ALEPH_REQUEST_TIMEOUT_MS | Optional. Request timeout in milliseconds (default 60000, clamped between 1000 and 600000). |
| ALEPH_SESSION_ID | Optional. Sent as X-Aleph-Session; defaults to a random UUID per process. |
Precedence: ALEPH_BASE_URL over OPAL_HOST; ALEPH_API_KEY over OPAL_API_KEY.
Authentication: Requests use Authorization: ApiKey <your_key>, consistent with the openaleph-client library.
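The precedence, origin-stripping and clamping rules above fit in a small resolver. This is a minimal sketch, not the package's source: resolveAlephConfig, alephHeaders and the AlephConfig shape are hypothetical names, and treating an empty string as "unset" is an assumption.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical config shape; the package's internal names may differ.
interface AlephConfig {
  baseUrl: string; // origin only, e.g. "https://aleph.example.org"
  apiKey: string;
  timeoutMs: number;
  sessionId: string;
}

type Env = Record<string, string | undefined>;

function resolveAlephConfig(env: Env): AlephConfig {
  // ALEPH_* takes precedence over OPAL_*. Treating "" as unset is an assumption.
  const rawBase = env.ALEPH_BASE_URL || env.OPAL_HOST;
  const apiKey = env.ALEPH_API_KEY || env.OPAL_API_KEY;
  if (!rawBase) throw new Error("Set ALEPH_BASE_URL or OPAL_HOST");
  if (!apiKey) throw new Error("Set ALEPH_API_KEY or OPAL_API_KEY");

  // Keep only scheme + host + port; any path is discarded.
  const baseUrl = new URL(rawBase).origin;

  // Default 60000 ms, clamped to the documented [1000, 600000] range.
  const rawTimeout = env.ALEPH_REQUEST_TIMEOUT_MS;
  const parsed = rawTimeout ? Number(rawTimeout) : NaN;
  const timeoutMs = Number.isFinite(parsed)
    ? Math.min(600_000, Math.max(1_000, parsed))
    : 60_000;

  // Falls back to a fresh random UUID; call once at startup to keep it per-process.
  const sessionId = env.ALEPH_SESSION_ID || randomUUID();

  return { baseUrl, apiKey, timeoutMs, sessionId };
}

// Headers for every request: ApiKey auth plus the session header.
function alephHeaders(cfg: AlephConfig): Record<string, string> {
  return {
    Authorization: `ApiKey ${cfg.apiKey}`,
    "X-Aleph-Session": cfg.sessionId,
  };
}
```

Typical use would be resolveAlephConfig(process.env), called once when the server starts so the fallback session id stays stable for the life of the process.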
Run locally
From an npm install:
export ALEPH_BASE_URL=https://your-instance.example.org
export ALEPH_API_KEY=your_key
npx -y barracuda-mcp
From source:
export ALEPH_BASE_URL=https://your-instance.example.org
export ALEPH_API_KEY=your_key
npm start
The server speaks MCP over stdio (stdin/stdout). Do not run it in a terminal you expect to use interactively for other output.
MCP client setup
LM Studio (recommended)
The easiest way to get started: click the Add to LM Studio button at the top of this README for one-click install.
After clicking, open LM Studio → Program → MCP and replace the placeholder ALEPH_BASE_URL and ALEPH_API_KEY values with your own.
To add it manually instead:
{
  "mcpServers": {
    "barracuda-mcp": {
      "command": "npx",
      "args": ["-y", "barracuda-mcp"],
      "env": {
        "ALEPH_BASE_URL": "https://your-instance.example.org",
        "ALEPH_API_KEY": "your_api_key_here"
      }
    }
  }
}
See the LM Studio MCP docs for more.
Cursor
Add a server entry in Cursor Settings → MCP:
{
  "mcpServers": {
    "openaleph": {
      "command": "npx",
      "args": ["-y", "barracuda-mcp"],
      "env": {
        "ALEPH_BASE_URL": "https://your-instance.example.org",
        "ALEPH_API_KEY": "your_api_key_here"
      }
    }
  }
}
Cursor from a local source build
If you've cloned and built from source, point Cursor at the built JS:
{
  "mcpServers": {
    "openaleph": {
      "command": "node",
      "args": ["/absolute/path/to/barracuda-mcp/dist/index.js"],
      "env": {
        "ALEPH_BASE_URL": "https://your-instance.example.org",
        "ALEPH_API_KEY": "your_api_key_here"
      }
    }
  }
}
For development without running npm run build, you can use tsx:
{
  "mcpServers": {
    "openaleph-dev": {
      "command": "npx",
      "args": ["tsx", "/absolute/path/to/barracuda-mcp/src/index.ts"],
      "env": {
        "ALEPH_BASE_URL": "https://your-instance.example.org",
        "ALEPH_API_KEY": "your_api_key_here"
      }
    }
  }
}
Claude Desktop / other stdio MCP clients
Any MCP client that supports stdio works — use the same command / args / env shape as above.
Tool: aleph_search
Searches OpenAleph using the same query model as the REST API (Elasticsearch-style q, pagination, facets, and filter:* parameters). The tool does not add default schemata filters unless you pass schema or schemata.
| Argument | Type | Description |
|---|---|---|
| q | string (required) | Search query (e.g. plain text or field:value). |
| limit | number (optional) | Page size, max 10000 (server-enforced cap). |
| offset | number (optional) | Number of results to skip. |
| collectionId | string (optional) | Sets filter:collection_id. |
| facets | string[] (optional) | Repeated facet query params (e.g. languages, countries). |
| extraFilters | object (optional) | Map of filter name → value; each becomes filter:{name}. |
| schema | string or string[] (optional) | Exact-schema filter. A single value becomes filter:schema=X. An array (["Email","Pages"]) or comma/space-separated string ("Email,Pages") is OR-merged into q as (schema:A OR schema:B), because filter:schema only accepts one value. |
| schemata | string or string[] (optional) | Schema-with-descendants filter. Single values, arrays, and comma-separated strings are all accepted. Appends schemata:X (or (schemata:A OR schemata:B …)) to q; it is never sent as filter:schemata, which errors on some servers. |
| highlight | boolean (optional) | Sends highlight=true / false to OpenAleph (Elasticsearch match snippets). Default: true. |
| highlightCount | number (optional) | highlight_count: max fragments per hit (server default often 3). |
| highlightLength | number (optional) | highlight_length: fragment size in characters. |
| responseMode | "structured" or "raw" (optional) | Output mode. Default: structured. |
| includeRaw | boolean (optional) | In structured mode, wrap as { "results": [...], "raw": … } (search) or { "result": {...}, "raw": … } (get entity). Default: false. |
| includeContentFields | boolean (optional) | Include heavy content fields (bodyHtml, bodyText, translatedText). Default: false. |
| contentPreviewChars | number (optional) | When > 0, caps Email bodyMarkdown (after HTML→Markdown conversion) or Pages bodyText (plain text), overriding bodyMarkdownMaxChars. Default: 0. |
| bodyMarkdownMaxChars | number (optional) | When contentPreviewChars is 0, max length of Email bodyMarkdown or Pages bodyText (same boundary-aware truncation). Default: 200 (short for search context). |
| maxArrayValuesPerField | number (optional) | Max values returned per field in structured mode. Default: 20. |
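To make the schema / schemata rules concrete, here is a sketch of how some of the documented arguments could map onto GET /api/2/search query parameters. It is an illustration under assumptions, not the package's code: buildSearchParams, SearchArgs and the helpers are hypothetical, the repeated facet parameter is assumed to be named facet, and the exact way clauses are appended to q (here, joined with a space) may differ from the real implementation.

```typescript
// Hypothetical subset of the aleph_search arguments.
interface SearchArgs {
  q: string;
  limit?: number;
  offset?: number;
  collectionId?: string;
  facets?: string[];
  extraFilters?: Record<string, string>;
  schema?: string | string[];
  schemata?: string | string[];
  highlight?: boolean;
}

// Accepts "Email", "Email,Pages", "Email Pages" or ["Email", "Pages"].
function toNames(value: string | string[] | undefined): string[] {
  if (value === undefined) return [];
  const parts = Array.isArray(value) ? value : [value];
  return parts.flatMap((p) => p.split(/[\s,]+/)).filter((p) => p.length > 0);
}

// "field:A" for one name, "(field:A OR field:B)" for several.
function orClause(field: string, names: string[]): string {
  const terms = names.map((n) => `${field}:${n}`);
  return terms.length === 1 ? terms[0] : `(${terms.join(" OR ")})`;
}

function buildSearchParams(args: SearchArgs): URLSearchParams {
  const params = new URLSearchParams();
  const qParts = [args.q];

  const schemas = toNames(args.schema);
  if (schemas.length === 1) {
    // A single exact schema can use the HTTP filter.
    params.append("filter:schema", schemas[0]);
  } else if (schemas.length > 1) {
    // filter:schema takes one value, so several are OR-merged into q.
    qParts.push(orClause("schema", schemas));
  }

  const schemata = toNames(args.schemata);
  if (schemata.length > 0) {
    // Never sent as filter:schemata; always folded into q.
    qParts.push(orClause("schemata", schemata));
  }

  params.set("q", qParts.join(" "));
  if (args.limit !== undefined) params.set("limit", String(args.limit));
  if (args.offset !== undefined) params.set("offset", String(args.offset));
  if (args.collectionId) params.append("filter:collection_id", args.collectionId);
  for (const facet of args.facets ?? []) params.append("facet", facet); // param name assumed
  for (const [name, value] of Object.entries(args.extraFilters ?? {})) {
    params.append(`filter:${name}`, value);
  }
  // highlight defaults to true, as in the table.
  params.set("highlight", String(args.highlight ?? true));
  return params;
}
```

For example, { q: "acme", schemata: "Email,Pages" } produces q=acme (schemata:Email OR schemata:Pages) and no filter:schemata parameter, which is the behaviour the table describes.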
Successful responses return JSON text (pretty-printed). In default structured mode, the tool returns a compact JSON array of hits (no Aleph status / total wrapper) to keep LLM context small.
Structured output shape (default)
The root value is a JSON array of slim hit objects:
[
  {
    "schema": "Email",
    "properties": {
      "subject": ["RE: Q4 planning notes"],
      "peopleMentioned": ["pat example", "quinn example"]
    },
    "dataset": "collection-id-or-null",
    "score": null,
    "id": "entity-id",
    "link": "https://your-instance.example.org/entities/entity-id"
  }
]
Each hit includes:
- schema, properties, and id.
- dataset: from dataset on the entity if present, else collection_id / collection.id.
- score: from score / _score when present, else null.
- link: only links.ui from Aleph; other link relations are omitted.
- highlight (search responses only, when present): an array of HTML snippet strings from Elasticsearch, returned when highlight=true is sent to the API. It is passed through on each slim hit so you can see query terms in context when bodyText / bodyMarkdown is truncated.

properties is shaped as follows:
- processingAgent and processingStatus are dropped.
- parent and ancestors are compressed to { schema, caption, id }.
- recipients are compressed to { schema, id, name?, email? }, with name / email lifted from nested FtM properties.
- For schema: "Email" (case-insensitive), when includeContentFields is false, bodyHtml is preprocessed (stripping <!-- … --> comments, <style>, <script>, <head>, etc.; Outlook/Word often wraps CSS in comments, which Turndown would otherwise emit as "Markdown"), then converted with Turndown. The Markdown is shortened after conversion, preferring cuts at paragraph breaks (\n\n), then line breaks, then spaces. Very large HTML is trimmed at a safe tag boundary before Turndown, only as a safety cap. If Aleph also sends bodyMarkdown, it is not passed through when bodyHtml is present (it often duplicates the HTML).
- For schema: "Pages", when includeContentFields is false, plain bodyText is returned (with normalized line endings) and shortened with the same boundary-aware rules (no HTML/Markdown conversion).
- The length cap is contentPreviewChars when > 0, otherwise bodyMarkdownMaxChars (search default 200, get-entity default 6000).
- When bodyMarkdown (Email) or truncated bodyText (Pages) is present, the slim hit also includes truncatedBody, bodyMarkdownFullChars (full length before the response cap; this applies to Pages plain text too), and bodyMarkdownReturnedChars.
- Other heavy fields still follow the usual omit/preview rules, and all remaining keys follow the same array cap (maxArrayValuesPerField).

id is the canonical OpenAleph entity id. If the API omits a top-level id, it is parsed from links.self (or links.ui).
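The Email preprocessing and the boundary-aware shortening can be sketched as below. This is an illustration, not the package's implementation: stripEmailNoise and truncateAtBoundary are hypothetical names, the regex-based stripping simplifies whatever the real preprocessor does, the Turndown conversion itself is left out, and ignoring boundaries in the first half of the window is an assumed heuristic.

```typescript
// Removes markup that Turndown would otherwise turn into noise.
// Regex-based for brevity; a real implementation may use an HTML parser.
function stripEmailNoise(html: string): string {
  return html
    .replace(/<!--[\s\S]*?-->/g, "") // Outlook/Word often hide CSS in comments
    .replace(/<head\b[\s\S]*?<\/head>/gi, "")
    .replace(/<style\b[\s\S]*?<\/style>/gi, "")
    .replace(/<script\b[\s\S]*?<\/script>/gi, "");
}

interface Truncated {
  text: string;
  truncatedBody: boolean;
  bodyMarkdownFullChars: number;
  bodyMarkdownReturnedChars: number;
}

// Cuts at the last paragraph break, then line break, then space inside the
// limit. Boundaries in the first half of the window are ignored (assumption)
// so an early break does not discard most of the budget.
function truncateAtBoundary(text: string, maxChars: number): Truncated {
  const normalized = text.replace(/\r\n?/g, "\n");
  const full = normalized.length;
  if (full <= maxChars) {
    return { text: normalized, truncatedBody: false, bodyMarkdownFullChars: full, bodyMarkdownReturnedChars: full };
  }
  const head = normalized.slice(0, maxChars);
  let cut = maxChars; // hard cut if no acceptable boundary is found
  for (const sep of ["\n\n", "\n", " "]) {
    const idx = head.lastIndexOf(sep);
    if (idx >= maxChars / 2) {
      cut = idx;
      break;
    }
  }
  const out = normalized.slice(0, cut).trimEnd();
  return { text: out, truncatedBody: true, bodyMarkdownFullChars: full, bodyMarkdownReturnedChars: out.length };
}
```

Preferring the coarsest boundary first keeps paragraphs intact when the budget allows it, and only falls back to a mid-word cut when no usable break exists.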
Page / Pages body text (Aleph / FollowTheMoney): Page is a per-page shard (often filtered with properties.document + properties.index); Pages is the multi-page file. In Aleph’s entity index, extracted text is often stored as indexText and folded into Elasticsearch’s internal text field; that aggregate field is not stored in _source, and properties.bodyText may be empty even when search matches. This MCP treats Page and Pages the same for body text: it builds truncated properties.bodyText from, in order: bodyText, indexText, rawText, then (search only) joined highlight snippets. When the content came from highlight, the slim hit includes bodyTextFromSearchHighlight: true (preview snippets, not the full file). Use aleph_get_entity or aleph_get_entity_markdown when you need full text and the entity has bodyText / indexText in the API response.
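The fallback order for Page / Pages body text is a short chain of checks. A minimal sketch, assuming FtM-style properties where each value is a string array; pickPageBody and joinedText are hypothetical names, and joining multiple values and highlight snippets with a newline is an assumption.

```typescript
// FtM properties map each property name to an array of string values.
type FtmProperties = Record<string, string[] | undefined>;

interface PageBody {
  bodyText: string; // still subject to the usual length cap afterwards
  bodyTextFromSearchHighlight: boolean;
}

// Joins a multi-valued property; joining with "\n" is an assumption.
function joinedText(values: string[] | undefined): string | undefined {
  const text = (values ?? []).join("\n").trim();
  return text.length > 0 ? text : undefined;
}

// Documented order: bodyText, indexText, rawText, then (search only) highlight.
function pickPageBody(
  properties: FtmProperties,
  highlight: string[] | undefined,
  isSearch: boolean,
): PageBody | undefined {
  for (const key of ["bodyText", "indexText", "rawText"]) {
    const text = joinedText(properties[key]);
    if (text) return { bodyText: text, bodyTextFromSearchHighlight: false };
  }
  if (isSearch) {
    const snippets = joinedText(highlight);
    if (snippets) return { bodyText: snippets, bodyTextFromSearchHighlight: true };
  }
  return undefined; // nothing usable on this entity
}
```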
With includeRaw: true, structured mode returns { "results": [ … ], "raw": <original Aleph search JSON> } so the slim array and the full API body can both be inspected.
Use responseMode: "raw" to receive the unmodified Aleph JSON payload (including status, total, and full links).
Errors are returned as MCP results with isError set, including the HTTP status and response body when available.
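The three output forms for search (slim array, { results, raw } with includeRaw, or the untouched payload with responseMode: "raw") and the per-hit dataset / score / id / link rules can be sketched as follows. Assumptions: the types cover only the fields used here, property slimming, content handling and array caps are omitted, and the id fallback assumes links.self or links.ui ends in /entities/<id>. All names are hypothetical.

```typescript
// Only the fields this sketch touches; real Aleph payloads carry more.
interface AlephEntity {
  id?: string;
  schema: string;
  properties: Record<string, unknown>;
  dataset?: string;
  collection_id?: string;
  collection?: { id?: string };
  score?: number;
  _score?: number;
  highlight?: string[];
  links?: { self?: string; ui?: string };
}

interface AlephSearchResponse {
  status?: string;
  total?: number;
  results: AlephEntity[];
}

interface SlimHit {
  schema: string;
  properties: Record<string, unknown>;
  dataset: string | null;
  score: number | null;
  id: string | null;
  link?: string;
  highlight?: string[];
}

// Uses the top-level id, else the path segment after /entities/ in links.self or links.ui.
function entityId(e: AlephEntity): string | null {
  if (e.id) return e.id;
  const href = e.links?.self ?? e.links?.ui;
  const match = href?.match(/\/entities\/([^/?#]+)/);
  return match ? decodeURIComponent(match[1]) : null;
}

function slimHit(e: AlephEntity): SlimHit {
  const hit: SlimHit = {
    schema: e.schema,
    properties: e.properties, // real property slimming and caps omitted
    dataset: e.dataset ?? e.collection_id ?? e.collection?.id ?? null,
    score: e.score ?? e._score ?? null,
    id: entityId(e),
  };
  if (e.links?.ui) hit.link = e.links.ui; // other link relations are dropped
  if (e.highlight) hit.highlight = e.highlight;
  return hit;
}

type SearchOutput =
  | AlephSearchResponse
  | SlimHit[]
  | { results: SlimHit[]; raw: AlephSearchResponse };

function shapeSearch(
  resp: AlephSearchResponse,
  mode: "structured" | "raw" = "structured",
  includeRaw = false,
): SearchOutput {
  if (mode === "raw") return resp; // unmodified payload, incl. status and total
  const results = resp.results.map(slimHit);
  return includeRaw ? { results, raw: resp } : results;
}
```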
Tool: aleph_get_entity
Fetches one entity by id using OpenAleph GET /api/2/entities/:id. Use this when you already have an entity id (from aleph_search, the web UI, or exports)—including documents, emails, people, and other FtM schemata.
| Argument | Type | Description |
|---|---|---|
| id | string (required) | Entity id (URL-encoded on the wire if it contains special characters). |
| responseMode | "structured" or "raw" (optional) | Default: structured (same shaping as search). |
| includeRaw | boolean (optional) | Include Aleph's JSON under raw in structured mode. Default: false. |
| includeContentFields | boolean (optional) | Include heavy fields (bodyHtml, bodyText, …). Default: false. |
| contentPreviewChars | number (optional) | When > 0, caps Email bodyMarkdown or Pages bodyText (overrides bodyMarkdownMaxChars). Default: 0. |
| bodyMarkdownMaxChars | number (optional) | When contentPreviewChars is 0, max length of Email bodyMarkdown or Pages bodyText. Default: 6000 (longer than search, for reading one entity). |
| maxArrayValuesPerField | number (optional) | Cap per field. Default: 20. |
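Two details from this table lend themselves to a short sketch: the id is URL-encoded into the GET /api/2/entities/:id path, and the body cap comes from contentPreviewChars first, then bodyMarkdownMaxChars, then a per-tool default (200 for search, 6000 for get-entity). entityUrl and effectiveBodyCap are hypothetical helper names, not exports of the package.

```typescript
// Builds GET /api/2/entities/:id against the configured origin.
function entityUrl(origin: string, id: string): string {
  return `${origin}/api/2/entities/${encodeURIComponent(id)}`;
}

type Tool = "aleph_search" | "aleph_get_entity";

// Documented defaults for bodyMarkdownMaxChars per tool.
const DEFAULT_BODY_CAP: Record<Tool, number> = {
  aleph_search: 200,
  aleph_get_entity: 6000,
};

// contentPreviewChars > 0 wins; otherwise bodyMarkdownMaxChars or the tool default.
function effectiveBodyCap(
  tool: Tool,
  contentPreviewChars = 0,
  bodyMarkdownMaxChars?: number,
): number {
  if (contentPreviewChars > 0) return contentPreviewChars;
  return bodyMarkdownMaxChars ?? DEFAULT_BODY_CAP[tool];
}
```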
Structured output shape (default)
The root value is a single slim entity object—the same shape as one element of the aleph_search array:
{
  "schema": "Pages",
  "properties": {
    "title": ["…"],
    "fileName": ["…"]
  },
  "dataset": "collection-id-or-null",
  "score": null,
  "id": "your-entity-id",
  "link": "https://your-instance.example.org/entities/your-entity-id"
}
With includeRaw: true, structured mode returns { "result": { … }, "raw": <original Aleph entity JSON> }.
Set includeContentFields: true when you need full document or email body text.
Tool: aleph_get_entity_markdown
Fetches one entity by id (GET /api/2/entities/:id) and returns full body text (same source as structured mode, but no length cap on the returned string except the internal safety limit on huge inputs). Email: bodyMarkdown (HTML→Markdown from bodyHtml). Pages: plain bodyText. Use this when truncatedBody is true on search/get-entity output or you need the complete text.
| Argument | Type | Description |
|---|---|---|
| id | string (required) | Entity id (Email with usable bodyHtml, or Pages with bodyText). |
| includeRaw | boolean (optional) | Include Aleph's JSON under raw. Default: false. |
Successful responses are JSON with id, schema, and either Email fields (bodyMarkdown, bodyMarkdownFullChars, htmlSourceTruncated) or Pages fields (bodyText, bodyTextFullChars, htmlSourceTruncated). If there is no usable body, the tool returns MCP isError: true with an explanatory message that lists which property keys were present on the parent and the exact child-pages query that was tried — helpful for telling ingest gaps apart from access-scope problems.
Pages: automatic child-page aggregation
OpenAleph’s /api/2/entities/:id handler for a single entity sets excludes = ["text", "numeric.*"], so the parent of a paginated Pages document almost always has an empty properties.bodyText. FollowTheMoney keeps per-page text on child Page entities (Page:bodyText, Page:index, Page:document → <parent_id>).
When the parent has no own bodyText / indexText / rawText, this tool transparently issues:
GET /api/2/search?q=*&filter:schema=Page&filter:properties.document=<id>&limit=500
sorts the returned children by properties.index, and concatenates their bodyText (falling back to indexText / rawText per child). When that path is taken, the response adds:
- bodyTextFromChildren: true
- childPageCount: number of pages concatenated
HTTP filters are used (not a Lucene q: clause) because properties.document is analyzed/tokenized — phrase-matching the dotted child id against it does not work; exact filter-term matching does.
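The child-page fallback splits into two pure steps: building the search request shown above, and ordering and concatenating the returned Page entities. A sketch under assumptions: the helper names are hypothetical, properties.index is parsed as an integer (pages without one sort last), pages are joined with a blank line, childPageCount counts only pages that contributed text, and paging beyond limit=500 plus the HTTP call itself are left out.

```typescript
type Props = Record<string, string[] | undefined>;

interface PageEntity {
  id: string;
  properties: Props;
}

// Same request as documented: exact HTTP filters, not a Lucene q clause.
function childPagesUrl(origin: string, parentId: string): URL {
  const url = new URL("/api/2/search", origin);
  url.searchParams.set("q", "*");
  url.searchParams.set("filter:schema", "Page");
  url.searchParams.set("filter:properties.document", parentId);
  url.searchParams.set("limit", "500");
  return url;
}

// Per-child fallback: bodyText, then indexText, then rawText.
function pageText(p: Props): string {
  for (const key of ["bodyText", "indexText", "rawText"]) {
    const text = (p[key] ?? []).join("\n").trim();
    if (text) return text;
  }
  return "";
}

function concatChildPages(children: PageEntity[]): {
  bodyText: string;
  bodyTextFromChildren: true;
  childPageCount: number;
} {
  // Pages without a parseable index are placed at the end.
  const pageIndex = (p: PageEntity): number => {
    const n = parseInt(p.properties.index?.[0] ?? "", 10);
    return Number.isNaN(n) ? Number.MAX_SAFE_INTEGER : n;
  };
  const sorted = [...children].sort((a, b) => pageIndex(a) - pageIndex(b));
  const texts = sorted.map((c) => pageText(c.properties)).filter((t) => t.length > 0);
  return {
    bodyText: texts.join("\n\n"),
    bodyTextFromChildren: true,
    childPageCount: texts.length,
  };
}
```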
Security notes
- Treat the API key like a password: keep it in your MCP client's env block (e.g. Cursor) or your OS secret store, and avoid committing keys.
- Stdio MCP assumes the client (Cursor) is trusted; do not expose this process to untrusted callers.
Troubleshooting
- 500 on search with schemata: Older setups used filter:schemata; this MCP now adds schemata:YourSchema inside q instead (the same approach as Advanced Search). Ensure the schema name exists in your FtM model (e.g. Pages, Person, Email).
- "No results" when searching multiple schemata: Don't pass "Email,Pages" expecting it to match as a single schema; there is no such schema. Use the supported forms: "schemata": ["Email","Pages"] or "schemata": "Email,Pages" / "Email Pages". Both are OR-merged into q as (schemata:Email OR schemata:Pages).
- Empty bodyText on a Pages entity from aleph_get_entity: This is expected. OpenAleph's /api/2/entities/:id handler uses excludes = ["text", "numeric.*"], so the indexed text is stripped from the single-entity response. Call aleph_get_entity_markdown instead; it automatically aggregates the child Page entities (filter:schema=Page&filter:properties.document=<id>) and sets bodyTextFromChildren: true and childPageCount on the response. If that also comes back empty, the error lists which property keys were present on the parent and the exact child query that was tried, which helps distinguish an ingest/OCR gap from an access-scope problem on the child pages' collection.
- 401 / 403: Invalid or expired API key, or the role cannot browse/search the requested data.
- 408 from tool: The request timed out; increase ALEPH_REQUEST_TIMEOUT_MS or narrow the query.
- URL issues: Only the origin of ALEPH_BASE_URL / OPAL_HOST is used; trailing paths are stripped.
Development
npm test # unit tests (vitest)
npm run lint # eslint + typescript-eslint
npm run build # type-check + emit to dist/
End-to-end (real OpenAleph)
All e2e commands load .env via test/e2e/setup-env.ts.
| Command | What it does |
|---|---|
| npm run test:e2e | Runs all e2e tests: the full search flow, plus the targeted entity test when ALEPH_E2E_ENTITY_ID is set. |
| npm run test:e2e:entity | Runs only the targeted entity test (test/e2e/aleph-entity.e2e.test.ts); skips cleanly when ALEPH_E2E_ENTITY_ID is unset. |
| npm run e2e:entity -- <id> | Standalone CLI: fetches one entity by id via aleph_get_entity and aleph_get_entity_markdown, printing JSON to stdout. Flags: --raw, --markdown / --no-markdown, -h. An id on the command line takes precedence over ALEPH_E2E_ENTITY_ID. |
Search tuning — ALEPH_E2E_SEARCH_*
Control the HTTP query issued to /api/2/search:
- ALEPH_E2E_SEARCH_Q (default test), ALEPH_E2E_SEARCH_LIMIT (default 5), ALEPH_E2E_SEARCH_OFFSET, ALEPH_E2E_SEARCH_COLLECTION_ID
- ALEPH_E2E_SEARCH_SCHEMA / ALEPH_E2E_SEARCH_SCHEMATA (single name, or comma/space-separated list; OR-combined)
- ALEPH_E2E_SEARCH_FACETS (comma list), ALEPH_E2E_SEARCH_EXTRA_FILTERS (JSON object for additional filter:* pairs)
- ALEPH_E2E_SEARCH_HIGHLIGHT (true/false, default true), ALEPH_E2E_SEARCH_HIGHLIGHT_COUNT, ALEPH_E2E_SEARCH_HIGHLIGHT_LENGTH
- ALEPH_E2E_FETCH_TOP_N (default 2, max 50): after the search, runs GET /api/2/entities/:id for this many hits so logs include both search and per-entity responses
Control the structured-response shaping of runAlephSearchTool (applied in addition to the raw client.search call, so the log has both):
- ALEPH_E2E_SEARCH_RESPONSE_MODE (structured | raw)
- ALEPH_E2E_SEARCH_INCLUDE_RAW, ALEPH_E2E_SEARCH_INCLUDE_CONTENT_FIELDS
- ALEPH_E2E_SEARCH_CONTENT_PREVIEW_CHARS, ALEPH_E2E_SEARCH_BODY_MARKDOWN_MAX_CHARS, ALEPH_E2E_SEARCH_MAX_ARRAY_VALUES_PER_FIELD
The e2e log header prints both the resolved search parameters and the shaping args actually applied.
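As an illustration of how the ALEPH_E2E_SEARCH_* variables could map onto search arguments, here is a sketch. It is not the code in test/e2e/: readE2eSearchEnv and intOr are hypothetical names, only some of the variables are covered (ALEPH_E2E_SEARCH_SCHEMA and the highlight sizes are skipped), and treating non-integer values as unset is an assumption.

```typescript
type Env = Record<string, string | undefined>;

interface E2eSearch {
  q: string;
  limit: number;
  offset?: number;
  collectionId?: string;
  schemata: string[];
  facets: string[];
  extraFilters: Record<string, string>;
  highlight: boolean;
  fetchTopN: number;
}

// Parses an integer, returning the fallback for missing or invalid input.
function intOr(raw: string | undefined, fallback: number): number {
  const n = raw === undefined || raw === "" ? NaN : Number(raw);
  return Number.isInteger(n) ? n : fallback;
}

function readE2eSearchEnv(env: Env): E2eSearch {
  const list = (raw: string | undefined, sep: RegExp): string[] =>
    (raw ?? "").split(sep).map((s) => s.trim()).filter(Boolean);
  const offset = intOr(env.ALEPH_E2E_SEARCH_OFFSET, -1);
  return {
    q: env.ALEPH_E2E_SEARCH_Q || "test",
    limit: intOr(env.ALEPH_E2E_SEARCH_LIMIT, 5),
    offset: offset >= 0 ? offset : undefined,
    collectionId: env.ALEPH_E2E_SEARCH_COLLECTION_ID || undefined,
    // Comma- or space-separated; OR-combined later when q is built.
    schemata: list(env.ALEPH_E2E_SEARCH_SCHEMATA, /[\s,]+/),
    facets: list(env.ALEPH_E2E_SEARCH_FACETS, /,/),
    extraFilters: env.ALEPH_E2E_SEARCH_EXTRA_FILTERS
      ? (JSON.parse(env.ALEPH_E2E_SEARCH_EXTRA_FILTERS) as Record<string, string>)
      : {},
    highlight: env.ALEPH_E2E_SEARCH_HIGHLIGHT !== "false", // default true
    // Default 2, clamped to the documented maximum of 50.
    fetchTopN: Math.min(50, Math.max(0, intOr(env.ALEPH_E2E_FETCH_TOP_N, 2))),
  };
}
```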
Targeted entity — ALEPH_E2E_ENTITY_*
For reproducing a problem against one specific document:
- ALEPH_E2E_ENTITY_ID: required to enable npm run test:e2e:entity; ignored (overridden) if you pass an id to npm run e2e:entity --.
- ALEPH_E2E_ENTITY_FETCH_MARKDOWN (default true): also runs aleph_get_entity_markdown after aleph_get_entity.
- ALEPH_E2E_ENTITY_RESPONSE_MODE, ALEPH_E2E_ENTITY_INCLUDE_RAW, ALEPH_E2E_ENTITY_INCLUDE_CONTENT_FIELDS, ALEPH_E2E_ENTITY_CONTENT_PREVIEW_CHARS, ALEPH_E2E_ENTITY_BODY_MARKDOWN_MAX_CHARS, ALEPH_E2E_ENTITY_MAX_ARRAY_VALUES_PER_FIELD: same semantics as their SEARCH_ counterparts, applied to the entity tool.
Example:
# Reproduce an issue against one document, including its full body text:
npm run e2e:entity -- 0000000000000000000000000000000000000000.1111111111111111111111111111111111111111
# Or pin it in .env and run as a test:
ALEPH_E2E_ENTITY_ID=<id> npm run test:e2e:entity
Privacy: Log files under logs/ can contain API responses with sensitive content. They are listed in .cursorignore; do not commit them or paste them into shared chats.
Prompts for AI assistants
Ready-to-use system prompts (short, detailed, investigative, workflows, advanced search crib) live in prompts/. Start at prompts/README.md.
License
MIT