Alexandria
A Model Context Protocol (MCP) server for querying, reading, and ingesting texts from 61 public digital libraries. Works with any MCP-compatible client (Claude Desktop, Cursor, VS Code Copilot, etc.).
Tools
| Tool | Description |
|---|---|
| `library_list_sources` | List all 61 sources with descriptions and full-text capabilities |
| `library_ask(query, max_sources?, results_per_source?)` | Natural language search: routes your query to the best sources, searches them in parallel, and returns unified, deduplicated results |
| `library_search(query, source, limit?)` | Search a specific source by title, author, or keywords |
| `library_read(id, source)` | Fetch full text or metadata for an item (200k character limit) |
| `library_index(id, source)` | Dry run: chunk and score text quality without writing anything |
| `library_ingest(id, source)` | Chunk → embed → store in your vector database. Idempotent. |
| `library_recommend(id, limit?)` | Get similar papers via Semantic Scholar's recommendation engine (up to 500) |
`library_ask` is the primary entry point. `library_search` is for targeted queries against a known source. `library_index` and `library_ingest` are for building a vector knowledge base from retrieved texts.
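MCP clients normally issue these calls for you, but when debugging it helps to see the wire format: each invocation is a JSON-RPC 2.0 `tools/call` request naming the tool and its arguments. The sketch below builds such requests. `buildToolCall` and `ToolArgs` are hypothetical helpers (not part of this repo), and the argument types are inferred from the signatures in the table, so treat them as assumptions.

```typescript
// Argument shapes inferred from the tool signatures in the table above;
// `?` marks optional parameters. The string/number types are assumptions.
type ToolArgs = {
  library_list_sources: Record<string, never>;
  library_ask: { query: string; max_sources?: number; results_per_source?: number };
  library_search: { query: string; source: string; limit?: number };
  library_read: { id: string; source: string };
  library_index: { id: string; source: string };
  library_ingest: { id: string; source: string };
  library_recommend: { id: string; limit?: number };
};

interface ToolCallRequest<T extends keyof ToolArgs> {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: T; arguments: ToolArgs[T] };
}

// Hypothetical helper: wraps a tool name and its arguments in an MCP `tools/call` message.
function buildToolCall<T extends keyof ToolArgs>(
  id: number,
  name: T,
  args: ToolArgs[T],
): ToolCallRequest<T> {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Usage: ask across sources, then read one hit in full.
const ask = buildToolCall(1, "library_ask", { query: "Stoic views on grief", max_sources: 3 });
// Placeholder ID: real IDs come back in library_ask / library_search results
const read = buildToolCall(2, "library_read", { id: "ITEM_ID", source: "gutenberg" });
console.log(JSON.stringify(ask));
```

The `id` passed to `library_read` is a placeholder; real item IDs come from search results, in whatever format each source uses.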
Sources (61)
Public Domain Literature (29)
| Source | Coverage | Full Text |
|---|---|---|
| `gutenberg` | 76k+ public domain books | Yes |
| `openlibrary` | 30M+ records | Metadata only |
| `archive` | 41M+ texts, newspapers, scanned books | Yes |
| `sacredtexts` | Curated registry: Quran, Sufi corpus, Vedanta, Buddhism, Taoism, Hermeticism, Christian mysticism | Yes (scraped) |
| `wikisource` | Free-content library: historical documents, literary works | Yes |
| `standardebooks` | Carefully formatted public domain ebooks | Yes |
| `perseus` | Classical Greek and Latin texts with translations | Yes |
| `ctext` | Chinese Text Project: pre-modern Chinese literature | Yes |
| `gallica` | Bibliothèque nationale de France: French heritage texts | Yes |
| `loc` | Library of Congress: US historical collections | Metadata only |
| `hathitrust` | 17M+ volumes from research libraries | Metadata only |
| `dpla` | Digital Public Library of America: US cultural heritage | Metadata only |
| `ndl` | National Diet Library, Japan | Metadata only |
| `europeana` | European cultural heritage: 50M+ objects | Metadata only |
| `trove` | National Library of Australia: newspapers, books, images | Yes |
| `bhl` | Biodiversity Heritage Library: natural history literature | Yes |
| `digitalnz` | National Library of New Zealand | Metadata only |
| `internetclassics` | Internet Classics Archive: 441 classical works | Yes |
| `marxists` | Marxists Internet Archive: political theory, philosophy | Yes |
| `projectruneberg` | Nordic literature and history | Yes |
| `cervantes` | Biblioteca Virtual Miguel de Cervantes: Spanish literature | Yes |
| `doab` | Directory of Open Access Books: 70k+ peer-reviewed OA books | Metadata only |
| `oapen` | Open Access Publishing in European Networks: humanities & social sciences | Yes |
| `googlebooks` | Google Books: metadata and preview snippets | Metadata only |
| `chroniclingamerica` | Library of Congress: US historic newspapers, 1770–1963 | Yes |
| `ccel` | Christian Classics Ethereal Library | Yes |
| `feedbooks` | Public domain and self-published ebooks | Yes |
| `wdl` | World Digital Library: international manuscripts and maps | Metadata only |
| `datagov` | Data.gov: US government open data catalog | Metadata only |
Academic & Science (11)
| Source | Coverage | Full Text |
|---|---|---|
| `arxiv` | 2M+ preprints: physics, math, CS, biology, economics | Yes |
| `core` | 57M+ open access research papers across all disciplines | Yes |
| `europmc` | Europe PubMed Central: life sciences literature | Yes |
| `nasa` | NASA Technical Reports Server | Yes |
| `osti` | DOE Office of Scientific and Technical Information | Yes |
| `eric` | Education Resources Information Center | Yes |
| `nsf` | NSF Award Search: funded research abstracts | Yes |
| `courtlistener` | US federal and state court opinions (Free Law Project), 125 requests/day | Yes |
| `biorxiv` | bioRxiv preprints: biology | Yes |
| `zenodo` | CERN open repository: papers, datasets, software (2M+ records) | Yes |
| `semanticscholar` | Semantic Scholar: 200M+ papers with AI-powered metadata | Yes |
Government, Law & International (5)
| Source | Coverage | Full Text |
|---|---|---|
| `govinfo` | US Government Publishing Office: laws, regulations, congressional records | Yes |
| `nih` | NIH Office of Portfolio Analysis | Yes |
| `nbnorway` | National Library of Norway | Metadata only |
| `legislation` | legislation.gov.uk: UK Acts and Statutory Instruments | Yes |
| `osf` | Open Science Framework: preprints and research data | Yes |
Specialized Corpora (3)
| Source | Coverage | Full Text |
|---|---|---|
| `earlyprint` | Early English print, 1473–1700 | Yes |
| `openiti` | OpenITI: Arabic/Persian Islamic texts (GitHub-based) | Yes |
| `legislationscot` | Scottish legislation | Yes |
Research Aggregators (8)
| Source | Coverage | Full Text |
|---|---|---|
| `openalex` | OpenAlex: 240M+ scholarly works, open catalog | Metadata only |
| `plos` | PLOS journals: open access science | Yes |
| `crossref` | Crossref: 150M+ DOI metadata records | Metadata only |
| `nasaads` | NASA Astrophysics Data System | Yes |
| `smithsonian` | Smithsonian Institution: collections and research | Metadata only |
| `doaj` | Directory of Open Access Journals: 20k+ journals | Metadata only |
| `nara` | National Archives: US federal records | Metadata only |
| `springer` | Springer Nature: OA and metadata | Metadata only |
Institutional Repositories (4)
| Source | Coverage | Full Text |
|---|---|---|
| `harvardlib` | Harvard Library Digital Collections | Metadata only |
| `apollo` | Cambridge University repository | Yes |
| `ora` | Oxford Research Archive | Yes |
| `base` | Bielefeld Academic Search Engine: 300M+ documents (pending IP whitelist) | Metadata only |
Software Documentation (1)
| Source | Coverage | Full Text |
|---|---|---|
| `codewiki` | Google Code Wiki: open source project documentation | Yes |
Credentials
Most tools query external library APIs directly and need no credentials at all. The two optional dependencies are scoped to specific tools:
OpenAI — optional (platform.openai.com)
Required by two tools only:
- `library_ask` uses `gpt-4o-mini` to route your natural language query to the right sources and generate optimized per-source search terms. Without this key, use `library_search` to query sources directly.
- `library_ingest` uses `text-embedding-3-small` to embed chunked text before writing to the vector store.
`library_list_sources`, `library_search`, `library_read`, `library_index`, and `library_recommend` all work without an OpenAI key.
Supabase — optional (supabase.com)
Required by one tool only:
- `library_ingest` writes chunked, embedded text into a pgvector table for semantic search. Without Supabase, retrieved texts stay in context and are not persisted anywhere.
Everything else — searching, reading, browsing, getting recommendations — queries external sources in real time and needs no database.
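The credential rules above reduce to a small lookup. The helper below is hypothetical (not part of the repo, and the server's own startup checks may differ): it lists which tools can succeed for a given environment, based only on the OpenAI and Supabase requirements described in this section and in Setup.

```typescript
type Env = Record<string, string | undefined>;

// Tools that need neither OpenAI nor Supabase.
const ALWAYS_AVAILABLE = [
  "library_list_sources",
  "library_search",
  "library_read",
  "library_index",
  "library_recommend",
] as const;

// Hypothetical helper: returns the tools whose optional dependencies are configured.
function enabledTools(env: Env): string[] {
  const has = (key: string) => Boolean(env[key]?.trim());
  const tools: string[] = [...ALWAYS_AVAILABLE];
  // library_ask: gpt-4o-mini query routing needs only an OpenAI key
  if (has("OPENAI_API_KEY")) tools.push("library_ask");
  // library_ingest: OpenAI embeddings plus Supabase (pgvector) storage
  if (has("OPENAI_API_KEY") && has("SUPABASE_URL") && has("SUPABASE_SERVICE_ROLE_KEY")) {
    tools.push("library_ingest");
  }
  return tools;
}

console.log(enabledTools({ OPENAI_API_KEY: "sk-test" }));
```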
Source-specific keys
Some sources require their own API key, and a few accept an optional key that raises rate limits. These are free registrations. Sources not listed here work without any credentials.
| Env Var | Source(s) | Get It |
|---|---|---|
| `CORE_API_KEY` | `core` | core.ac.uk/services/api |
| `COURTLISTENER_API_KEY` | `courtlistener` | courtlistener.com/profile/tokens |
| `GOVINFO_API_KEY` | `govinfo`, `smithsonian` | api.data.gov/signup (one key covers both) |
| `GOOGLE_BOOKS_API_KEY` | `googlebooks` | Google Cloud Console → APIs & Services → Books API |
| `BHL_API_KEY` | `bhl` | biodiversitylibrary.org/getapikey |
| `DIGITALNZ_API_KEY` | `digitalnz` | digitalnz.org/developers |
| `DPLA_API_KEY` | `dpla` | pro.dp.la/developers/api-codex |
| `EUROPEANA_API_KEY` | `europeana` | apis.europeana.eu (test key immediately, personal key in ~1 week) |
| `GITHUB_TOKEN` | `openiti` | github.com/settings/tokens (public repo read scope; optional, but prevents rate limiting) |
| `NASA_ADS_API_KEY` | `nasaads` | ui.adsabs.harvard.edu/user/settings/token |
| `SPRINGER_OA_API_KEY` + `SPRINGER_META_API_KEY` | `springer` | dev.springernature.com (one registration, two keys) |
| `ZENODO_API_KEY` | `zenodo` | zenodo.org/account/settings/applications/tokens/new (optional; increases rate limits) |
| `SEMANTIC_SCHOLAR_API_KEY` | `semanticscholar` | semanticscholar.org/product/api (optional; increases rate limits) |
| `TROVE_API_KEY` | `trove` | trove.nla.gov.au/about/create-something/using-api (~1 week approval) |
| `BASE_API_KEY` | `base` | base-search.net/about/en/contact (requires IP whitelist) |
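Before pointing `library_search` at a keyed source, a pre-flight check can catch missing variables early. This sketch transcribes the table into a lookup; `missingKeys` is a hypothetical helper, not part of the repo. Two readings of the table are assumptions: keys marked optional are treated as non-blocking, and every other listed key (including both Springer keys) is treated as required.

```typescript
interface SourceKeys {
  required: string[]; // assumed blocking: the source cannot be queried without these
  optional: string[]; // per the table, these only raise rate limits
}

// Transcribed from the table above; sources absent here need no credentials.
const SOURCE_KEYS: Record<string, SourceKeys> = {
  core: { required: ["CORE_API_KEY"], optional: [] },
  courtlistener: { required: ["COURTLISTENER_API_KEY"], optional: [] },
  govinfo: { required: ["GOVINFO_API_KEY"], optional: [] },
  smithsonian: { required: ["GOVINFO_API_KEY"], optional: [] }, // same api.data.gov key
  googlebooks: { required: ["GOOGLE_BOOKS_API_KEY"], optional: [] },
  bhl: { required: ["BHL_API_KEY"], optional: [] },
  digitalnz: { required: ["DIGITALNZ_API_KEY"], optional: [] },
  dpla: { required: ["DPLA_API_KEY"], optional: [] },
  europeana: { required: ["EUROPEANA_API_KEY"], optional: [] },
  nasaads: { required: ["NASA_ADS_API_KEY"], optional: [] },
  springer: { required: ["SPRINGER_OA_API_KEY", "SPRINGER_META_API_KEY"], optional: [] },
  trove: { required: ["TROVE_API_KEY"], optional: [] },
  base: { required: ["BASE_API_KEY"], optional: [] }, // also needs an IP whitelist
  openiti: { required: [], optional: ["GITHUB_TOKEN"] },
  zenodo: { required: [], optional: ["ZENODO_API_KEY"] },
  semanticscholar: { required: [], optional: ["SEMANTIC_SCHOLAR_API_KEY"] },
};

// Hypothetical pre-flight check: required env vars that are unset for `source`.
function missingKeys(source: string, env: Record<string, string | undefined>): string[] {
  const keys = SOURCE_KEYS[source];
  if (!keys) return []; // not in the table, so no credentials needed
  return keys.required.filter((k) => !env[k]);
}

console.log(missingKeys("springer", { SPRINGER_OA_API_KEY: "abc" }));
```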
Setup
```bash
git clone https://github.com/suavecito585/alexandria-mcp
cd alexandria-mcp
npm install
npm run build
```

Copy `.env.example` to `.env`. Minimum configuration to run with no credentials (search and read only):

```
TRANSPORT=stdio
```

To enable `library_ask`:

```
TRANSPORT=stdio
OPENAI_API_KEY=sk-...
```

To enable `library_ingest`:

```
TRANSPORT=stdio
OPENAI_API_KEY=sk-...
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_SERVICE_ROLE_KEY=eyJ...
```
Supabase Schema
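Required only if using `library_ingest`. The tables use the pgvector extension, which must be enabled first (from the Supabase dashboard's extensions page, or with the first statement below):

```sql
-- pgvector supplies the vector type and the ivfflat index method
create extension if not exists vector;

create table if not exists knowledge_chunks (
  id bigserial primary key,
  content text not null,
  embedding vector(1536),  -- default output size of text-embedding-3-small
  mcp_name text,
  metadata jsonb,
  created_at timestamptz default now()
);

create table if not exists source_docs (
  id bigserial primary key,
  source_url text not null,
  mcp_name text not null,
  title text,
  source text,
  chunk_count int,
  indexed_at timestamptz,
  unique (source_url, mcp_name)
);

create index if not exists knowledge_chunks_embedding_idx
  on knowledge_chunks using ivfflat (embedding vector_cosine_ops)
  with (lists = 100);
```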
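The `ivfflat` index is built with `vector_cosine_ops`, so it serves nearest-neighbour queries ordered by pgvector's cosine distance operator, e.g. `select content from knowledge_chunks order by embedding <=> $1 limit 5`. The function below is an illustrative sketch (not code from this repo) that computes the same quantity, 1 minus cosine similarity, client-side; it can help sanity-check ranked results.

```typescript
// Cosine distance = 1 - (a·b) / (|a| |b|), the value pgvector's `<=>` ranks by.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Undefined for zero vectors; return NaN rather than dividing by zero.
  if (normA === 0 || normB === 0) return NaN;
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineDistance([1, 0], [2, 0])); // 0 (same direction)
console.log(cosineDistance([1, 0], [0, 3])); // 1 (orthogonal)
console.log(cosineDistance([1, 0], [-1, 0])); // 2 (opposite)
```

Distances range from 0 to 2. The dimension check matters because the column is declared as `vector(1536)`: vectors from a model with a different output size cannot be stored in it without changing the schema.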
Claude Desktop (stdio)
Minimum config (search and read only):
```json
{
  "mcpServers": {
    "library": {
      "command": "node",
      "args": ["/path/to/alexandria-mcp/dist/index.js"],
      "env": {
        "TRANSPORT": "stdio"
      }
    }
  }
}
```

With `library_ask` and `library_ingest` enabled:

```json
{
  "mcpServers": {
    "library": {
      "command": "node",
      "args": ["/path/to/alexandria-mcp/dist/index.js"],
      "env": {
        "TRANSPORT": "stdio",
        "OPENAI_API_KEY": "sk-...",
        "SUPABASE_URL": "https://your-project.supabase.co",
        "SUPABASE_SERVICE_ROLE_KEY": "eyJ..."
      }
    }
  }
}
```
Railway (HTTP)
Set env vars in the Railway dashboard and deploy:
```bash
railway up
```

Register in Claude Desktop:

```json
{
  "mcpServers": {
    "library": {
      "url": "https://your-service.up.railway.app/mcp"
    }
  }
}
```

Health check: `GET /health` returns `{ status: "ok", sources: 61 }`.
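After `railway up`, a deploy script can poll the health endpoint before registering the server. This sketch assumes the endpoint returns a JSON body with the two documented fields; `isHealthy` is a hypothetical helper that only validates the body, leaving the HTTP request to the caller.

```typescript
interface HealthResponse {
  status: string;
  sources: number;
}

// Hypothetical validator for the documented /health body.
function isHealthy(body: string, expectedSources = 61): boolean {
  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch {
    return false; // non-JSON body, e.g. a proxy error page during a redeploy
  }
  const h = parsed as Partial<HealthResponse> | null;
  return h !== null && typeof h === "object" && h.status === "ok" && h.sources === expectedSources;
}

// Usage (Node 18+ has a global fetch):
// const res = await fetch("https://your-service.up.railway.app/health");
// console.log(res.ok && isHealthy(await res.text()));
```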
Adding Custom Providers
The pipeline is provider-agnostic. To add a new embedding model or vector store:
- Implement `EmbeddingProvider` or `VectorStoreProvider` from `src/types.ts`
- Add your implementation to `src/pipeline/providers/`
- Register it in `src/pipeline/providers/index.ts`
- Set `EMBEDDING_PROVIDER` or `VECTOR_STORE_PROVIDER` in your environment
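```typescript
// Example: Ollama embedding provider
import type { EmbeddingProvider } from '../../types.js';
export class OllamaEmbeddingProvider implements EmbeddingProvider {
  readonly dimensions = 768; // output size of nomic-embed-text (assumed model)
  async embed(texts: string[]): Promise<number[][]> {
    // Assumes a local Ollama server on its default port; /api/embed accepts a batch of inputs
    const res = await fetch('http://localhost:11434/api/embed', {
      method: 'POST', headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model: 'nomic-embed-text', input: texts }),
    });
    if (!res.ok) throw new Error(`Ollama embedding request failed: ${res.status}`);
    const data = (await res.json()) as { embeddings: number[][] };
    return data.embeddings; // one vector per input text, in input order
  }
}
```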
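For exercising the `library_ingest` plumbing in tests without an OpenAI key, a deterministic fake provider can follow the same steps listed above. The sketch below is hypothetical and not shipped with the repo: it declares a local stand-in for `EmbeddingProvider` with only the two members visible in the Ollama example (the real interface in `src/types.ts` may have more), and its default of 1536 dimensions is chosen to match the `vector(1536)` column in the schema. Its vectors carry no semantic meaning.

```typescript
// Local stand-in for the EmbeddingProvider interface in src/types.ts (assumed shape).
interface EmbeddingProvider {
  readonly dimensions: number;
  embed(texts: string[]): Promise<number[][]>;
}

// 32-bit FNV-1a hash over a string's UTF-16 code units.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Hypothetical offline provider: hashes lowercase character trigrams into buckets,
// then L2-normalises. Strings shorter than three characters map to the zero vector.
export class HashingEmbeddingProvider implements EmbeddingProvider {
  readonly dimensions: number;

  constructor(dimensions = 1536) {
    this.dimensions = dimensions;
  }

  async embed(texts: string[]): Promise<number[][]> {
    return texts.map((t) => this.embedOne(t));
  }

  private embedOne(text: string): number[] {
    const v = new Array<number>(this.dimensions).fill(0);
    const s = text.toLowerCase();
    for (let i = 0; i + 3 <= s.length; i++) {
      v[fnv1a(s.slice(i, i + 3)) % this.dimensions] += 1;
    }
    const norm = Math.sqrt(v.reduce((acc, x) => acc + x * x, 0));
    return norm === 0 ? v : v.map((x) => x / norm);
  }
}
```

Because identical inputs always produce identical vectors, re-running an ingest in tests should yield byte-for-byte identical embeddings, which makes storage-side behaviour easier to assert on.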