ResearchTwin: Federated Agentic Web of Research Knowledge
ResearchTwin is an open-source, federated platform that transforms a researcher's publications, datasets, and code repositories into a conversational Digital Twin. Built on a Bimodal Glial-Neural Optimization (BGNO) architecture, it enables dual discovery, in which both humans and AI agents collaborate to accelerate scientific discovery.
Live at researchtwin.net | Join the Network
Project Vision
The exponential growth of scientific outputs has created a "discovery bottleneck." Traditional static PDFs and siloed repositories limit knowledge synthesis and reuse. ResearchTwin addresses this by:
- Integrating multi-modal research artifacts from Semantic Scholar, Google Scholar, GitHub, and Figshare
- Computing a real-time S-Index metric (Quality × Impact × Collaboration) across all output types
- Providing a conversational chatbot interface for interactive research exploration
- Exposing an Inter-Agentic Discovery API with Schema.org types for machine-to-machine research discovery
- Enabling a federated, Discord-like architecture supporting local nodes, hubs, and hosted edges
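The S-Index bullet above describes a multiplicative score. As an illustration only, a minimal Python sketch of that shape follows; the sub-score definitions and exact weighting live in the s-index specification repository, so `quality`, `impact`, and `collaboration` here are placeholder values in [0, 1], not the confirmed formula:

```python
def s_index(quality: float, impact: float, collaboration: float) -> float:
    """Hypothetical S-Index: product of three sub-scores in [0, 1].

    Illustration of the Quality x Impact x Collaboration shape only;
    the authoritative definition lives in the s-index repository.
    """
    for name, score in (("quality", quality), ("impact", impact),
                        ("collaboration", collaboration)):
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {score}")
    return quality * impact * collaboration

print(s_index(0.9, 0.8, 0.5))  # approximately 0.36
```

One consequence of a multiplicative form is that a zero in any dimension zeroes the whole index, so all three output types contribute to the score.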
Architecture Overview
BGNO (Bimodal Glial-Neural Optimization)
```
Data Sources           Glial Layer         Neural Layer         Interface
┌────────────────┐    ┌─────────────┐     ┌──────────────┐     ┌────────────┐
│Semantic Scholar│───▶│             │     │              │     │ Web Chat   │
│Google Scholar  │───▶│ SQLite      │────▶│ RAG with     │────▶│ Discord    │
│GitHub API      │───▶│ Cache +     │     │ Claude API   │     │ Agent API  │
│Figshare API    │───▶│ Rate Limit  │     │              │     │ Embed      │
└────────────────┘    └─────────────┘     └──────────────┘     └────────────┘
```
- Connector Layer: Pulls papers (S2+GS with deduplication), repos (GitHub), datasets (Figshare), and ORCID metadata
- Glial Layer: SQLite caching with 24h TTL, rate limiting, S2+GS title-similarity merge (0.85 threshold)
- Neural Layer: RAG with Claude — context assembly, prompt engineering, conversational synthesis
- Interface Layer: D3.js knowledge graph, chat widget, Discord bot, REST API
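The glial layer's S2+GS title-similarity merge (0.85 threshold) can be sketched with the standard-library `difflib`. This is an assumption-laden sketch: the real connectors in `backend/connectors/` may use a different similarity measure and richer record fields than the bare `title` key used here.

```python
from difflib import SequenceMatcher

MERGE_THRESHOLD = 0.85  # title-similarity threshold from the glial layer


def normalize(title: str) -> str:
    """Lowercase and collapse whitespace before comparing titles."""
    return " ".join(title.lower().split())


def merge_paper_lists(s2_papers: list, gs_papers: list) -> list:
    """Merge Semantic Scholar and Google Scholar records, dropping Google
    Scholar entries whose title fuzzily matches an existing record.

    Sketch only: the production merge may also compare DOIs and years.
    """
    merged = list(s2_papers)
    for gs in gs_papers:
        is_dup = any(
            SequenceMatcher(None, normalize(gs["title"]),
                            normalize(p["title"])).ratio() >= MERGE_THRESHOLD
            for p in merged
        )
        if not is_dup:
            merged.append(gs)
    return merged
```

Normalizing before comparison keeps capitalization and spacing differences between the two sources from defeating the threshold.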
Federated Network Tiers
| Tier | Name | Description | Status |
|---|---|---|---|
| Tier 1 | Local Nodes | Researchers run `python run_node.py` locally | Live |
| Tier 2 | Hubs | Lab aggregators federating multiple nodes | Planned |
| Tier 3 | Hosted Edges | Cloud-hosted at researchtwin.net | Live |
Inter-Agentic Discovery API
Machine-readable endpoints with Schema.org @type annotations:
| Endpoint | Schema.org Type | Purpose |
|---|---|---|
| `GET /api/researcher/{slug}/profile` | `Person` | Researcher profile with HATEOAS links |
| `GET /api/researcher/{slug}/papers` | `ItemList` of `ScholarlyArticle` | Papers with citations |
| `GET /api/researcher/{slug}/datasets` | `ItemList` of `Dataset` | Datasets with QIC scores |
| `GET /api/researcher/{slug}/repos` | `ItemList` of `SoftwareSourceCode` | Repos with QIC scores |
| `GET /api/discover?q=keyword&type=paper` | `SearchResultSet` | Cross-researcher search |
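A client, human-written or agent-generated, only needs to construct the URLs shown in the table above. A minimal standard-library sketch follows; the URL paths come from the table, while response schemas are documented in the API Reference and not assumed here:

```python
from urllib.parse import quote, urlencode

BASE = "https://researchtwin.net"


def discover_url(query: str, artifact_type: str = "paper") -> str:
    """Build the cross-researcher /api/discover URL from the table above."""
    return f"{BASE}/api/discover?{urlencode({'q': query, 'type': artifact_type})}"


def profile_url(slug: str) -> str:
    """Build the Schema.org Person profile URL for a researcher slug."""
    return f"{BASE}/api/researcher/{quote(slug)}/profile"


print(discover_url("fetal heart rate"))
# https://researchtwin.net/api/discover?q=fetal+heart+rate&type=paper
```

From the `Person` response, an agent can follow the HATEOAS links to reach the papers, datasets, and repos lists without hard-coding further paths.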
Getting Started
Hosted (Tier 3) — Zero Setup
- Visit researchtwin.net/join.html
- Register with your name, email, and research identifiers
- Your Digital Twin is live immediately
Local Node (Tier 1) — Full Control
```bash
git clone https://github.com/martinfrasch/researchtwin.git
cd researchtwin
pip install -r backend/requirements.txt
cp node_config.json.example node_config.json
# Edit node_config.json with your details
python run_node.py --config node_config.json
```
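Before launching, `node_config.json` must identify the researcher whose twin the node serves. The authoritative template is `node_config.json.example`; the field names below are illustrative assumptions, not the confirmed schema, and the ORCID shown is the standard example identifier:

```json
{
  "name": "Ada Lovelace",
  "slug": "ada-lovelace",
  "email": "ada@example.org",
  "orcid": "0000-0002-1825-0097",
  "github_username": "ada",
  "port": 8000
}
```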
Docker Deployment
```bash
cp .env.example .env   # Add your API keys
docker-compose up -d --build
```
Required API keys: `ANTHROPIC_API_KEY` (for Claude RAG)
Optional: `S2_API_KEY`, `GITHUB_TOKEN`, `DISCORD_BOT_TOKEN`, SMTP credentials
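A matching `.env` sketch, using only the variable names listed above (all values are placeholders):

```env
# Required
ANTHROPIC_API_KEY=sk-ant-...     # Claude RAG

# Optional
S2_API_KEY=...                   # higher Semantic Scholar rate limits
GITHUB_TOKEN=...                 # repo metadata without anonymous limits
DISCORD_BOT_TOKEN=...            # enables the Discord bot
```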
Repository Structure
```
researchtwin/
├── backend/
│   ├── main.py                # FastAPI endpoints (REST + Discovery API)
│   ├── researchers.py         # SQLite researcher CRUD + token management
│   ├── database.py            # SQLite schema, WAL mode, migrations
│   ├── models.py              # Pydantic models for all endpoints
│   ├── rag.py                 # RAG context assembly for Claude
│   ├── qic_index.py           # S-Index / QIC computation engine
│   ├── email_service.py       # SMTP service for profile update codes
│   ├── connectors/            # Data source connectors
│   │   ├── semantic_scholar.py
│   │   ├── scholarly_lib.py   # Google Scholar via scholarly
│   │   ├── github_connector.py
│   │   └── figshare.py
│   └── discord_bot/           # Discord bot with /research and /sindex
├── frontend/
│   ├── index.html             # Main dashboard with D3.js knowledge graph
│   ├── join.html              # Self-registration page
│   ├── update.html            # Email-verified profile updates
│   ├── privacy.html           # Privacy policy
│   └── widget-loader.js       # Embeddable chat widget
├── run_node.py                # Tier 1 local node launcher
├── node_config.json.example   # Local node configuration template
├── docker-compose.yml         # Docker orchestration
├── nginx/                     # Nginx reverse proxy + SSL
└── whitepaper.tex             # LaTeX manuscript
```
Ecosystem
This repository is part of the ResearchTwin Ecosystem project:
| Repository | Description |
|---|---|
| researchtwin | Federated platform (this repo) |
| s-index | S-Index formal specification and reference implementation |
Embeddable S-Index Widget
Show your S-Index on your lab website, Google Sites page, or personal homepage:
```html
<iframe
  src="https://researchtwin.net/embed.html?slug=YOUR-SLUG"
  width="440" height="180"
  style="border:none; border-radius:12px;"
  loading="lazy">
</iframe>
```
Replace YOUR-SLUG with your researcher slug (e.g. martin-frasch).
- Google Sites: Edit page > Insert > Embed > "By URL" tab > paste https://researchtwin.net/embed.html?slug=YOUR-SLUG
- WordPress: Add a Custom HTML block and paste the iframe code.
The widget displays the researcher's name, S-Index score, h-index, citation count, and paper count. Data updates automatically from live API sources.
See it in action | Full embed instructions
Documentation
| Document | Description |
|---|---|
| API Reference | Full REST API documentation with schemas and examples |
| Self-Hosting Guide | Tier 1 Local Node setup and configuration |
| Hub Federation Guide | Tier 2 Hub architecture and setup (planned) |
| Security Policy | Vulnerability reporting and security best practices |
Contributing
Contributions are welcome! See the project board for tracked issues. Areas where help is especially wanted:
- New connectors (ORCID enrichment, PubMed, OpenAlex)
- Affiliation-based geographic mapping
- MCP server for inter-agentic discovery
- UI/UX improvements
- Bug fixes and optimizations
License
MIT License. See LICENSE.
Contact
- Platform: researchtwin.net
- Email: martin@researchtwin.net
- Issues: GitHub Issues
Empowering researchers and AI agents to discover, collaborate, and innovate together.