Never Ask Twice

Enterprise support memory agent that remembers customer context across sessions, retrieves relevant memories, and forgets stale facts to reduce re-ask rates.

Enterprise Support MemoryAgent on Qwen Cloud

Never Ask Twice is a production-shaped B2B support memory agent. It remembers customer context across sessions, retrieves only the memories relevant to the current request, and safely forgets stale facts. It demonstrates the improvement with a deterministic memory ON/OFF evaluation harness, and it also provides a live Qwen-backed API path.

The demo agent is Nat. Nat is powered by NATE — the Never Ask Twice Engine — a scoped memory layer that turns support conversations into durable, auditable customer context.

Built for the Qwen Cloud Global AI Hackathon — Track: MemoryAgent.

Status

| Area | Status | Notes |
| --- | --- | --- |
| Public clean-room repo | Done | Synthetic data only; boundary scan included. |
| Local Postgres + pgvector setup | Done | `docker compose up -d` binds Postgres on `localhost:5433`. |
| Deterministic eval harness | Done | `pnpm eval` prints memory ON/OFF re-ask, recall, and hallucination metrics. |
| Memory service | Done | Working, episodic, semantic, forgetting, and budgeted recall paths are implemented. |
| MCP stdio surface | Done | Four memory tools are exposed through `pnpm mcp:list-tools`. |
| Qwen-backed live path | In progress | Requires `DASHSCOPE_API_KEY`; local-safe mode runs without it. |
| Alibaba Function Compute deployment | In progress | `s.yaml` and deployment instructions exist; final live proof is still required. |
| Demo video | Pending | Should use the frozen Acme scenario and the eval output line. |

Judge path

Read the memory model: docs/memory-model.md.
Run the ablation:
```
pnpm eval
```
Inspect the forgetting behavior: docs/forgetting-policy.md.
Read the system architecture: docs/architecture.md.
List MCP tools:
```
pnpm build
pnpm mcp:list-tools
```
Review the deployment instructions and proof placeholder: deploy/alibaba-fc.md.

The measurable result

Run:

pnpm eval

Expected deterministic fixture output:

memory-on re-ask rate: 0.00
memory-on recall accuracy: 1.00
memory-on hallucination count: 0
memory-off re-ask rate: 1.00
memory-off recall accuracy: 0.00
memory-off hallucination count: 0
re-ask rate: 0.00 (memory) vs 1.00 (no-memory)

The evaluation path is intentionally deterministic for reproducible scoring. It uses fixed synthetic fixtures and a fake Qwen client. The live API path uses Qwen Cloud when DASHSCOPE_API_KEY is configured.
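
As a rough illustration of how figures like these can be derived, the sketch below treats both metrics as plain fractions over a list of fact checks. The `FactCheck` shape, the metric definitions, and the fact names are assumptions made for this example; the actual runner in `eval` may compute them differently.

```typescript
// Hypothetical per-fact record; the real eval runner's types may differ.
interface FactCheck {
  fact: string;               // a fact the customer stated in an earlier session
  reAsked: boolean;           // the agent asked the customer for it again
  recalledCorrectly: boolean; // the agent used the correct stored value
}

interface EvalSummary {
  reAskRate: number;
  recallAccuracy: number;
}

// Assumed definitions: both metrics are simple fractions over the checks.
function summarize(checks: FactCheck[]): EvalSummary {
  if (checks.length === 0) {
    return { reAskRate: 0, recallAccuracy: 0 };
  }
  const reAsked = checks.filter((c) => c.reAsked).length;
  const recalled = checks.filter((c) => c.recalledCorrectly).length;
  return {
    reAskRate: reAsked / checks.length,
    recallAccuracy: recalled / checks.length,
  };
}

// Formats one output line in the same style as the fixture output.
function formatLine(label: string, s: EvalSummary): string {
  return `${label} re-ask rate: ${s.reAskRate.toFixed(2)}`;
}

// Illustrative checks only; fact names are made up for this sketch.
const memoryOn = summarize([
  { fact: "plan_tier", reAsked: false, recalledCorrectly: true },
  { fact: "billing_contact", reAsked: false, recalledCorrectly: true },
]);
const memoryOff = summarize([
  { fact: "plan_tier", reAsked: true, recalledCorrectly: false },
  { fact: "billing_contact", reAsked: true, recalledCorrectly: false },
]);

console.log(formatLine("memory-on", memoryOn));
console.log(formatLine("memory-off", memoryOff));
```

Because the inputs are fixed, the same checks always produce the same ratios, which is the property the deterministic harness relies on.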

What makes it a MemoryAgent

Never Ask Twice implements explicit memory tiers:

Working memory — current-session context usable before session-close distillation.
Episodic memory — raw support events with Qwen embeddings and provenance.
Semantic memory — distilled customer facts with confidence, validity windows, and source links.
Forgetting policy — TTL expiry, supersession, stale-memory exclusion, and audit-safe provenance.
Budgeted recall — relevant memories only, capped to a strict context budget.
MCP surface — memory tools exposed for agent interoperability.

This is not transcript logging. It is structured memory with retrieval discipline, provenance, forgetting, and measurable cross-session improvement.
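
To make the forgetting policy concrete, here is a minimal sketch of TTL expiry and supersession over semantic facts. The `SemanticFact` fields and both helper functions are hypothetical names chosen for illustration; the repository's schema in `src/db` and its policy in `src/memory` may differ.

```typescript
// Hypothetical shape for a distilled fact; field names are illustrative only.
interface SemanticFact {
  id: string;
  predicate: string;           // e.g. "plan_tier"
  value: string;
  validFrom: number;           // epoch milliseconds
  expiresAt: number | null;    // TTL expiry; null means no expiry
  supersededBy: string | null; // id of the newer fact that replaced this one
}

// Keep only facts that are valid right now: already in effect, not expired,
// and not superseded. Stale rows are filtered out of recall, not deleted,
// so their provenance remains available for audit.
function activeFacts(facts: SemanticFact[], now: number): SemanticFact[] {
  return facts.filter(
    (f) =>
      f.supersededBy === null &&
      f.validFrom <= now &&
      (f.expiresAt === null || f.expiresAt > now),
  );
}

// Add a new fact and mark older, still-active facts with the same predicate
// as superseded by it.
function supersede(facts: SemanticFact[], next: SemanticFact): SemanticFact[] {
  const updated = facts.map((f) =>
    f.predicate === next.predicate && f.supersededBy === null
      ? { ...f, supersededBy: next.id }
      : f,
  );
  return [...updated, next];
}

const day = 24 * 60 * 60 * 1000;
let store: SemanticFact[] = [
  { id: "f1", predicate: "plan_tier", value: "starter", validFrom: 0, expiresAt: null, supersededBy: null },
  { id: "f2", predicate: "open_incident", value: "login outage", validFrom: 0, expiresAt: 5 * day, supersededBy: null },
];
store = supersede(store, {
  id: "f3", predicate: "plan_tier", value: "enterprise", validFrom: 8 * day, expiresAt: null, supersededBy: null,
});

const active = activeFacts(store, 10 * day);
console.log(active.map((f) => `${f.predicate}=${f.value}`)); // [ 'plan_tier=enterprise' ]
```

In this example the old plan tier is superseded and the incident has passed its TTL, so only the newest plan tier is eligible for recall, while all three rows stay in the store.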

Architecture

Customer chat / MCP
        |
        v
Hono API on Alibaba Function Compute
        |
        v
MemoryService
  |-- working memory: current-session facts
  |-- episodic memory: turn events + Qwen embeddings
  |-- semantic memory: distilled durable facts
  |-- forgetting: TTL + supersession + scoped recall
        |
        +--> Qwen Cloud via DashScope-compatible OpenAI API
        +--> Postgres + pgvector
        +--> MCP stdio tools

More detail: docs/architecture.md.

Getting started

1. Clone and install

git clone https://github.com/marcelle-labs/never-ask-twice.git
cd never-ask-twice
pnpm install

2. Configure environment

cp .env.example .env

Edit .env and set DASHSCOPE_API_KEY to a key issued by DashScope; this enables live Qwen-backed embeddings, distillation, and adjudication. The example file is pre-filled for local Postgres on port 5433.

DATABASE_URL=postgresql://neverasktwice:neverasktwice@localhost:5433/neverasktwice
DASHSCOPE_API_KEY=your-key-here
QWEN_BASE_URL=https://dashscope-intl.aliyuncs.com/compatible-mode/v1
QWEN_CHAT_MODEL=qwen-plus
QWEN_EMBEDDING_MODEL=text-embedding-v3
QWEN_EMBEDDING_DIM=1024
MEMORY_TOKEN_BUDGET=1200

Without DASHSCOPE_API_KEY, the API boots in local-safe mode. Local-safe mode uses zero-vector embeddings and empty distillation responses so the server can run without secrets; it does not perform real Qwen work. Use pnpm eval for deterministic local scoring without a key.
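
The fallback described above could look roughly like the following sketch. The `Embedder` interface, `selectEmbedder`, and the stub live factory are assumptions for illustration and are not taken from `src/qwen`; the live client is injected so the sketch never touches the network.

```typescript
// Hypothetical embedding-client interface; the repo's Qwen module may differ.
interface Embedder {
  mode: "live" | "local-safe";
  embed(text: string): Promise<number[]>;
}

type Env = Record<string, string | undefined>;

// A zero vector of the configured dimension. It carries no semantic signal,
// which is why local-safe mode is not suitable for real retrieval.
function zeroVector(dim: number): number[] {
  return new Array<number>(dim).fill(0);
}

function localSafeEmbedder(dim: number): Embedder {
  return { mode: "local-safe", embed: async () => zeroVector(dim) };
}

// Choose an embedder from the environment. `makeLive` is supplied by the
// caller, so this function itself has no dependency on a real Qwen client.
function selectEmbedder(
  env: Env,
  makeLive: (apiKey: string, dim: number) => Embedder,
): Embedder {
  const dim = Number(env["QWEN_EMBEDDING_DIM"] ?? "1024");
  if (!Number.isInteger(dim) || dim <= 0) {
    throw new Error("QWEN_EMBEDDING_DIM must be a positive integer");
  }
  const key = env["DASHSCOPE_API_KEY"]?.trim();
  return key ? makeLive(key, dim) : localSafeEmbedder(dim);
}

// Stub standing in for a real live client, for demonstration only.
const stubLive = (_apiKey: string, dim: number): Embedder => ({
  mode: "live",
  embed: async () => zeroVector(dim),
});

console.log(selectEmbedder({}, stubLive).mode); // local-safe
console.log(selectEmbedder({ DASHSCOPE_API_KEY: "example" }, stubLive).mode); // live
```

Exposing the selected mode on the client is one way a health endpoint could report whether the server is doing real Qwen work.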

3. Start Postgres

docker compose up -d

The local database binds to localhost:5433 so it does not collide with other Postgres services on 5432.

4. Run migrations

pnpm migrate

5. Run the eval harness

pnpm eval

6. Run the boundary scan

pnpm boundary-scan

7. Start the local API

pnpm dev

The API will be available at http://localhost:3000 with endpoints:

GET /health — health and capability status.
POST /turn — append a customer/agent turn.
POST /sessions/:id/close — close a session and distill episodic memory into semantic memory.
POST /recall — recall a bounded memory bundle.
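
One simple way to build a bounded memory bundle is greedy selection by relevance under a token budget, sketched below. The `Candidate` shape, the scoring, and the greedy policy are assumptions for illustration; the service's actual ranking and token accounting may work differently. The budget of 1200 mirrors `MEMORY_TOKEN_BUDGET` in the example environment.

```typescript
// Hypothetical candidate shape: a memory with a relevance score and a token estimate.
interface Candidate {
  id: string;
  text: string;
  score: number;  // higher means more relevant (for example, vector similarity)
  tokens: number; // estimated token cost of including this memory
}

// Greedy selection: consider memories from most to least relevant and skip
// any that would push the bundle over the budget.
function recallWithinBudget(candidates: Candidate[], budget: number): Candidate[] {
  const ranked = [...candidates].sort((a, b) => b.score - a.score);
  const bundle: Candidate[] = [];
  let used = 0;
  for (const c of ranked) {
    if (used + c.tokens > budget) continue;
    bundle.push(c);
    used += c.tokens;
  }
  return bundle;
}

// Illustrative candidates only.
const bundle = recallWithinBudget(
  [
    { id: "b", text: "Prefers email over phone", score: 0.8, tokens: 600 },
    { id: "a", text: "Plan tier is enterprise", score: 0.9, tokens: 700 },
    { id: "c", text: "SSO is enabled", score: 0.5, tokens: 400 },
  ],
  1200,
);

console.log(bundle.map((c) => c.id)); // [ 'a', 'c' ]
```

Here the second-ranked memory is skipped because it would exceed the budget, and the smaller third-ranked one fills the remaining space, for 1100 tokens in total.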

8. Run the MCP server

pnpm build
node dist/src/mcp/server.js

The MCP server exposes four tools: recall_memory, write_memory, distill_session, and forget.
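
For orientation, MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for `recall_memory`. The argument names (`customerId`, `query`) are hypothetical, since this README does not document the tools' input schemas, and a real client must complete the MCP `initialize` handshake before calling tools.

```typescript
// The four tool names listed in this README.
type ToolName = "recall_memory" | "write_memory" | "distill_session" | "forget";

// Shape of an MCP tools/call request as a JSON-RPC 2.0 message.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: ToolName; arguments: Record<string, unknown> };
}

function buildToolCall(
  id: number,
  name: ToolName,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Argument names here are assumptions, not the server's documented schema.
const request = buildToolCall(1, "recall_memory", {
  customerId: "acme",
  query: "current plan tier",
});

// The stdio transport exchanges newline-delimited JSON messages.
const wire = JSON.stringify(request) + "\n";
console.log(wire);
```

In practice an MCP client library handles this framing; the sketch only shows what travels over stdio.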

Project structure

apps/api — Hono API, local server, and Function Compute handler.
src/agent — deterministic support-agent policy used by the eval harness.
src/contracts.ts — memory predicate enum, Zod contracts, and shared types.
src/db — Drizzle schema and SQL migration string.
src/memory — memory service, stores, retrieval, supersession, and forgetting behavior.
src/mcp — stdio MCP surface over the shared memory service.
src/qwen — single Qwen Cloud client module.
src/testing — deterministic fake Qwen client for the eval harness.
eval — frozen three-session scenario, ground truth, expected output, and runner.
scripts — boundary scan, migration, MCP list-tools, and demo script checks.
docs — judge-facing architecture, memory model, evaluation, and forgetting documentation.
deploy — Alibaba Function Compute deployment instructions and proof placeholder.

Key commands

| Command | Purpose |
| --- | --- |
| `pnpm install` | Install dependencies |
| `pnpm build` | Build the project |
| `pnpm lint` | Run TypeScript type check |
| `pnpm test` | Run the test suite |
| `pnpm eval` | Run the deterministic memory ON/OFF eval harness |
| `pnpm migrate` | Run database migrations |
| `pnpm boundary-scan` | Run the clean-room boundary scan |
| `pnpm mcp:list-tools` | List the MCP tools |
| `pnpm demo:script-check` | Verify demo fixtures are aligned |

Security and clean-room boundary

Never Ask Twice uses synthetic data only. Do not commit real customer data, secrets, .env files, or private platform identifiers. The repository includes a boundary scan that fails when it finds known forbidden tokens, and a local-safe mode so judges can run the server without secrets.

See SECURITY.md.

License

Apache-2.0
