Never Ask Twice
Enterprise support memory agent that remembers customer context across sessions, retrieves relevant memories, and forgets stale facts to reduce re-ask rates.
README
Never Ask Twice
Enterprise Support MemoryAgent on Qwen Cloud
Never Ask Twice is a production-shaped B2B support memory agent that remembers customer context across sessions, retrieves only the memories that matter, forgets stale facts safely, and proves improvement with a deterministic memory ON/OFF evaluation harness plus a live Qwen-backed API path.
The demo agent is Nat. Nat is powered by NATE — the Never Ask Twice Engine — a scoped memory layer that turns support conversations into durable, auditable customer context.
Built for the Qwen Cloud Global AI Hackathon — Track: MemoryAgent.
Status
| Area | Status | Notes |
|---|---|---|
| Public clean-room repo | Done | Synthetic data only; boundary scan included. |
| Local Postgres + pgvector setup | Done | docker compose up -d binds Postgres on localhost:5433. |
| Deterministic eval harness | Done | pnpm eval prints memory ON/OFF re-ask, recall, and hallucination metrics. |
| Memory service | Done | Working, episodic, semantic, forgetting, and budgeted recall paths are implemented. |
| MCP stdio surface | Done | Four memory tools are exposed through pnpm mcp:list-tools. |
| Qwen-backed live path | In progress | Requires DASHSCOPE_API_KEY; local-safe mode runs without it. |
| Alibaba Function Compute deployment | In progress | s.yaml and deployment instructions exist; final live proof is still required. |
| Demo video | Pending | Should use the frozen Acme scenario and the eval output line. |
Judge path
- Read the memory model:
docs/memory-model.md. - Run the ablation:
pnpm eval - Inspect the forgetting behavior:
docs/forgetting-policy.md. - Read the system architecture:
docs/architecture.md. - List MCP tools:
pnpm build pnpm mcp:list-tools - Review the deployment instructions and proof placeholder:
deploy/alibaba-fc.md.
The measurable result
Run:
pnpm eval
Expected deterministic fixture output:
memory-on re-ask rate: 0.00
memory-on recall accuracy: 1.00
memory-on hallucination count: 0
memory-off re-ask rate: 1.00
memory-off recall accuracy: 0.00
memory-off hallucination count: 0
re-ask rate: 0.00 (memory) vs 1.00 (no-memory)
The evaluation path is intentionally deterministic for reproducible scoring. It uses fixed synthetic fixtures and a fake Qwen client. The live API path uses Qwen Cloud when DASHSCOPE_API_KEY is configured.
What makes it a MemoryAgent
Never Ask Twice implements explicit memory tiers:
- Working memory — current-session context usable before session-close distillation.
- Episodic memory — raw support events with Qwen embeddings and provenance.
- Semantic memory — distilled customer facts with confidence, validity windows, and source links.
- Forgetting policy — TTL expiry, supersession, stale-memory exclusion, and audit-safe provenance.
- Budgeted recall — relevant memories only, capped to a strict context budget.
- MCP surface — memory tools exposed for agent interoperability.
This is not transcript logging. It is structured memory with retrieval discipline, provenance, forgetting, and measurable cross-session improvement.
Architecture
Customer chat / MCP
|
v
Hono API on Alibaba Function Compute
|
v
MemoryService
|-- working memory: current-session facts
|-- episodic memory: turn events + Qwen embeddings
|-- semantic memory: distilled durable facts
|-- forgetting: TTL + supersession + scoped recall
|
+--> Qwen Cloud via DashScope-compatible OpenAI API
+--> Postgres + pgvector
+--> MCP stdio tools
More detail: docs/architecture.md.
Getting started
1. Clone and install
git clone https://github.com/marcelle-labs/never-ask-twice.git
cd never-ask-twice
pnpm install
2. Configure environment
cp .env.example .env
Edit .env and set your DASHSCOPE_API_KEY from DashScope for live Qwen-backed embeddings, distillation, and adjudication. The example is pre-filled for local Postgres on port 5433.
DATABASE_URL=postgresql://neverasktwice:neverasktwice@localhost:5433/neverasktwice
DASHSCOPE_API_KEY=your-key-here
QWEN_BASE_URL=https://dashscope-intl.aliyuncs.com/compatible-mode/v1
QWEN_CHAT_MODEL=qwen-plus
QWEN_EMBEDDING_MODEL=text-embedding-v3
QWEN_EMBEDDING_DIM=1024
MEMORY_TOKEN_BUDGET=1200
Without DASHSCOPE_API_KEY, the API boots in local-safe mode. Local-safe mode uses zero-vector embeddings and empty distillation responses so the server can run without secrets; it does not perform real Qwen work. Use pnpm eval for deterministic local scoring without a key.
3. Start Postgres
docker compose up -d
The local database binds to localhost:5433 so it does not collide with other Postgres services on 5432.
4. Run migrations
pnpm migrate
5. Run the eval harness
pnpm eval
6. Run the boundary scan
pnpm boundary-scan
7. Start the local API
pnpm dev
The API will be available at http://localhost:3000 with endpoints:
GET /health— health and capability status.POST /turn— append a customer/agent turn.POST /sessions/:id/close— close a session and distill episodic memory into semantic memory.POST /recall— recall a bounded memory bundle.
8. Run the MCP server
pnpm build
node dist/src/mcp/server.js
The MCP server exposes four tools: recall_memory, write_memory, distill_session, and forget.
Project structure
apps/api— Hono API, local server, and Function Compute handler.src/agent— deterministic support-agent policy used by the eval harness.src/contracts.ts— memory predicate enum, Zod contracts, and shared types.src/db— Drizzle schema and SQL migration string.src/memory— memory service, stores, retrieval, supersession, and forgetting behavior.src/mcp— stdio MCP surface over the shared memory service.src/qwen— single Qwen Cloud client module.src/testing— deterministic fake Qwen client for the eval harness.eval— frozen three-session scenario, ground truth, expected output, and runner.scripts— boundary scan, migration, MCP list-tools, and demo script checks.docs— judge-facing architecture, memory model, evaluation, and forgetting documentation.deploy— Alibaba Function Compute deployment instructions and proof placeholder.
Key commands
| Command | Purpose |
|---|---|
pnpm install |
Install dependencies |
pnpm build |
Build the project |
pnpm lint |
Run TypeScript type check |
pnpm test |
Run the test suite |
pnpm eval |
Run the deterministic memory ON/OFF eval harness |
pnpm migrate |
Run database migrations |
pnpm boundary-scan |
Run the clean-room boundary scan |
pnpm mcp:list-tools |
List the MCP tools |
pnpm demo:script-check |
Verify demo fixtures are aligned |
Security and clean-room boundary
Never Ask Twice uses synthetic data only. Do not commit real customer data, secrets, .env files, or private platform identifiers. The repository includes a boundary scan to fail on known forbidden tokens and a local-safe mode so judges can run the server without secrets.
See SECURITY.md.
License
Apache-2.0
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.