Praxis

Praxis

Gives AI assistants a reliable memory for proven, verified skills, enabling them to reuse successful solutions and avoid repeating mistakes.

Category
Visit Server

README

Praxis

Give your AI assistant a memory for the things it has proven it can do.

AI assistants are forgetful. Every new conversation, your assistant starts from scratch - it doesn't remember the tricky thing it figured out yesterday, so it works it out all over again (and sometimes gets it wrong all over again).

Praxis fixes that, with a twist most "AI memory" tools miss: it only saves a skill after the AI has actually run it and shown it works. No guessing, no "I'm pretty sure this is right." If it doesn't pass the check, it doesn't get saved.

Think of the difference between:

  • A notebook of ideas (what normal AI memory does): "I think the way to do this is..." - might be wrong.
  • A box of tested recipes (what Praxis does): "Here's exactly how to do this - I've made it before, it works."

Memory stores what you saw. Praxis stores what you proved.

What it does, in plain terms

  • Remembers real skills, not guesses. When your AI solves a task, Praxis keeps it only if it passes a real, automatic check. Failed attempts never get saved as "knowledge."
  • Remembers mistakes too. It keeps a short list of things that didn't work, so your AI sees "you already tried this and it failed" before it wastes time repeating it.
  • Stays small and fast. It doesn't pile up forever. Near-duplicates get merged, and skills that never get used get cleaned out automatically - so it won't get slow or expensive as it grows.
  • Reuses and combines skills. Once a skill is proven, your AI can use it again instantly, or snap several proven skills together to do something bigger.
  • Works with the AI tools you already use - Claude Code, Cursor, or your own agent. Anything that speaks MCP (a common standard for plugging tools into AI assistants).

Why you'd want it

  • Your AI gets more dependable over time on your actual work, instead of resetting to zero every session.
  • It stops repeating the same mistakes.
  • It won't balloon into something slow or costly - it tidies up after itself.
  • It's honest: a saved skill is something that genuinely ran and passed, not something the AI merely felt confident about.

One honest caveat: Praxis makes your assistant more reliable, not smarter. The AI still does the thinking - Praxis is the part that double-checks the work and remembers only the wins.


For developers

Praxis is a small, dependency-light MCP server. Core and tests use Node 24 built-ins only (node:sqlite, node:test, native TypeScript type-stripping) - no build step. The only runtime dependency is the MCP SDK.

Install

Requires Node 24+.

npm i -g @northtek/praxis
praxis init

praxis init runs a self-test and prints the stanza to add to your .mcp.json:

{ "mcpServers": { "praxis": { "command": "praxis", "args": ["serve"] } } }

Restart your agent. Done.

How it works

solve a task -> distill a Skill {interface, implementation, acceptanceTest}
            -> VERIFY in a sandbox (pass => kept, fail/timeout/async => quarantined)
            -> dedup/merge on write
            -> recall top-k within a token budget (+ known failures)
            -> compose verified skills by reference
            -> score by utility, evict/consolidate to stay lean

The agent stays the brain; Praxis is the part that only keeps what's proven.

Tools (MCP)

remember_skill · recall_skills · run_skill · record_failure · reinforce · library_stats · pin_skill · consolidate_now

recall_skills returns verified skills and relevant negative skills ("known failure modes") so the agent sees the wall it hit last time before it retries.

What keeps it from bloating / getting expensive

The library is self-pruning, not append-only:

  1. Verify gate at entry - failed attempts never become skills.
  2. Dedup + merge on write - a near-duplicate reinforces the existing skill instead of adding one.
  3. Utility-weighted tiering with a bounded hot set; warm/cold skills are excluded from recall but stay callable by id (and can be promoted back by consolidation).
  4. Budgeted top-k retrieval - context cost is O(k) tokens, bounded by tokenBudget, independent of library size. (Retrieval compute is O(hot-set size), bounded by the hot-set cap - not O(1).)
  5. Consolidation pass - regression-safe dedup-merge + eviction.

Trust

The verify gate is fail-closed: a skill reaches verified only if its acceptance test executed and passed. The sandbox runs in a worker thread with a hard memory cap and a timeout kill, and defends against try/catch assert-swallowing, async vacuous-passes, weak self-referential tests, and tests that try to detect or tamper with the checker. It is isolated with a memory cap and timeout kill - not a hardened multi-tenant security boundary (run only your own agent's code in v1; hosted/untrusted-code use needs isolated-vm or a subprocess sandbox).

Benchmark

See bench/. The benchmark is synthetic (an author-designed task stream, HashingEmbedder); it is an existence proof that the system behaves as designed - capability reuse, sublinear growth on repeated work (the long tail grows linearly, and is shown), bounded per-task cost, and a measured repeat-error reduction with negatives on. Not a general-performance claim.

Honest scope (what Praxis does NOT claim)

  • Not "your agent becomes smarter" - it accumulates verified expertise on your workflows. The LLM proposes; Praxis verifies and keeps.
  • Not "the first verified skill library" - prior art (Voyager, SkillGen, PreAct) verifies too. The specific unclaimed combination: a domain-agnostic, sandboxed, fail-closed acceptance gate + first-class negative skills + budgeted O(k)-context retrieval, exposed via MCP.
  • Composed skills carry their own acceptance tests and cascade-quarantine when a sub-skill is invalidated; deep arbitrary-graph reliability is not guaranteed in v1.
  • Library growth is architected for sublinearity via dedup/merge/eviction; it is not a guarantee for all workloads.

Prior art

Voyager (2305.16291), Reflexion (2303.11366), SkillGen (2408.08435), PreAct (2606.17929), Generative Agents (2304.03442), SoK: Agentic Skills (2602.20867). Praxis builds on the verify-before-keep idea and adds first-class negative skills + budgeted retrieval + an MCP surface.

License

MIT - see LICENSE.

Recommended Servers

playwright-mcp

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official
Featured
TypeScript
Magic Component Platform (MCP)

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Official
Featured
Local
TypeScript
Audiense Insights MCP Server

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

Official
Featured
Local
TypeScript
VeyraX MCP

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official
Featured
Local
graphlit-mcp-server

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official
Featured
TypeScript
Kagi MCP Server

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official
Featured
Python
E2B

E2B

Using MCP to run code via e2b.

Official
Featured
Neon Database

Neon Database

MCP server for interacting with Neon Management API and databases

Official
Featured
Exa Search

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official
Featured
Qdrant Server

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official
Featured