Opik
Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards. - comet-ml/opik
comet-ml
README
<h1 align="center" style="border-bottom: none"> <div> <a href="https://www.comet.com/site/products/opik/?from=llm&utm_source=opik&utm_medium=github&utm_content=header_img&utm_campaign=opik"><picture> <source media="(prefers-color-scheme: dark)" srcset="/apps/opik-documentation/documentation/static/img/logo-dark-mode.svg"> <source media="(prefers-color-scheme: light)" srcset="/apps/opik-documentation/documentation/static/img/opik-logo.svg"> <img alt="Comet Opik logo" src="/apps/opik-documentation/documentation/static/img/opik-logo.svg" width="200" /> </picture></a> <br> Opik </div> Open source LLM evaluation framework<br> </h1>
<p align="center"> From RAG chatbots to code assistants to complex agentic pipelines and beyond, build LLM systems that run better, faster, and cheaper with tracing, evaluations, and dashboards. </p>
<div align="center">
<a target="_blank" href="https://colab.research.google.com/github/comet-ml/opik/blob/master/apps/opik-documentation/documentation/docs/cookbook/opik_quickstart.ipynb">
<!-- <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open Quickstart In Colab"/> --> </a>
</div>
<p align="center"> <a href="https://www.comet.com/site/products/opik/?from=llm&utm_source=opik&utm_medium=github&utm_content=website_button&utm_campaign=opik"><b>Website</b></a> • <a href="https://chat.comet.com"><b>Slack community</b></a> • <a href="https://x.com/Cometml"><b>Twitter</b></a> • <a href="https://www.comet.com/docs/opik/?from=llm&utm_source=opik&utm_medium=github&utm_content=docs_button&utm_campaign=opik"><b>Documentation</b></a> </p>
🚀 What is Opik?
Opik is an open-source platform for evaluating, testing and monitoring LLM applications. Built by Comet.
<br>
You can use Opik for:
-
Development:
-
Tracing: Track all LLM calls and traces during development and production (Quickstart, Integrations
-
Annotations: Annotate your LLM calls by logging feedback scores using the Python SDK or the UI.
-
Playground:: Try out different prompts and models in the prompt playground
-
-
Evaluation: Automate the evaluation process of your LLM application:
-
Datasets and Experiments: Store test cases and run experiments (Datasets, Evaluate your LLM Application)
-
LLM as a judge metrics: Use Opik's LLM as a judge metric for complex issues like hallucination detection, moderation and RAG evaluation (Answer Relevance, Context Precision
-
CI/CD integration: Run evaluations as part of your CI/CD pipeline using our PyTest integration
-
-
Production Monitoring:
-
Log all your production traces: Opik has been designed to support high volumes of traces, making it easy to monitor your production applications. Even small deployments can ingest more than 40 million traces per day!
-
Monitoring dashboards: Review your feedback scores, trace count and tokens over time in the Opik Dashboard.
-
Online evaluation metrics: Easily score all your production traces using LLM as a Judge metrics and identify any issues with your production LLM application thanks to Opik's online evaluation metrics
-
[!TIP]
If you are looking for features that Opik doesn't have today, please raise a new Feature request 🚀
<br>
🛠️ Installation
Opik is available as a fully open source local installation or using Comet.com as a hosted solution. The easiest way to get started with Opik is by creating a free Comet account at comet.com.
If you'd like to self-host Opik, you can do so by cloning the repository and starting the platform using Docker Compose:
# Clone the Opik repository
git clone https://github.com/comet-ml/opik.git
# Navigate to the opik/deployment/docker-compose directory
cd opik/deployment/docker-compose
# Optionally, you can force a pull of the latest images
docker compose pull
# Start the Opik platform
docker compose up --detach
# You can now visit http://localhost:5173 on your browser!
For more information about the different deployment options, please see our deployment guides:
Installation methods | Docs link |
---|---|
Local instance | |
Kubernetes |
🏁 Get Started
To get started, you will need to first install the Python SDK:
pip install opik
Once the SDK is installed, you can configure it by running the opik configure
command:
opik configure
This will allow you to configure Opik locally by setting the correct local server address or if you're using the Cloud platform by setting the API Key
[!TIP]
You can also call theopik.configure(use_local=True)
method from your Python code to configure the SDK to run on the local installation.
You are now ready to start logging traces using the Python SDK.
📝 Logging Traces
The easiest way to get started is to use one of our integrations. Opik supports:
Integration | Description | Documentation | Try in Colab |
---|---|---|---|
OpenAI | Log traces for all OpenAI LLM calls | Documentation | |
LiteLLM | Call any LLM model using the OpenAI format | Documentation | |
LangChain | Log traces for all LangChain LLM calls | Documentation | |
Haystack | Log traces for all Haystack calls | Documentation | |
Anthropic | Log traces for all Anthropic LLM calls | Documentation | |
Bedrock | Log traces for all Bedrock LLM calls | Documentation | |
CrewAI | Log traces for all CrewAI calls | Documentation | |
DeepSeek | Log traces for all DeepSeek LLM calls | Documentation | |
DSPy | Log traces for all DSPy runs | Documentation | |
Gemini | Log traces for all Gemini LLM calls | Documentation | |
Groq | Log traces for all Groq LLM calls | Documentation | |
Guardrails | Log traces for all Guardrails validations | Documentation | |
Instructor | Log traces for all LLM calls made with Instructor | Documentation | |
LangGraph | Log traces for all LangGraph executions | Documentation | |
LlamaIndex | Log traces for all LlamaIndex LLM calls | Documentation | |
Ollama | Log traces for all Ollama LLM calls | Documentation | |
Predibase | Fine-tune and serve open-source Large Language Models | Documentation | |
Ragas | Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines | Documentation | |
watsonx | Log traces for all watsonx LLM calls | Documentation |
[!TIP]
If the framework you are using is not listed above, feel free to open an issue or submit a PR with the integration.
If you are not using any of the frameworks above, you can also use the track
function decorator to log traces:
import opik
opik.configure(use_local=True) # Run locally
@opik.track
def my_llm_function(user_question: str) -> str:
# Your LLM code here
return "Hello"
[!TIP]
The track decorator can be used in conjunction with any of our integrations and can also be used to track nested function calls.
🧑⚖️ LLM as a Judge metrics
The Python Opik SDK includes a number of LLM as a judge metrics to help you evaluate your LLM application. Learn more about it in the metrics documentation.
To use them, simply import the relevant metric and use the score
function:
from opik.evaluation.metrics import Hallucination
metric = Hallucination()
score = metric.score(
input="What is the capital of France?",
output="Paris",
context=["France is a country in Europe."]
)
print(score)
Opik also includes a number of pre-built heuristic metrics as well as the ability to create your own. Learn more about it in the metrics documentation.
🔍 Evaluating your LLM Application
Opik allows you to evaluate your LLM application during development through Datasets and Experiments.
You can also run evaluations as part of your CI/CD pipeline using our PyTest integration.
⭐ Star Us on GitHub
If you find Opik useful, please consider giving us a star! Your support helps us grow our community and continue improving the product.
<img src="https://github.com/user-attachments/assets/ffc208bb-3dc0-40d8-9a20-8513b5e4a59d" alt="Opik GitHub Star History" width="600"/>
🤝 Contributing
There are many ways to contribute to Opik:
- Submit bug reports and feature requests
- Review the documentation and submit Pull Requests to improve it
- Speaking or writing about Opik and letting us know
- Upvoting popular feature requests to show your support
To learn more about how to contribute to Opik, please see our contributing guidelines.
Recommended Servers

VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
AIO-MCP Server
🚀 All-in-one MCP server with AI search, RAG, and multi-service integrations (GitLab/Jira/Confluence/YouTube) for AI-enhanced development workflows. Folk from
Hyperbrowser MCP Server
Welcome to Hyperbrowser, the Internet for AI. Hyperbrowser is the next-generation platform empowering AI agents and enabling effortless, scalable browser automation. Built specifically for AI developers, it eliminates the headaches of local infrastructure and performance bottlenecks, allowing you to
BigQuery
This is a server that lets your LLMs (like Claude) talk directly to your BigQuery data! Think of it as a friendly translator that sits between your AI assistant and your database, making sure they can chat securely and efficiently.
Perplexity Chat MCP Server
MCP Server for the Perplexity API.
Web Research Server
A Model Context Protocol server that enables Claude to perform web research by integrating Google search, extracting webpage content, and capturing screenshots.
MySQL Server
Allows AI assistants to list tables, read data, and execute SQL queries through a controlled interface, making database exploration and analysis safer and more structured.
Aindreyway Codex Keeper
Serves as a guardian of development knowledge, providing AI assistants with curated access to latest documentation and best practices.
Etherscan Tools
Facilitates interaction with Ethereum blockchain data via Etherscan's API, providing real-time access to balances, transactions, token transfers, contract ABIs, gas prices, and ENS name resolutions.
Perplexity Deep Research
A server that allows AI assistants to perform web searches using Perplexity's sonar-deep-research model with citation support.