# Roadmap

[Scorable](https://scorable.ai/) is the automated LLM Evaluation Engineer agent for co-managing your evaluation stack. We build with a philosophy of transparency and maintain multiple [open source](https://github.com/root-signals) projects. This roadmap is a living document describing what we are working on now and what comes next.

## Vision

Our vision is to ***create and auto-optimize the strongest automated knowledge process evaluation stack possible, with the least effort and information required from the user.***

* **Maximum Automated Information Extraction**
  * From user intent and/or provided example/instruction data, extract as much relevant information as possible.
* **Awareness of Information Quality**
  * Engage the user with the smallest number of maximally impactful questions.
* **Maximally Powerful Evaluation Stack Generation**
  * Build the most comprehensive and accurate evaluation capabilities possible within the confines of the available data.
* **Built for Agents**
  * Maximum compatibility with autonomous agents and workflows.
* **Maximum Integration Surface**
  * Seamless integration with all key AI frameworks.
* **EvalOps Principles for Long Term**
  * Follow Root [EvalOps](https://www.scorable.ai/post/evalops) Principles for evaluator lifecycle management.
* **[Principled Evaluator Infrastructure](https://docs.scorable.ai/overview/principles)**

{% hint style="info" %}
**All feedback is highly appreciated and often leads to immediate action.** Submit new [GitHub issues](https://github.com/root-signals/rs-sdk/issues) or vote on existing ones so we can act quickly on what matters to you.
{% endhint %}

## 🚧 In Progress

* Retiring legacy inline evaluator demonstrations and objective test sets in favor of [datasets and annotations](/concepts-and-examples/usage/datasets-and-annotations.md)
* Deeper coding-agent integrations building on the [Agent Skill](/skill.md) and `scorable skills-add`

## 🚀 Recently Shipped

* ✅ Datasets, annotations, and score configs as first-class resources across the UI, SDKs, and CLI ([link](/concepts-and-examples/usage/datasets-and-annotations.md))
* ✅ Calibration runs with agreement metrics and per-example disagreement analysis ([link](/concepts-and-examples/cookbooks/add-a-custom-evaluator/add-a-calibration-set.md))
* ✅ Ladder generation of synthetic calibration data spanning the full score range
* ✅ OTEL trace ingestion with filters that auto-evaluate matching production traces ([link](/concepts-and-examples/cookbooks/otel-evaluation-via-cli.md))
* ✅ Agent Skills for coding agents: SKILL.md onboarding and `scorable skills-add` ([link](/integrations/coding-agents.md))
* ✅ Prompt testing from the CLI ([link](/concepts-and-examples/usage/prompt-testing.md))
* ✅ Public judge sharing with an anonymous try-it page
* ✅ OpenAI-compatible judge refine endpoints that automatically improve responses failing your quality bar
* ✅ Issues: recurring failure patterns extracted from evaluation results ([link](/concepts-and-examples/usage/issues.md))
* ✅ Slack integration with daily insights and a conversational assistant ([link](/concepts-and-examples/usage/monitoring-and-insights.md))
* ✅ Organization-level content retention controls ([link](/concepts-and-examples/usage/execution-auditability-and-versioning.md))

## Earlier Highlights

* ✅ Agent Evaluation MCP: stdio & SSE versions ([link](https://github.com/root-signals/root-signals-mcp))
* ✅ TypeScript SDK and Command Line Interface
* ✅ Full OpenTelemetry (OTEL) exports ([link](https://docs.scorable.ai/integrations/opentelemetry))
* ✅ [Root Judge LLM](https://www.scorable.ai/root-judge-llm), a 70B judge model available for free download and free to run in Scorable
* ✅ GDPR awareness of models ([link](https://docs.scorable.ai/usage/usage/models#control-and-compliance))
* ✅ Automated Policy Adherence Judges from uploaded policy documents and intents
* ✅ Evaluator version history, determinism benchmarks, and reference standard deviations ([link](https://docs.scorable.ai/usage/usage/evaluators#determinism))