Making Sense of Evaluation Results
Transform raw evaluation scores into actionable insights
Now that you're collecting evaluation data, let's explore how to make the most of it and continuously improve your AI agents.
Understanding Your Evaluation Data with Insights
You're now receiving evaluation scores for your agent outputs, but what does a score of 0.3 or 0.7 actually mean? Is that good? Should you be concerned? Even when you're managing a relatively small number of evaluations, raw numeric scores can be overwhelming and hard to act on.
Introducing Insights
Insights transforms your evaluation data into actionable intelligence. Instead of sifting through tables of numbers trying to identify patterns, you get a Sentry-like issues feed that automatically analyzes your results and tells you exactly what needs attention.

How Insights Works
Insights continuously monitors your evaluation results and:
Interprets scores in plain language - No more guessing what 0.3 means. Insights tells you "Your agent is consistently failing to follow the refund policy in 23% of interactions."
Surfaces patterns and anomalies - Automatically detects when specific evaluators are underperforming, when scores are trending downward, or when certain tags or user segments are experiencing issues.
Prioritizes what matters - Not all low scores are equal. Insights helps you focus on the issues that have the biggest impact on your users.
Provides actionable recommendations - Get specific guidance on how to improve: "Consider adding more examples about shipping timelines to your agent's context" or "The tone evaluator shows issues primarily in refund scenarios—review your refund handling logic."
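To make the idea of turning raw scores into a plain-language summary concrete, here is a minimal sketch in Python. It is not how Insights is implemented. The record shape, the evaluator and tag names, and the 0.5 pass threshold are all assumptions chosen for illustration. It groups scores by evaluator and tag, computes a failure rate for each group, and puts the worst group first.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical evaluation records: (evaluator name, tag, score in [0, 1]).
# These names and values are made up for illustration.
RESULTS = [
    ("policy_adherence", "refund", 0.3),
    ("policy_adherence", "refund", 0.2),
    ("policy_adherence", "shipping", 0.9),
    ("policy_adherence", "shipping", 0.8),
    ("tone", "refund", 0.7),
    ("tone", "shipping", 0.9),
]

# Assumed cutoff: scores below this count as failures.
PASS_THRESHOLD = 0.5


def summarize(results, threshold=PASS_THRESHOLD):
    """Group scores by (evaluator, tag) and compute mean and failure rate."""
    groups = defaultdict(list)
    for evaluator, tag, score in results:
        groups[(evaluator, tag)].append(score)

    summary = []
    for (evaluator, tag), scores in groups.items():
        failures = sum(1 for s in scores if s < threshold)
        summary.append({
            "evaluator": evaluator,
            "tag": tag,
            "mean": mean(scores),
            "failure_rate": failures / len(scores),
            "count": len(scores),
        })

    # Highest failure rate first; ties are broken by lowest mean score.
    summary.sort(key=lambda row: (-row["failure_rate"], row["mean"]))
    return summary


def describe(row):
    """Render one summary row as a plain-language sentence."""
    return (
        f"{row['evaluator']} fails in {row['failure_rate']:.0%} of "
        f"'{row['tag']}' interactions "
        f"(mean score {row['mean']:.2f}, n={row['count']})"
    )


if __name__ == "__main__":
    for row in summarize(RESULTS):
        if row["failure_rate"] > 0:
            print(describe(row))
```

Even this simple grouping shows why a single score is hard to read in isolation: the same evaluator can look healthy on one tag and fail consistently on another.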
Think of Insights as Your Evaluation Expert
Just as Sentry helps you catch and fix errors in your code, Insights helps you catch and fix quality issues in your AI agents. It's like having an evaluation expert constantly monitoring production, flagging problems, and telling you exactly what to fix.
Accessing Insights
Head over to the Monitoring & Insights view to explore your evaluation data and start acting on recommendations.