Making Sense of Evaluation Results

Transform raw evaluation scores into actionable insights

Now that you're collecting evaluation data, let's explore how to make the most of it and continuously improve your AI agents.

Understanding Your Evaluation Data with Insights

You're now receiving evaluation scores for your agent outputs, but what does a score of 0.3 or 0.7 actually mean? Is that good? Should you be concerned? When you're managing even a relatively small number of evaluations, raw numeric scores can be overwhelming and hard to act on.

Introducing Insights

Insights transforms your evaluation data into actionable intelligence. Instead of sifting through tables of numbers trying to identify patterns, you get a Sentry-like issues feed that automatically analyzes your results and tells you exactly what needs attention.

How Insights Works

Insights continuously monitors your evaluation results and:

  • Interprets scores in plain language - No more guessing what 0.3 means. Insights tells you something like "Your agent is consistently failing to follow the refund policy in 23% of interactions."

  • Surfaces patterns and anomalies - Automatically detects when specific evaluators are underperforming, when scores are trending downward, or when certain tags or user segments are experiencing issues.

  • Prioritizes what matters - Not all low scores are equal. Insights helps you focus on the issues that have the biggest impact on your users.

  • Provides actionable recommendations - Get specific guidance on how to improve: "Consider adding more examples about shipping timelines to your agent's context" or "The tone evaluator shows issues primarily in refund scenarios—review your refund handling logic."
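
To make the first two points concrete, here is a minimal sketch of how raw scores could be rolled up into a plain-language summary. It is not how Insights is implemented. The record shape (`evaluator`, `score`), the 0.5 failure threshold, and the function names `summarize` and `describe` are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Assumed cutoff: scores below this value are counted as failures.
FAIL_THRESHOLD = 0.5


def summarize(results, threshold=FAIL_THRESHOLD):
    """Group scores by evaluator and compute a failure rate and a simple trend.

    `results` is assumed to be a chronologically ordered list of dicts
    with hypothetical keys "evaluator" and "score".
    """
    by_evaluator = defaultdict(list)
    for record in results:
        by_evaluator[record["evaluator"]].append(record["score"])

    summary = {}
    for name, scores in by_evaluator.items():
        failures = sum(1 for s in scores if s < threshold)
        half = len(scores) // 2
        # Trend: mean of the later half minus mean of the earlier half.
        trend = mean(scores[half:]) - mean(scores[:half]) if half > 0 else 0.0
        summary[name] = {
            "failure_rate": failures / len(scores),
            "trend": trend,
        }
    return summary


def describe(name, stats):
    """Turn one evaluator's stats into a short plain-language sentence."""
    pct = round(stats["failure_rate"] * 100)
    direction = "trending downward" if stats["trend"] < 0 else "stable or improving"
    return f"{name}: below threshold in {pct}% of runs, {direction}"


if __name__ == "__main__":
    sample = [
        {"evaluator": "refund_policy", "score": 0.9},
        {"evaluator": "refund_policy", "score": 0.8},
        {"evaluator": "refund_policy", "score": 0.3},
        {"evaluator": "refund_policy", "score": 0.2},
    ]
    for evaluator, stats in summarize(sample).items():
        print(describe(evaluator, stats))
```

A half-versus-half comparison is a deliberately crude trend signal. It is enough to show why a single 0.3 means little on its own, while a failure rate and a direction per evaluator give you something to act on.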

Think of Insights as Your Evaluation Expert

Just as Sentry helps you catch and fix errors in your code, Insights helps you catch and fix quality issues in your AI agents. It's like having an evaluation expert constantly monitoring production, flagging problems, and telling you exactly what to fix.

Accessing Insights

Head over to the Monitoring & Insights view to explore your evaluation data and start acting on recommendations.
