Judges
Judges are stacks of Evaluators with their own high-level intent.
You can see the overview of your Judges in the app:

You can inspect a Judge in detail as well:

Via OpenAI-compatible Endpoint
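The request examples for this section did not survive extraction. As a hedged sketch: an OpenAI-compatible endpoint accepts the standard chat-completions request shape, with the judge addressed via the `model` field. The base URL, API key, and judge identifier below are placeholders, not actual values from this product.

```python
import json

# Placeholder values -- substitute your real endpoint, key, and judge ID.
BASE_URL = "https://<your-host>/v1"  # hypothetical OpenAI-compatible base URL
API_KEY = "sk-..."                   # your API key
JUDGE_MODEL = "<judge-id>"           # the judge is addressed as the "model"

# Standard OpenAI chat-completions body: the judge receives the interaction
# to evaluate as ordinary chat messages.
payload = {
    "model": JUDGE_MODEL,
    "messages": [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "Paris."},
    ],
}

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

# Serialized request body, ready to POST to f"{BASE_URL}/chat/completions".
body = json.dumps(payload)
print(body)
```

The same payload can be sent with cURL by POSTing `body` to the `/chat/completions` path with the `Authorization` header shown above.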
Execution Metadata
As with evaluators, you can pass metadata to judge executions to improve traceability and evaluation context.

user_id: Identify which end-user triggered the evaluation.
session_id: Group evaluations by conversation session.
system_prompt: Provide the original system context to the judge.
tags: Free-form tags for more powerful filtering and more actionable insights.
Example (Python SDK):
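The SDK example did not survive extraction. As a sketch only: `run_judge` below is a hypothetical stand-in for the real SDK call, used to show the shape of the metadata fields described above; the actual client and method names may differ.

```python
def run_judge(request, response, metadata):
    # Hypothetical stand-in for the SDK call; it simply returns the
    # payload a real execution would carry.
    return {"request": request, "response": response, "metadata": metadata}

result = run_judge(
    request="What is the capital of France?",
    response="Paris.",
    metadata={
        "user_id": "user-123",        # end-user who triggered the evaluation
        "session_id": "session-456",  # groups evaluations by conversation
        "system_prompt": "You are a helpful geography assistant.",
        "tags": ["geography", "production"],  # free-form tags for filtering
    },
)
print(result["metadata"]["session_id"])
```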
Multi-Turn Conversations
Judges can also evaluate multi-turn conversations to assess agent behavior across an entire interaction. You can provide message history containing the full interaction, including tool calls.
Example (Python SDK):
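The SDK example did not survive extraction. As a sketch only, the message history below shows the OpenAI-style shape of a multi-turn conversation, including a tool call and its result; the real SDK's parameter names for submitting it may differ.

```python
# A full interaction to hand to the judge: user turn, assistant tool call,
# tool result, and final assistant answer.
messages = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "call_1",
                "type": "function",
                "function": {
                    "name": "get_weather",            # hypothetical tool
                    "arguments": '{"city": "Paris"}',
                },
            }
        ],
    },
    {"role": "tool", "tool_call_id": "call_1", "content": '{"temp_c": 18}'},
    {"role": "assistant", "content": "It is 18 degrees Celsius in Paris."},
]
print(len(messages))
```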