# Judges

A Judge is a stack of [Evaluators](https://docs.scorable.ai/usage/usage/evaluators) with its own high-level intent.

You can see an overview of your Judges in the app:

<figure><img src="https://1145415225-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FUoDNiw7ySSaFXXkaGCic%2Fuploads%2Fgit-blob-41c3ada3298fcb347d81336c044f88f1c7dfd115%2FScreenshot%202025-05-28%20at%2010.00.12.png?alt=media" alt=""><figcaption></figcaption></figure>

You can inspect a Judge in detail as well:

<figure><img src="https://1145415225-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FUoDNiw7ySSaFXXkaGCic%2Fuploads%2Fgit-blob-5cdf011313d956078273cca4a112b7db4e1107d2%2FScreenshot%202025-05-28%20at%209.58.51.png?alt=media" alt=""><figcaption></figcaption></figure>

You can execute a Judge in several ways: through an OpenAI-compatible endpoint, with cURL, or with the Python SDK.

**Via OpenAI-compatible Endpoint**

```python
# pip install openai
from openai import OpenAI


client = OpenAI(
    api_key="$MY_API_KEY",
    base_url="https://api.scorable.ai/v1/judges/$MY_JUDGE_ID/openai/"
)

response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[
        {"role": "user", "content": "I want to return my product"}
    ]
)

print(f"Assistant's response: {response.choices[0].message.content}")
print(f"Judge evaluation results: {response.model_extra.get('evaluator_results')}")
```

**cURL**

```bash
curl 'https://api.scorable.ai/v1/judges/$MY_JUDGE_ID/execute/' \
-H 'authorization: Api-Key $MY_API_KEY' \
-H 'content-type: application/json' \
--data-raw '{"response":"LLM said: You can return the item within 30 days of purchase, and we will refund the full amount...","request":"I want to return my product"}'
```

**Python**

```python
# pip install scorable
from scorable import Scorable

client = Scorable(api_key="$MY_API_KEY")
result = client.judges.run(
    judge_id="$MY_JUDGE_ID",
    response="LLM said: You can return the item within 30 days of purchase, and we will refund the full amount...",
    request="I want to return my product"
)
print(f"Run results: {result.evaluator_results}")
print(f"Score (a float between 0 and 1): {result.evaluator_results[0].score}")
print(f"Justification for the score: {result.evaluator_results[0].justification}")
```
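Since each evaluator result carries a `score` and a `justification`, you can gate downstream behavior on the Judge's verdict, for example flagging low-scoring evaluators for review. The snippet below is a minimal sketch: the `EvaluatorResult` dataclass and the `0.7` threshold are illustrative stand-ins for the SDK's actual result objects, which expose `score` and `justification` as shown above.

```python
from dataclasses import dataclass

@dataclass
class EvaluatorResult:
    # Stand-in for the result objects returned by client.judges.run()
    name: str
    score: float          # float between 0 and 1
    justification: str

def failing_results(results, threshold=0.7):
    """Return the evaluator results whose score falls below the threshold."""
    return [r for r in results if r.score < threshold]

# Example results, as they might come back from a Judge run
results = [
    EvaluatorResult("policy_adherence", 0.92, "Follows the return policy."),
    EvaluatorResult("tone", 0.55, "Response is curt with the customer."),
]

for r in failing_results(results):
    print(f"{r.name} failed ({r.score:.2f}): {r.justification}")
```

With real SDK results you would pass `result.evaluator_results` to the helper instead of the hand-built list.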

## Execution Metadata

Similar to evaluators, you can pass metadata to judge executions to improve traceability and evaluation context.

* **`user_id`**: Identify which end-user triggered the evaluation.
* **`session_id`**: Group evaluations by conversation session.
* **`system_prompt`**: Provide the original system context to the judge.
* **`tags`**: Free-form tags for more powerful filtering and more actionable insights.

**Example (Python SDK):**

```python
result = client.judges.run(
    judge_id="$MY_JUDGE_ID",
    response="...",
    request="...",
    user_id="customer_678",
    session_id="chat_999",
    system_prompt="Help customers with returns.",
    tags=["qa-testing"]
)
```

## Multi-Turn Conversations

Judges can also evaluate multi-turn conversations to assess agent behavior across an entire interaction. You can provide message history containing the full interaction, including tool calls.

**Example (Python SDK):**

```python
from scorable import Scorable
from scorable.multiturn import Turn

client = Scorable(api_key="$MY_API_KEY")

# Create a multi-turn conversation
turns = [
    Turn(role="user", content="Hello, I need help with my order"),
    Turn(role="assistant", content="I'd be happy to help! What's your order number?"),
    Turn(role="user", content="It's ORDER-12345"),
    Turn(
        # An assistant turn can be a tool call, which may not be directly visible to the user.
        role="assistant",
        content="{'order_number': 'ORDER-12345', 'status': 'shipped', 'eta': 'Jan 20'}",
        tool_name="order_lookup",
    ),
    Turn(
        role="assistant",
        content="I found your order. It's currently in transit.",
    ),
]

# Evaluate the multi-turn conversation with a judge
result = client.judges.run(
    judge_id="$MY_JUDGE_ID",
    turns=turns,
    user_id="customer_678",
    session_id="chat_999",
    system_prompt="Help customers with returns.",
    tags=["qa-testing"]
)
print(f"Judge evaluation results: {result.evaluator_results}")
```
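If your application already stores conversation history as OpenAI-style message dicts, converting it into `Turn` objects is mechanical. The converter below is a sketch under assumptions: `to_turn_kwargs` is a hypothetical helper (not part of the SDK), and it assumes `Turn` accepts the `role`, `content`, and optional `tool_name` keyword arguments shown in the example above.

```python
def to_turn_kwargs(messages):
    """Convert OpenAI-style message dicts into keyword arguments for Turn().

    Hypothetical helper: assumes Turn(role=..., content=..., tool_name=...)
    as in the multi-turn example above. Messages produced by a tool call
    carry the tool's name; plain messages do not.
    """
    turns = []
    for msg in messages:
        kwargs = {"role": msg["role"], "content": msg["content"]}
        if msg.get("tool_name"):
            kwargs["tool_name"] = msg["tool_name"]
        turns.append(kwargs)
    return turns

messages = [
    {"role": "user", "content": "Hello, I need help with my order"},
    {
        "role": "assistant",
        "content": "{'order_number': 'ORDER-12345', 'status': 'shipped'}",
        "tool_name": "order_lookup",
    },
]

# With the SDK imported, you would then build the turns list as:
# turns = [Turn(**kw) for kw in to_turn_kwargs(messages)]
print(to_turn_kwargs(messages))
```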
