# Judges

A Judge is a stack of [Evaluators](/concepts-and-examples/usage/evaluators.md) grouped under a single high-level intent.

## Generating a Judge

Scorable can generate a complete judge — including all its evaluators — from a plain-language description of what you want to measure.

**CLI**

```bash
scorable judge generate --intent "I am building a customer support chatbot. Evaluate that responses are helpful and follow our refund policy."
```

Attach a PDF policy document so the generated evaluators can check compliance against it:

```bash
# Upload and generate in one step
scorable judge generate \
  --intent "Evaluate responses against the attached policy." \
  --file ./policy.pdf

# Or reuse a previously uploaded file
scorable judge generate \
  --intent "Evaluate responses against the attached policy." \
  --file-id <file_uuid>
```

**Python SDK**

```python
from scorable import Scorable

client = Scorable(api_key="$MY_API_KEY")

# Upload a policy document first
file_id = client.files.upload("./policy.pdf")

# Generate a judge that uses it
result = client.judges.generate(
    intent="Evaluate responses against the attached policy.",
    file_id=str(file_id),
)
print(result.judge_id)
```

If the intent is ambiguous, the API returns `missing_context_from_system_goal`, a list of fields that would improve the judge. Re-run the generation with `--extra-contexts` (CLI) or `extra_contexts` (SDK) to fill them in.
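
The follow-up step can be sketched as below. This is a minimal illustration, not the SDK's own API: it assumes `missing_context_from_system_goal` arrives as a list of field-name strings and that `extra_contexts` accepts a name-to-text mapping. Both shapes are assumptions, the field names are made up, and `collect_extra_contexts` is a hypothetical helper.

```python
def collect_extra_contexts(missing_fields, answers):
    """Split the requested fields into answered and still-missing ones.

    missing_fields: field names reported by the API (assumed: list of str).
    answers: field name -> text you can supply (hypothetical local data).
    """
    filled = {}
    unanswered = []
    for name in missing_fields:
        value = answers.get(name, "").strip()
        if value:
            filled[name] = value
        else:
            unanswered.append(name)
    return filled, unanswered


# Hypothetical field names, for illustration only.
missing = ["refund_window", "target_audience"]
answers = {"refund_window": "Refunds are accepted within 30 days."}

extra_contexts, still_missing = collect_extra_contexts(missing, answers)
print(extra_contexts)
print(still_missing)
```

The `extra_contexts` mapping would then be passed to the next generation call, and anything left in `still_missing` shows which fields you have not answered yet.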

You can see the overview of your Judges in the app:

<figure><img src="/files/dmaCj4UHvuafGA1j0S5E" alt=""><figcaption></figcaption></figure>

## Executing a Judge

**Execute via OpenAI-compatible Endpoint**

```python
# pip install openai
from openai import OpenAI


client = OpenAI(
    api_key="$MY_API_KEY",
    base_url="https://api.scorable.ai/v1/judges/$MY_JUDGE_ID/openai/"
)

response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[
        {"role": "user", "content": "I want to return my product"}
    ]
)

print(f"Assistant's response: {response.choices[0].message.content}")
print(f"Judge evaluation results: {response.model_extra.get('evaluator_results')}")
```

> **Bring your own key.** The OpenAI-compatible endpoints (`/openai/chat/completions`, `/openai/responses`, `/refine/openai/chat/completions`, `/refine/openai/responses`) proxy the model call through Scorable, so they require a customer-managed provider key. Connect a key for the requested model's provider in **Organization Settings → Providers**; otherwise the request is rejected with `403 byok_required`. The non-proxy execution endpoints below are unaffected.
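
A client can tell this rejection apart from other failures before retrying. The sketch below uses only the status code and the `byok_required` marker quoted above. The JSON bodies are hypothetical, since the exact error payload is not shown on this page, and `is_byok_rejection` is an illustrative helper name.

```python
def is_byok_rejection(status_code: int, body: str) -> bool:
    """Return True when a response looks like the 403 byok_required rejection."""
    return status_code == 403 and "byok_required" in body


# Hypothetical response bodies; the real payload shape may differ.
print(is_byok_rejection(403, '{"error": "byok_required"}'))  # True
print(is_byok_rejection(403, '{"error": "forbidden"}'))      # False
print(is_byok_rejection(200, '{"choices": []}'))             # False
```

When the check matches, retrying will not help until a provider key is connected in **Organization Settings → Providers**.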

**cURL**

```bash
curl 'https://api.scorable.ai/v1/judges/$MY_JUDGE_ID/execute/' \
-H 'authorization: Api-Key $MY_API_KEY' \
-H 'content-type: application/json' \
--data-raw '{"response":"LLM said: You can return the item within 30 days of purchase, and we will refund the full amount...","request":"I want to return my product"}'
```
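
If you would rather not install the SDK, the same request can be assembled with the Python standard library. The sketch below mirrors the URL, headers, and body of the cURL command. `build_execute_request` is a hypothetical helper, and `YOUR_API_KEY` and `YOUR_JUDGE_ID` are placeholders. The request is built but not sent.

```python
import json
import urllib.request


def build_execute_request(api_key, judge_id, request_text, response_text):
    """Build (but do not send) the POST request shown in the cURL example."""
    url = f"https://api.scorable.ai/v1/judges/{judge_id}/execute/"
    body = json.dumps({"request": request_text, "response": response_text})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_execute_request(
    api_key="YOUR_API_KEY",
    judge_id="YOUR_JUDGE_ID",
    request_text="I want to return my product",
    response_text="You can return the item within 30 days of purchase.",
)
print(req.get_method(), req.full_url)

# Sending requires a valid key and network access:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```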

**Python**

```python
# pip install scorable
from scorable import Scorable

client = Scorable(api_key="$MY_API_KEY")
result = client.judges.run(
    judge_id="$MY_JUDGE_ID",
    response="LLM said: You can return the item within 30 days of purchase, and we will refund the full amount...",
    request="I want to return my product"
)
print(f"Run results: {result.evaluator_results}")
# Each result carries a score (a float between 0 and 1) and a justification for it
first = result.evaluator_results[0]
print(f"Score: {first.score}")
print(f"Justification: {first.justification}")
```
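
Because each evaluator result carries a score between 0 and 1, a common next step is to flag the low-scoring ones. The sketch below is illustrative only: `EvaluatorResult` is a stand-in dataclass with just the `score` and `justification` fields mentioned above, the sample data is made up, and the `0.5` threshold is an arbitrary assumption rather than a Scorable default.

```python
from dataclasses import dataclass


@dataclass
class EvaluatorResult:
    """Stand-in for one entry of result.evaluator_results (assumed shape)."""
    score: float
    justification: str


def failing_results(results, threshold=0.5):
    """Return the results whose score falls below the threshold."""
    return [r for r in results if r.score < threshold]


# Made-up sample data for illustration.
sample = [
    EvaluatorResult(score=0.9, justification="Clear and helpful."),
    EvaluatorResult(score=0.3, justification="Contradicts the refund policy."),
]
for r in failing_results(sample):
    print(f"{r.score:.2f}: {r.justification}")
```

With real SDK output you would pass `result.evaluator_results` in place of `sample`.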

## Execution Metadata

As with evaluators, you can pass metadata with a judge execution to improve traceability and give the evaluation more context.

* **`user_id`**: Identify which end-user triggered the evaluation.
* **`session_id`**: Group evaluations by conversation session.
* **`system_prompt`**: Provide the original system context to the judge.
* **`tags`**: Free-form tags for filtering and analyzing evaluations.

**Example (Python SDK):**

```python
result = client.judges.run(
    judge_id="$MY_JUDGE_ID",
    response="...",
    request="...",
    user_id="customer_678",
    session_id="chat_999",
    system_prompt="Help customers with returns.",
    tags=["qa-testing"]
)
```

## File Inputs

Judges support the same `file_ids` parameter as evaluators. Upload a file first via `POST /v1/files/`, then pass the returned ID(s) to the judge execution. PDFs are converted to text and supplied as context; images are passed as visual inputs to vision-capable models.

See [Evaluators — File Inputs](/concepts-and-examples/usage/evaluators.md#file-inputs) for the full upload flow and examples.
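
For the second step, an execution body with file IDs attached might be assembled as below. This is a sketch under assumptions: it assumes the REST body accepts a `file_ids` list alongside `request` and `response`, which this page does not confirm, and `build_execution_payload` is a hypothetical helper.

```python
import json


def build_execution_payload(request_text, response_text, file_ids=()):
    """Assemble a judge execution body, adding file_ids only when given."""
    payload = {"request": request_text, "response": response_text}
    if file_ids:
        # Assumption: the body uses the same `file_ids` name as the parameter.
        payload["file_ids"] = [str(fid) for fid in file_ids]
    return payload


# Placeholder ID; use the value returned by POST /v1/files/.
payload = build_execution_payload(
    "I want to return my product",
    "You can return the item within 30 days of purchase.",
    file_ids=["<file_uuid>"],
)
print(json.dumps(payload, indent=2))
```

Omitting the key when no files are given keeps the body identical to a plain execution request.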

## Multi-Turn Conversations

Judges can also evaluate multi-turn conversations to assess agent behavior across an entire interaction. You can provide message history containing the full interaction, including tool calls.

**Example (Python SDK):**

```python
from scorable import Scorable
from scorable.multiturn import Turn

client = Scorable(api_key="$MY_API_KEY")

# Create a multi-turn conversation
turns = [
    Turn(role="user", content="Hello, I need help with my order"),
    Turn(role="assistant", content="I'd be happy to help! What's your order number?"),
    Turn(role="user", content="It's ORDER-12345"),
    Turn(
        # An assistant turn can be a tool call, which may not be visible to the user.
        role="assistant",
        content="{'order_number': 'ORDER-12345', 'status': 'shipped', 'eta': 'Jan 20'}",
        tool_name="order_lookup",
    ),
    Turn(
        role="assistant",
        content="I found your order. It's currently in transit.",
    ),
]

# Evaluate the multi-turn conversation with a judge
result = client.judges.run(
    judge_id="$MY_JUDGE_ID",
    turns=turns,
    user_id="customer_678",
    session_id="chat_999",
    system_prompt="Help customers with returns.",
    tags=["qa-testing"]
)
print(f"Judge evaluation results: {result.evaluator_results}")
```
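
If your conversations are stored as plain dictionaries, they need to be reshaped before a judge can run on them. The sketch below assumes a hypothetical local log format (dicts with `role`, `content`, and an optional `tool_name`). It separates the system message, which is passed as `system_prompt`, from the remaining turns. `split_history` is an illustrative helper and not part of the SDK.

```python
def split_history(messages):
    """Separate the system prompt from the turns in a chat history.

    Returns (system_prompt, turn_kwargs), where each turn_kwargs entry
    holds the keyword arguments for one Turn.
    """
    system_prompt = None
    turn_kwargs = []
    for msg in messages:
        if msg["role"] == "system":
            system_prompt = msg["content"]
            continue
        kwargs = {"role": msg["role"], "content": msg["content"]}
        if msg.get("tool_name"):
            kwargs["tool_name"] = msg["tool_name"]
        turn_kwargs.append(kwargs)
    return system_prompt, turn_kwargs


history = [
    {"role": "system", "content": "Help customers with returns."},
    {"role": "user", "content": "It's ORDER-12345"},
    {"role": "assistant", "content": "{'status': 'shipped'}", "tool_name": "order_lookup"},
    {"role": "assistant", "content": "I found your order. It's currently in transit."},
]
system_prompt, turn_kwargs = split_history(history)
print(system_prompt)
print(len(turn_kwargs))

# With the SDK installed: turns = [Turn(**kw) for kw in turn_kwargs]
```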


