# Run batch evaluations

The Judge Batch Execution API allows you to evaluate multiple request-response pairs in parallel using a single judge. This is ideal for bulk evaluation scenarios like testing datasets and offline evals.

## Typical Workflow

### Step 1: Create a Batch Execution

Submit up to 100 inputs in a single request. The API responds immediately with `202 Accepted` and a batch execution ID; the evaluations themselves run asynchronously.

```bash
JUDGE_ID="my_judge_id"  # replace with the ID of your judge

curl -X POST "https://api.scorable.ai/v1/judges/${JUDGE_ID}/batch-execute/" \
  -H "Authorization: Api-Key ${SCORABLE_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": [
      {
        "request": "What is the capital of France?",
        "response": "Paris is the capital and largest city of France."
      },
      {
        "request": "What is the capital of Spain?",
        "response": "Madrid is the capital of Spain."
      },
      {
        "request": "What is the capital of Italy?",
        "response": "Rome is the capital city of Italy."
      }
    ],
    "tags": ["my-app-v1.2"]
  }'
```

**Request Parameters:**

* `inputs` (required): Array of evaluation inputs (min: 1, max: 100)
  * `request`: The input, prompt, or question
  * `response`: The output or answer to evaluate
  * `contexts` (optional): Array of context strings (if judge requires it)
  * `expected_output` (optional): Expected output for comparison (if judge requires it)
  * `messages` (optional): Multi-turn conversation object for evaluating agent behavior (see below)
* `tags` (optional): Array of strings to tag the execution logs with
* `judge_version_id` (optional): Specific judge version UUID (defaults to latest)
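
Because `inputs` is capped at 100 items, larger datasets have to be split across several batch executions. The following is a minimal sketch of that chunking, not part of any official SDK: the helper name `build_batch_payloads` and the constant are hypothetical, and only the field names (`inputs`, `tags`, `judge_version_id`) come from the parameter list above.

```python
from __future__ import annotations

from typing import Any, Iterator

MAX_INPUTS_PER_BATCH = 100  # documented upper bound for `inputs`


def build_batch_payloads(
    inputs: list[dict[str, Any]],
    tags: list[str] | None = None,
    judge_version_id: str | None = None,
) -> Iterator[dict[str, Any]]:
    """Yield one request body per chunk of at most 100 inputs (hypothetical helper)."""
    if not inputs:
        raise ValueError("`inputs` needs at least one item")
    for start in range(0, len(inputs), MAX_INPUTS_PER_BATCH):
        payload: dict[str, Any] = {
            "inputs": inputs[start:start + MAX_INPUTS_PER_BATCH]
        }
        # Optional top-level fields are only sent when they are set.
        if tags:
            payload["tags"] = tags
        if judge_version_id:
            payload["judge_version_id"] = judge_version_id
        yield payload
```

Each yielded dictionary can be serialized with `json.dumps` and sent as the body of its own batch-execute request.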

**Response (202 Accepted):**

```json
{
  "batch_execution_id": "123e4567-e89b-12d3-a456-426614174000",
  "status_url": "/v1/judges/batch-executions/123e4567-e89b-12d3-a456-426614174000/"
}
```
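
The `status_url` in this response is a relative path, so a client has to resolve it against the API host before polling. The sketch below mirrors the curl call with Python's standard library; the function names `build_submit_request` and `absolute_status_url` are hypothetical, while the endpoint, headers, and host are taken from the curl example above.

```python
from __future__ import annotations

import json
import urllib.request
from urllib.parse import urljoin

API_BASE = "https://api.scorable.ai"  # host used in the curl examples


def build_submit_request(
    judge_id: str, payload: dict, api_key: str
) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a batch execution."""
    return urllib.request.Request(
        url=f"{API_BASE}/v1/judges/{judge_id}/batch-execute/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def absolute_status_url(accepted_body: dict) -> str:
    """Resolve the relative `status_url` from the 202 body into a full URL."""
    return urljoin(API_BASE, accepted_body["status_url"])
```

To send the request, pass it to `urllib.request.urlopen` and decode the body with `json.load`; the resulting dictionary is what `absolute_status_url` expects.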

### Step 2: Poll for Status

Check the progress of your batch execution. Poll this endpoint until `status` reaches a terminal value: `completed` or `failed`.

```bash
BATCH_ID="123e4567-e89b-12d3-a456-426614174000"

curl -X GET "https://api.scorable.ai/v1/judges/batch-executions/${BATCH_ID}/" \
  -H "Authorization: Api-Key ${SCORABLE_API_KEY}"
```

**Response:**

```json
{
  "batch_execution_id": "123e4567-e89b-12d3-a456-426614174000",
  "status": "processing",
  "total_count": 3,
  ...
}
```

**Batch Status Values:**

* `pending`: Batch is queued and waiting to start
* `processing`: Batch is currently being executed
* `completed`: All items completed (check individual items for failures)
* `failed`: Entire batch failed

**Item Status Values:**

* `pending`: Item waiting to be processed
* `processing`: Item currently being evaluated
* `completed`: Item evaluation finished
* `failed`: Item evaluation failed
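
The batch status values above suggest a simple polling loop that stops on either terminal status. The following is a sketch under stated assumptions: `wait_for_batch`, the 2-second interval, and the 10-minute timeout are illustrative choices, not documented API behavior, and `fetch_status` stands for any callable that performs the GET request and returns the decoded JSON body.

```python
from __future__ import annotations

import time
from typing import Any, Callable

TERMINAL_STATUSES = {"completed", "failed"}  # batch-level terminal values


def wait_for_batch(
    fetch_status: Callable[[], dict[str, Any]],
    poll_interval: float = 2.0,  # assumed interval, tune to your workload
    timeout: float = 600.0,      # assumed overall limit in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call `fetch_status` until the batch is completed or failed."""
    deadline = time.monotonic() + timeout
    while True:
        body = fetch_status()
        if body["status"] in TERMINAL_STATUSES:
            return body
        # `pending` and `processing` both mean "keep waiting".
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"batch still {body['status']!r} after {timeout} seconds"
            )
        sleep(poll_interval)
```

Passing the fetcher and the `sleep` function as arguments keeps the loop independent of any HTTP client and makes it easy to exercise with canned responses.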

### Step 3: Retrieve Results

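Once `status` is `completed`, the status response includes an `items` array with one entry per input, each carrying its own `status` and `evaluator_results`. A `completed` batch can still contain failed items, so check each item's `status`.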

```json
{
  ...
  "items": [
    {
      "index": 0,
      "status": "completed",
      "input": {
        "request": "What is the capital of France?",
        "response": "Paris is the capital and largest city of France.",
        "contexts": null,
        "expected_output": null
      },
      "evaluator_results": [
        {
          "score": 0.95,
          "justification": "The response is relevant to the request...",
          "evaluator_name": "Relevance"
        },
        ...
      ]
    },
    ...
  ]
}
```
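
Since a completed batch can still contain failed items, a typical post-processing step separates those from the scored ones. The sketch below averages scores per evaluator and collects the indexes of failed items. The helper `summarize_items` is hypothetical; the field names come from the sample response above, and the handling of missing `evaluator_results` or `score` values on unfinished items is an assumption.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any


def summarize_items(
    items: list[dict[str, Any]],
) -> tuple[dict[str, float], list[int]]:
    """Return (mean score per evaluator, indexes of failed items)."""
    scores: dict[str, list[float]] = defaultdict(list)
    failed: list[int] = []
    for item in items:
        if item["status"] == "failed":
            failed.append(item["index"])
            continue
        if item["status"] != "completed":
            continue  # still pending or processing, nothing to score yet
        # Assumption: `evaluator_results` may be absent or null.
        for result in item.get("evaluator_results") or []:
            if result.get("score") is not None:
                scores[result["evaluator_name"]].append(result["score"])
    means = {name: sum(vals) / len(vals) for name, vals in scores.items()}
    return means, failed
```

The failed indexes refer to positions in the original `inputs` array, so they can be used to select the inputs to resubmit in a follow-up batch.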