Run batch evaluations
The Judge Batch Execution API allows you to evaluate multiple request-response pairs in parallel using a single judge. This is ideal for bulk evaluation scenarios like testing datasets and offline evals.
Typical Workflow
Step 1: Create a Batch Execution
Submit multiple inputs for evaluation. The API returns immediately with a batch execution ID.
curl -X POST "https://api.scorable.ai/v1/judges/{my_judge_id}/batch-execute/" \
  -H "Authorization: Api-Key ${SCORABLE_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": [
      {
        "request": "What is the capital of France?",
        "response": "Paris is the capital and largest city of France."
      },
      {
        "request": "What is the capital of Spain?",
        "response": "Madrid is the capital of Spain."
      },
      {
        "request": "What is the capital of Italy?",
        "response": "Rome is the capital city of Italy."
      }
    ],
    "tags": ["my-app-v1.2"]
  }'
Request Parameters:
inputs (required): Array of evaluation inputs (min: 1, max: 100)
    request: The input/prompt/question
    response: The output/answer to evaluate
    contexts (optional): Array of context strings (if judge requires it)
    functions (optional): Array of function definitions (if judge requires it)
    expected_output (optional): Expected output for comparison (if judge requires it)
tags (optional): Array of strings to tag the execution logs with
judge_version_id (optional): Specific judge version UUID (defaults to latest)
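The limits above (1 to 100 inputs per batch) can be enforced client-side before a request is sent. The following is a minimal Python sketch, assuming only the endpoint, headers, and field names shown in the curl example. The helper names chunk_inputs and build_batch_request are hypothetical and are not part of any official SDK. The code builds the request without sending it.

```python
import json
import urllib.request

MAX_INPUTS_PER_BATCH = 100  # documented maximum size of "inputs"


def chunk_inputs(inputs, size=MAX_INPUTS_PER_BATCH):
    """Split a list of inputs into lists of at most `size` items."""
    return [inputs[i:i + size] for i in range(0, len(inputs), size)]


def build_batch_request(judge_id, api_key, inputs, tags=None, judge_version_id=None):
    """Build (but do not send) the POST request for one batch execution."""
    if not 1 <= len(inputs) <= MAX_INPUTS_PER_BATCH:
        raise ValueError("inputs must contain between 1 and 100 items")

    payload = {"inputs": inputs}
    # Optional fields are only included when the caller provides them.
    if tags:
        payload["tags"] = tags
    if judge_version_id:
        payload["judge_version_id"] = judge_version_id

    return urllib.request.Request(
        url=f"https://api.scorable.ai/v1/judges/{judge_id}/batch-execute/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

A dataset larger than 100 items could be split with chunk_inputs and submitted as one batch per chunk. Passing each request to urllib.request.urlopen would send it.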
Response (202 Accepted):
Step 2: Poll for Status
Check the progress of your batch execution. Poll this endpoint until status is completed or failed.
Response:
Batch Status Values:
pending: Batch is queued and waiting to start
processing: Batch is currently being executed
completed: All items completed (check individual items for failures)
failed: Entire batch failed
Item Status Values:
pending: Item waiting to be processed
processing: Item currently being evaluated
completed: Item evaluation finished
failed: Item evaluation failed
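The polling step can be sketched as a small loop that stops on either terminal batch status. This is a minimal Python sketch under stated assumptions: the parsed status response is a dict with a top-level status field, and fetch_status is a hypothetical caller-supplied function that performs the status request and returns that dict. The interval and timeout defaults are illustrative, not documented values.

```python
import time

# Terminal batch statuses taken from the list above.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_batch(fetch_status, interval_seconds=2.0, timeout_seconds=600.0,
                   sleep=time.sleep):
    """Call fetch_status() until the batch is completed or failed.

    fetch_status is a hypothetical callable that returns the parsed JSON
    body of the status response as a dict (assumed to contain "status").
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        body = fetch_status()
        if body.get("status") in TERMINAL_STATUSES:
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError("batch did not finish before the timeout")
        # Still pending or processing: wait before asking again.
        sleep(interval_seconds)
```

Injecting fetch_status keeps the loop independent of the HTTP client in use, and the timeout prevents a stuck batch from blocking the caller indefinitely.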
Step 3: Retrieve Results
Once status is completed, all evaluator results are available in the response.
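Because a completed batch can still contain failed items, it may help to tally item statuses before reading the results. The sketch below assumes each item is a dict with a status field holding one of the item status values. The exact shape of the results payload is not shown here, so the function takes the list of items directly. The name summarize_items is hypothetical.

```python
from collections import Counter


def summarize_items(items):
    """Return (counts per item status, list of items whose status is failed)."""
    counts = Counter(item.get("status") for item in items)
    failed = [item for item in items if item.get("status") == "failed"]
    return counts, failed
```

The failed list could then be logged or resubmitted in a new batch.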