# Evaluate a multi-turn chatbot conversation

This cookbook shows how to build a chatbot that evaluates conversation quality in real time using Scorable. The example is a cooking assistant that generates replies through the OpenAI Responses API and evaluates the conversation after each interaction.

## Setup

Install the required packages:

```bash
pip install openai scorable
```

## Building an Evaluated Chatbot

The `EvaluatedChat` class keeps the full conversation history, sends it to OpenAI on every user message, and then passes the whole history to Scorable's Helpfulness evaluator. Each score therefore reflects the conversation so far, not only the latest reply.

```python
from openai import OpenAI
from scorable import Scorable
from scorable.multiturn import Turn

class EvaluatedChat:
    def __init__(self, model="gpt-5.2", scorable_api_key=None, openai_api_key=None):
        self.system_prompt = (
            "You are a helpful cooking assistant that answers questions about recipes and cooking."
        )
        self.model = model
        self.openai_client = OpenAI(api_key=openai_api_key)
        self.scorable_client = Scorable(api_key=scorable_api_key)
        self.conversation_history = []

    def add_message(self, user_message):
        # Add user message to history
        self.conversation_history.append({"role": "user", "content": user_message})

        # Get response from OpenAI using Responses API
        response = self.openai_client.responses.create(
            model=self.model,
            instructions=self.system_prompt,
            input=self.conversation_history,
        )

        # Extract assistant response
        assistant_message = response.output_text
        self.conversation_history.append({"role": "assistant", "content": assistant_message})

        # Evaluate the conversation
        evaluation = self.evaluate_conversation()

        return {"response": assistant_message, "evaluation": evaluation}

    def evaluate_conversation(self):
        # Convert conversation history to Scorable Turns format
        turns = [Turn(role=m["role"], content=m["content"]) for m in self.conversation_history]

        # Evaluate helpfulness
        result = self.scorable_client.evaluators.Helpfulness(turns=turns)
        return {"score": result.score, "justification": result.justification}
```
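Because evaluation runs inside `add_message`, an error raised by the evaluation call would also prevent the assistant's reply from being returned. One way to guard against that is a small wrapper, sketched below as an assumption and not as part of the Scorable SDK. `safe_evaluate` is a hypothetical helper name, and the two stand-in functions take the place of the real `evaluate_conversation` call so the snippet runs on its own.

```python
def safe_evaluate(evaluate_fn):
    """Call evaluate_fn and fall back to an empty result if it raises.

    Hypothetical helper for illustration, not part of the Scorable SDK.
    """
    try:
        return evaluate_fn()
    except Exception as exc:  # broad on purpose: any evaluation failure is non-fatal here
        return {"score": None, "justification": f"Evaluation failed: {exc}"}


# Stand-ins for chat.evaluate_conversation, used only for illustration
def working_evaluation():
    return {"score": 0.9, "justification": "Clear and relevant answer."}


def failing_evaluation():
    raise RuntimeError("evaluation service unavailable")


ok_result = safe_evaluate(working_evaluation)
failed_result = safe_evaluate(failing_evaluation)
print(ok_result["score"])
print(failed_result["score"])
```

Inside `add_message`, the direct call could then become `evaluation = safe_evaluate(self.evaluate_conversation)`. Callers would need to handle a `None` score before formatting it.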

## Example Usage

```python
# Initialize the chatbot
chat = EvaluatedChat(
    # Alternatively, omit these arguments and set the SCORABLE_API_KEY
    # and OPENAI_API_KEY environment variables
    scorable_api_key="your-scorable-api-key",
    openai_api_key="your-openai-api-key"
)

# First interaction
result = chat.add_message("How do I make a perfect scrambled egg?")
print("Assistant:", result['response'])
print(f"Helpfulness: {result['evaluation']['score']:.2f}")

# Second interaction
result = chat.add_message("What temperature should I use?")
print("Assistant:", result['response'])
print(f"Helpfulness: {result['evaluation']['score']:.2f}")
```
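Since every call to `add_message` returns a score, the results can be collected and checked for turns where helpfulness drops. The following is a minimal sketch under two assumptions: scores fall between 0 and 1, and 0.5 is a reasonable cut-off (both are illustrative choices, not values taken from Scorable). `flag_low_scores` is a hypothetical helper, and the sample data only mimics the shape returned by `add_message`.

```python
def flag_low_scores(results, threshold=0.5):
    """Return (interaction_number, score) pairs whose score is below threshold.

    Hypothetical helper; `results` is a list of dicts shaped like the
    return value of EvaluatedChat.add_message().
    """
    flagged = []
    for number, result in enumerate(results, start=1):
        score = result["evaluation"]["score"]
        # Skip entries without a score (for example, a failed evaluation)
        if score is not None and score < threshold:
            flagged.append((number, score))
    return flagged


# Made-up sample data for illustration only
sample_results = [
    {"response": "...", "evaluation": {"score": 0.82, "justification": "..."}},
    {"response": "...", "evaluation": {"score": 0.41, "justification": "..."}},
    {"response": "...", "evaluation": {"score": None, "justification": "..."}},
]

low = flag_low_scores(sample_results)
print(low)
```

In practice, each `result` returned by `chat.add_message(...)` would be appended to a list and passed to this helper at the end of the session.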

### Using Judges for Multiple Evaluators

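To run several evaluators at once (for example helpfulness, clarity, politeness, or your own custom evaluators), replace the `evaluate_conversation` method with one that runs a judge. The returned dictionary then holds `evaluator_results` instead of a single `score`, so code that reads `result['evaluation']['score']` needs to be updated accordingly: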

```python
def evaluate_conversation(self):
    turns = [Turn(role=m["role"], content=m["content"]) for m in self.conversation_history]

    # Run a judge with multiple evaluators
    result = self.scorable_client.judges.run(
        judge_id="your-judge-id",
        turns=turns
    )

    return {"evaluator_results": result.evaluator_results}
```
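The judge result contains one entry per evaluator, so it is often convenient to flatten it into a name-to-score mapping. The sketch below assumes each entry in `evaluator_results` exposes `evaluator_name` and `score` attributes; those attribute names are an assumption and should be checked against the Scorable SDK. `summarize_evaluator_results` is a hypothetical helper, and `SimpleNamespace` objects stand in for real results.

```python
from types import SimpleNamespace


def summarize_evaluator_results(evaluator_results):
    """Map each evaluator's name to its score.

    Assumes entries have `evaluator_name` and `score` attributes
    (an assumption, not confirmed by this cookbook).
    """
    summary = {}
    for item in evaluator_results:
        name = getattr(item, "evaluator_name", None)
        if name is not None:
            summary[name] = getattr(item, "score", None)
    return summary


# Stand-in objects for illustration only
fake_results = [
    SimpleNamespace(evaluator_name="Helpfulness", score=0.85),
    SimpleNamespace(evaluator_name="Clarity", score=0.72),
]

summary = summarize_evaluator_results(fake_results)
for name, score in summary.items():
    print(f"{name}: {score:.2f}")
```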