> For the complete documentation index, see [llms.txt](https://docs.scorable.ai/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://docs.scorable.ai/concepts-and-examples/cookbooks/add-a-custom-evaluator.md).

# Add a custom evaluator

Scorable's built-in evaluators cover most use cases, but you can add custom evaluators for requirements they don't address. In this guide, we will add a custom evaluator and tune its performance using demonstrations.

### Example: Weasel words

Consider a use case where you need to evaluate a text based on how many weasel words or ambiguous phrases it contains. Scorable provides the optimized ***Precision*** evaluator for this, but let's build something similar to walk through the evaluator-building process.

1. **Navigate to the Evaluator Page:**
   * Go to the evaluator page and click on "New Evaluator."
2. **Name Your Evaluator:**
   * Type the name for the evaluator, for example, "Direct language."
3. **Define the Intent:**
   * Give the evaluator an intent, such as "Ensures the text does not contain weasel words."
4. **Create the Prompt:**
   * Enter the prompt text, for example, "Is the following text clear and has no weasel words"
5. **Add a placeholder (variable) for the text to evaluate:**
   * Click on the "Add Variable" button to add a placeholder for the text to evaluate.
     * E.g., "Is the following text clear and has no weasel words: {{response}}"
6. **Select the Model:**
   * Choose the model, such as **gpt-5.5**, for this evaluation.
7. **Save and Test the Evaluator:**
   * Click **Create evaluator** and [begin experimenting with it](/concepts-and-examples/cookbooks/evaluate-an-llm-response.md).
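
To make the role of the `{{response}}` placeholder from step 5 concrete, here is a minimal Python sketch of how such a template could be filled in before the prompt is sent to the model. It illustrates the idea only and is not Scorable's implementation; `render_prompt` and `PLACEHOLDER` are hypothetical names introduced for this example.

```python
import re

# Matches {{name}} placeholders, allowing optional spaces inside the braces.
# (Hypothetical helper, not part of Scorable.)
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, variables: dict[str, str]) -> str:
    """Replace each {{name}} placeholder with its value from `variables`."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            # Fail loudly rather than send a prompt with an unfilled placeholder.
            raise KeyError(f"No value supplied for placeholder '{name}'")
        return variables[name]

    return PLACEHOLDER.sub(substitute, template)


template = "Is the following text clear and has no weasel words: {{response}}"
prompt = render_prompt(
    template,
    {"response": "This solution will probably work for most users."},
)
print(prompt)
```

Raising an error for a missing variable is a design choice in this sketch: a prompt that still contains a literal `{{response}}` would be evaluated as if that string were the text under review.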

### Improve the custom evaluator performance

You can add demonstrations to the evaluator so that its scores match the desired behavior more closely.

#### Example: Improve the Weasel words evaluator

Let's penalize use of the word "probably".

1. **Go to the evaluator you created above ("Direct language") and click Edit.**
2. **In the Demonstrations section, click Add** to open the demonstrations editor.
3. **Add a demonstration.** For each example, fill in:
   * **Response**: "This solution will probably work for most users."
   * **Request** *(optional)*: the input the response was produced for, when it matters for the evaluation.
   * **Label**: 👎 (thumbs down) — the response hedges with "probably", so it fails the check.
   * **Justification** *(optional)*: a short note on why — for example, "Uses the hedging word 'probably'." The model reads the justification to learn the reasoning behind the label, which helps with ambiguous cases.
4. **Add more examples** with **Add example**, or import several at once from a CSV. You can also select an existing labeled dataset instead of typing examples in.
5. **Save the demonstrations, then save the evaluator and try it out.**

Note that adding more demonstrations, such as

* "The project will probably be completed on time."
* "We probably won't need to make any major changes."
* "He probably knows the answer to your question."
* "There will probably be a meeting tomorrow."
* "It will probably rain later today."

will further adjust the evaluator's behavior. Refer to the full evaluator [documentation](/concepts-and-examples/usage/evaluators.md) for more information.
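
If you prefer to import demonstrations from a CSV rather than typing them in, a file like the one below could be generated with a few lines of Python. This is a sketch under stated assumptions: the column names (`response`, `request`, `label`, `justification`) and the label value `thumbs_down` are hypothetical and simply mirror the fields in the demonstrations editor, so check the import dialog for the headers and label values Scorable actually expects.

```python
import csv
import io

# Hypothetical column names mirroring the fields in the demonstrations editor.
FIELDS = ["response", "request", "label", "justification"]

# The "probably" sentences used as examples in this guide.
sentences = [
    "This solution will probably work for most users.",
    "The project will probably be completed on time.",
    "We probably won't need to make any major changes.",
    "He probably knows the answer to your question.",
    "There will probably be a meeting tomorrow.",
    "It will probably rain later today.",
]


def build_demonstrations_csv(responses, label, justification):
    """Return CSV text with one demonstration row per response."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for response in responses:
        writer.writerow(
            {
                "response": response,
                "request": "",  # optional field, left empty here
                "label": label,
                "justification": justification,
            }
        )
    return buffer.getvalue()


csv_text = build_demonstrations_csv(
    sentences,
    label="thumbs_down",  # assumed encoding of the thumbs-down label
    justification="Uses the hedging word 'probably'.",
)
print(csv_text)
```

Write `csv_text` to a file before importing it, and adjust the header names if the import rejects them.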

Once your demonstrations are tuned, the next step is to verify that the evaluator is reliable. See [Add a calibration set](/concepts-and-examples/cookbooks/add-a-custom-evaluator/add-a-calibration-set.md), which also covers how to use the **ladder algorithm** to generate calibration examples automatically instead of hand-crafting them.