# Langfuse

## Setup

```python
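from langfuse import observe, get_client
from openai import OpenAI
from scorable import Scorable

# Initialize Langfuse client using environment variables
# LANGFUSE_SECRET_KEY, LANGFUSE_PUBLIC_KEY, LANGFUSE_HOST
langfuse = get_client()

# Initialize Scorable client
root_signals = Scorable()

# OpenAI client used by explain_concept below (reads OPENAI_API_KEY)
client = OpenAI()

# Example prompt template; the wording is illustrative, replace it with your own
prompt_template = "Explain the following concept in a few sentences: {question}"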
```

## Real-Time Evaluation

Evaluate LLM responses as they are generated and automatically log scores to Langfuse.

### Instrumented LLM Function

```python
@observe(name="explain_concept_generation")  # Name for traces in Langfuse UI
def explain_concept(topic: str) -> tuple[str | None, str | None]:
    # Get the trace_id for the current operation, created by @observe
    current_trace_id = langfuse.get_current_trace_id()

    prompt = prompt_template.format(question=topic)
    response_obj = client.chat.completions.create(
        messages=[{"role": "user", "content": prompt}],
        model="gpt-5.5",
    )
    content = response_obj.choices[0].message.content
    return content, current_trace_id
```

### Evaluation Function

```python
def evaluate_concept(request: str, response: str, trace_id: str) -> None:
    # Invoke a specific Scorable judge
    result = root_signals.judges.run(
        judge_id="4d369224-dcfa-45e9-939d-075fa1dad99e",  # ID of your Scorable judge; replace with your own
        request=request,   # The input/prompt provided to the LLM
        response=response, # The LLM's output to be evaluated
    )

    # Iterate through evaluation results and log them as Langfuse scores
    for eval_result in result.evaluator_results:
        langfuse.create_score(
            trace_id=trace_id,                   # Links score to the specific Langfuse trace
            name=eval_result.evaluator_name,     # Name of the Scorable evaluator (e.g., "Truthfulness")
            value=eval_result.score,             # Numerical score from the evaluator
            comment=eval_result.justification,   # Explanation for the score
        )
```
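`evaluate_concept` combines two side effects: one judge call and one `create_score` call per evaluator result. Passing those two operations in as callables makes the loop testable without network access. A minimal sketch; `log_judge_scores` and its `run_judge` and `create_score` parameters are hypothetical, and the fake judge result only mimics the `evaluator_results` attributes used above (the scores in it are made-up test values):

```python
from collections.abc import Callable
from types import SimpleNamespace
from typing import Any


def log_judge_scores(
    request: str,
    response: str,
    trace_id: str,
    run_judge: Callable[..., Any],
    create_score: Callable[..., None],
) -> int:
    """Hypothetical helper: run a judge and forward each evaluator result as a score."""
    result = run_judge(request=request, response=response)
    for eval_result in result.evaluator_results:
        create_score(
            trace_id=trace_id,
            name=eval_result.evaluator_name,
            value=eval_result.score,
            comment=eval_result.justification,
        )
    return len(result.evaluator_results)  # number of scores logged


# Offline usage with fakes (illustrative values, not real judge output)
logged: list[dict] = []
fake_result = SimpleNamespace(
    evaluator_results=[
        SimpleNamespace(evaluator_name="Truthfulness", score=0.9, justification="Accurate."),
        SimpleNamespace(evaluator_name="Conciseness", score=0.7, justification="Slightly long."),
    ]
)
count = log_judge_scores(
    "What is photosynthesis?",
    "Plants convert light into chemical energy.",
    "trace-123",
    run_judge=lambda **kwargs: fake_result,
    create_score=lambda **kwargs: logged.append(kwargs),
)
print(count, [s["name"] for s in logged])  # 2 ['Truthfulness', 'Conciseness']
```

In real code the two callables could be, for example, `functools.partial(root_signals.judges.run, judge_id=...)` and `langfuse.create_score`; this wiring is an assumption based on the calls shown above.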

### Usage

```python
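# Generate and evaluate (explain_concept may return None for either value)
question = "What is photosynthesis?"
response, trace_id = explain_concept(question)
if response is not None and trace_id is not None:
    evaluate_concept(question, response, trace_id)
langfuse.flush()  # send buffered scores before a short-lived script exits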
```

### Mapping Scorable to Langfuse

| Scorable         | Langfuse  | Description in Langfuse Context                                                                                         |
| ---------------- | --------- | ----------------------------------------------------------------------------------------------------------------------- |
| `evaluator_name` | `name`    | The name of the evaluation criterion (e.g., "Hallucination," "Conciseness"). Used for identifying and filtering scores. |
| `score`          | `value`   | The numerical score assigned by the Scorable evaluator.                                                                 |
| `justification`  | `comment` | The textual explanation from Scorable for the score, providing qualitative insight into the evaluation.                 |

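The table amounts to a field-by-field rename, so it can be written as a plain function and unit-tested separately from both SDKs. A minimal sketch; `to_langfuse_score` and `FIELD_MAP` are hypothetical names, and any object exposing the three Scorable attributes from the table will do:

```python
from types import SimpleNamespace
from typing import Any

# Scorable evaluator-result attribute -> Langfuse create_score keyword (from the table above)
FIELD_MAP = {"evaluator_name": "name", "score": "value", "justification": "comment"}


def to_langfuse_score(evaluator_result: Any, trace_id: str) -> dict[str, Any]:
    """Hypothetical helper: build keyword arguments for langfuse.create_score."""
    kwargs = {lf_key: getattr(evaluator_result, sc_key) for sc_key, lf_key in FIELD_MAP.items()}
    kwargs["trace_id"] = trace_id  # link the score to its trace
    return kwargs


example = SimpleNamespace(evaluator_name="Conciseness", score=0.8, justification="Short and direct.")
print(to_langfuse_score(example, "trace-123"))
# {'name': 'Conciseness', 'value': 0.8, 'comment': 'Short and direct.', 'trace_id': 'trace-123'}
```

`langfuse.create_score(**to_langfuse_score(result, trace_id))` would then replace the explicit keyword arguments in the evaluation loop.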
## Batch Evaluation

Evaluate traces that have already been observed and stored in Langfuse. This is useful for:

* Running evaluations on historical data
* Evaluating production traces in batches

### Evaluating Historical Traces

```python
from datetime import datetime, timedelta, timezone
from langfuse import get_client
from scorable import Scorable

# Initialize clients
langfuse = get_client()  # uses environment variables to authenticate
root_signals = Scorable()

if not langfuse.auth_check():
    raise RuntimeError("Langfuse authentication failed; check your environment variables")
print("Langfuse client is authenticated and ready!")

# Fetch the latest 10 traces from the last 24 hours (timezone-aware UTC timestamp)
traces = langfuse.api.trace.list(
    limit=10,
    # tags=["my-tag"],  # optionally filter traces by tags
    from_timestamp=datetime.now(timezone.utc) - timedelta(days=1),
).data

for trace in traces:
    # Get all LLM generations for this trace
    observations = langfuse.api.observations.get_many(
        trace_id=trace.id,
        type="GENERATION",
        limit=100,
    ).data

    for observation in observations:
        # Extract the LLM input and output. This indexing assumes a "parts"-style
        # message format; adjust it to match how your generations were logged.
        request_text = observation.input[0]["parts"][0]["content"]
        response_text = observation.output[0]["parts"][0]["content"]

        # Run evaluation using a Scorable judge, looked up by its name
        evaluation_result = root_signals.judges.run_by_name(
            "My awesome judge I created with scorable.ai",
            request=request_text,
            response=response_text,
        )

        # Log the evaluation results back to Langfuse
        for evaluator_result in evaluation_result.evaluator_results:
            langfuse.create_score(
                trace_id=trace.id,
                observation_id=observation.id,  # attach the score to this generation
                name=evaluator_result.evaluator_name,
                value=evaluator_result.score,
                comment=evaluator_result.justification,
            )

# Send buffered scores to Langfuse before the script exits
langfuse.flush()
print("Evaluation complete!")
```

<figure><img src="/files/9CkNfPoNn4LTlBFr4SM8" alt=""><figcaption><p>Scorable evaluation results and scores shown in the trace</p></figcaption></figure>


