# Langfuse

This example requires `langfuse` >= v3.0.0.

## Setup

```python
from langfuse import observe, get_client
from root import Scorable

# Initialize the Langfuse client using environment variables:
# LANGFUSE_SECRET_KEY, LANGFUSE_PUBLIC_KEY, LANGFUSE_HOST
langfuse = get_client()

# Initialize the Scorable client
root_signals = Scorable()
```

## Real-Time Evaluation

Evaluate LLM responses as they are generated and automatically log scores to Langfuse.

### Instrumented LLM Function

```python
from openai import OpenAI

client = OpenAI()  # standard OpenAI client; reads OPENAI_API_KEY from the environment

# Example prompt template -- replace with your own
prompt_template = "Explain the following concept clearly and concisely: {question}"


@observe(name="explain_concept_generation")  # Name for traces in the Langfuse UI
def explain_concept(topic: str) -> tuple[str | None, str | None]:
    # Get the trace_id for the current operation, created by @observe
    current_trace_id = langfuse.get_current_trace_id()

    prompt = prompt_template.format(question=topic)
    response_obj = client.chat.completions.create(
        messages=[{"role": "user", "content": prompt}],
        model="gpt-4",
    )
    content = response_obj.choices[0].message.content
    return content, current_trace_id
```

### Evaluation Function

### Usage

### Mapping Scorable to Langfuse

| Scorable         | Langfuse  | Description in Langfuse Context |
| ---------------- | --------- | ------------------------------- |
| `evaluator_name` | `name`    | The name of the evaluation criterion (e.g., "Hallucination", "Conciseness"). Used for identifying and filtering scores. |
| `score`          | `value`   | The numerical score assigned by the Scorable evaluator. |
| `justification`  | `comment` | The textual explanation from Scorable for the score, providing qualitative insight into the evaluation. |
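The field mapping can be expressed as a small helper; the dictionary keys on the Scorable side are assumptions based on the mapping, not a documented result schema:

```python
def to_langfuse_score(result: dict, trace_id: str) -> dict:
    """Translate a Scorable evaluation result into keyword arguments
    for Langfuse's create_score call, following the mapping above."""
    return {
        "trace_id": trace_id,
        "name": result["evaluator_name"],    # evaluation criterion
        "value": result["score"],            # numerical score
        "comment": result["justification"],  # textual explanation
    }
```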

## Batch Evaluation

Evaluate traces that have already been observed and stored in Langfuse. This is useful for:

* Running evaluations on historical data
* Batch processing evaluations on production traces

### Evaluating Historical Traces
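The original snippet is not shown here. A minimal sketch, assuming the Langfuse v3 low-level API (`langfuse.api.trace.list`) for fetching traces and the same hypothetical `evaluators.run(...)` on the Scorable client; clients are passed in explicitly:

```python
def evaluate_historical_traces(
    langfuse_client,
    scorable_client,
    evaluator_id: str,
    limit: int = 50,
) -> int:
    """Fetch recent traces and score each one with a Scorable evaluator.

    Returns the number of traces that were scored.
    """
    scored = 0
    # Low-level list endpoint; page through older traces as needed.
    page = langfuse_client.api.trace.list(limit=limit)
    for trace in page.data:
        # Skip traces that lack either side of the exchange.
        if not trace.input or not trace.output:
            continue
        # Hypothetical Scorable call -- adjust to your SDK version.
        result = scorable_client.evaluators.run(
            evaluator_id,
            request=str(trace.input),
            response=str(trace.output),
        )
        langfuse_client.create_score(
            trace_id=trace.id,
            name=result.evaluator_name,
            value=result.score,
            comment=result.justification,
        )
        scored += 1
    return scored
```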

Scorable evaluation results and scores shown in the trace
