Langfuse
This example requires langfuse >= v3.0.0.
Setup
from langfuse import observe, get_client
from root import Scorable
# Initialize Langfuse client using environment variables
# LANGFUSE_SECRET_KEY, LANGFUSE_PUBLIC_KEY, LANGFUSE_HOST
langfuse = get_client()
# Initialize Scorable client
root_signals = Scorable()

Real-Time Evaluation
Evaluate LLM responses as they are generated and automatically log scores to Langfuse.
Instrumented LLM Function
from openai import OpenAI

# OpenAI client and prompt template used below (illustrative; not shown in Setup)
client = OpenAI()
prompt_template = "Explain the following concept to a beginner: {question}"

@observe(name="explain_concept_generation")  # Name for traces in the Langfuse UI
def explain_concept(topic: str) -> tuple[str | None, str | None]:
    # Get the trace_id for the current operation, created by @observe
    current_trace_id = langfuse.get_current_trace_id()
    prompt = prompt_template.format(question=topic)
    response_obj = client.chat.completions.create(
        messages=[{"role": "user", "content": prompt}],
        model="gpt-4",
    )
    content = response_obj.choices[0].message.content
    return content, current_trace_id

Evaluation Function
Usage
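The usage snippet is missing here. A sketch of how the instrumented function might be invoked, assuming `explain_concept` and the `langfuse` client defined earlier on this page (`main` is a hypothetical wrapper):

```python
# `explain_concept` and `langfuse` are defined in the sections above.

def main() -> None:
    # Generate a response; @observe records it as a Langfuse trace
    answer, trace_id = explain_concept("What is retrieval-augmented generation?")
    print(answer)
    print(f"Langfuse trace id: {trace_id}")

    # Flush buffered events before a short-lived script exits
    langfuse.flush()

if __name__ == "__main__":
    main()
```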
Mapping Scorable to Langfuse
| Scorable | Langfuse | Description in Langfuse Context |
| --- | --- | --- |
| evaluator_name | name | The name of the evaluation criterion (e.g., "Hallucination", "Conciseness"). Used for identifying and filtering scores. |
| score | value | The numerical score assigned by the Scorable evaluator. |
| justification | comment | The textual explanation from Scorable for the score, providing qualitative insight into the evaluation. |
Batch Evaluation
Evaluate traces that have already been observed and stored in Langfuse. This is useful for:
- Running evaluations on historical data
- Batch processing evaluations on production traces
Evaluating Historical Traces
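The historical-traces snippet was lost in extraction. A sketch under stated assumptions: traces are fetched with the langfuse v3 low-level API client (`langfuse.api.trace.list`), while the `root_signals.evaluators.run(...)` call shape is an assumption about the Scorable SDK, not confirmed API.

```python
# `langfuse` and `root_signals` are the clients created in the Setup section.

def evaluate_recent_traces(limit: int = 50) -> None:
    """Fetch recent Langfuse traces and score them with a Scorable evaluator."""
    # List recent traces via the langfuse v3 low-level API client.
    traces = langfuse.api.trace.list(limit=limit)

    for trace in traces.data:
        if not trace.input or not trace.output:
            continue  # skip traces without a usable input/output pair

        # Run a Scorable evaluator; the call shape is an assumption,
        # check the Scorable SDK docs for the real signature.
        result = root_signals.evaluators.run(
            name="Hallucination",
            request=str(trace.input),
            response=str(trace.output),
        )

        # Attach the result to the historical trace as a Langfuse score.
        langfuse.create_score(
            trace_id=trace.id,
            name="Hallucination",
            value=result.score,
            comment=result.justification,
        )

    langfuse.flush()
```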
