Comprehensively Test Your LLM Code
Overview
Testing Dimensions
1. Response Quality
from scorable import Scorable

client = Scorable(api_key="your-api-key")

# Test response quality with multiple evaluators
relevance_result = client.evaluators.Relevance(
    request="What is the capital of France?",
    response="The capital of France is Paris, which is located in the north-central part of the country."
)

coherence_result = client.evaluators.Coherence(
    request="Explain machine learning",
    response="Machine learning is a subset of artificial intelligence that enables computers to learn and improve from experience without being explicitly programmed."
)

completeness_result = client.evaluators.Completeness(
    request="List the benefits of renewable energy",
    response="Renewable energy reduces carbon emissions, lowers long-term costs, and provides energy independence."
)

2. Security & Privacy
3. Performance & Effectiveness
4. Messaging Alignment
Testing Approaches
Single Evaluator Testing
Multi-Evaluator Testing with Judges
RAG-Specific Testing
Ground Truth Testing
Multi-Turn Conversation Testing
Testing Methodologies
Batch Testing Function
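One way to run many test cases through a single evaluator is sketched below. This is a minimal sketch under stated assumptions, not the library's own helper: `run_batch` and `BatchResult` are hypothetical names, and `evaluator` stands for any callable that accepts `request` and `response` keyword arguments and returns an object with a numeric `score` attribute. The `score` attribute is an assumption about the result shape.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Optional


@dataclass
class BatchResult:
    """Outcome of one test case (hypothetical container, not part of any SDK)."""
    request: str
    response: str
    score: Optional[float]        # None when the evaluator call failed
    error: Optional[str] = None   # error message when the call failed


def run_batch(evaluator: Callable[..., Any], cases: Iterable[dict]) -> List[BatchResult]:
    """Run every case through `evaluator`, recording failures instead of aborting."""
    results: List[BatchResult] = []
    for case in cases:
        request, response = case["request"], case["response"]
        try:
            # Assumption: the returned object exposes a numeric `score` attribute.
            outcome = evaluator(request=request, response=response)
            results.append(BatchResult(request, response, float(outcome.score)))
        except Exception as exc:
            # One failing case should not stop the rest of the batch.
            results.append(BatchResult(request, response, None, str(exc)))
    return results
```

With the client from the Response Quality example, a call might look like `run_batch(client.evaluators.Relevance, cases)`, where `cases` is a list of dicts with `request` and `response` keys.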
Regression Testing
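A regression check can be sketched as a comparison of current scores against a stored baseline. The function name `find_regressions`, the case-ID keys, and the `tolerance` default are all hypothetical choices for illustration; the scores are assumed to be floats collected from earlier evaluator runs.

```python
from typing import Dict, List


def find_regressions(
    baseline: Dict[str, float],
    current: Dict[str, float],
    tolerance: float = 0.05,
) -> List[str]:
    """Return IDs of cases whose score dropped by more than `tolerance`."""
    regressions: List[str] = []
    for case_id, old_score in baseline.items():
        new_score = current.get(case_id)
        # A case missing from the current run is reported as a regression too.
        if new_score is None or old_score - new_score > tolerance:
            regressions.append(case_id)
    return regressions
```

Improvements and small fluctuations within the tolerance are ignored, so only meaningful drops or missing cases are flagged.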
Skills-Based Testing
Creating Test Skills
Best Practices
Test Planning
Evaluation Design
Continuous Improvement
Integration Examples
CI/CD Pipeline Testing
Troubleshooting
Common Issues
Best Practices for Robust Testing