Tutorials

Welcome to the TrainLoop Evals tutorials! These step-by-step guides will take you from complete beginner to advanced user.

TrainLoop Evals Architecture

[Figure: TrainLoop Evals Flow]

TrainLoop Evals is a comprehensive evaluation framework that captures LLM interactions, applies custom metrics, and provides powerful visualization tools for analysis and model comparison.
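As a rough illustration of the capture, metric, and aggregation flow described above, here is a minimal sketch in plain Python. It does not use the TrainLoop Evals API: the `Sample` class, the two metric functions, and `run_suite` are all hypothetical names chosen for this example, and the tutorials below cover the actual interfaces.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Sample:
    """One captured LLM interaction (hypothetical shape, not the TrainLoop schema)."""
    prompt: str
    response: str


# In this sketch a metric maps a sample to 1 (pass) or 0 (fail).
Metric = Callable[[Sample], int]


def is_non_empty(sample: Sample) -> int:
    """Pass when the response contains something other than whitespace."""
    return 1 if sample.response.strip() else 0


def is_concise(sample: Sample, max_words: int = 50) -> int:
    """Pass when the response has at most max_words whitespace-separated words."""
    return 1 if len(sample.response.split()) <= max_words else 0


def run_suite(samples: list[Sample], metrics: list[Metric]) -> dict[str, float]:
    """Return each metric's pass rate across all samples."""
    if not samples:
        return {m.__name__: 0.0 for m in metrics}
    return {
        m.__name__: sum(m(s) for s in samples) / len(samples)
        for m in metrics
    }


# Two hand-written samples stand in for captured interactions.
samples = [
    Sample("What is 2 + 2?", "4"),
    Sample("Summarize the report.", "   "),
]
results = run_suite(samples, [is_non_empty, is_concise])
print(results)
```

Per-metric pass rates like these are the kind of numbers a visualization or model comparison step could then consume.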

Learning Path

Follow these tutorials in order for the best learning experience:

🚀 Getting Started

Quick Start Guide - Set up TrainLoop Evals and run your first evaluation (5 minutes)

Start here if you're new to TrainLoop Evals. You'll learn the core concepts and get a working evaluation setup.

📊 Core Evaluation Skills

Writing Your First Evaluation - Create custom metrics and understand the evaluation process (15 minutes)

Learn how to write effective metrics and organize them into suites for comprehensive evaluation.

Advanced Metrics with LLM Judge - Use AI to evaluate AI with sophisticated metrics (20 minutes)

Move beyond simple rule-based checks and use LLM Judge for complex quality assessments.

🏆 Model Optimization

Benchmarking and Model Comparison - Compare LLM providers and find the best model for your use case (15 minutes)

Learn to systematically compare different LLM providers and models to optimize both performance and cost.

🔧 Production Deployment

Production Setup and CI/CD - Deploy TrainLoop Evals in production environments (30 minutes)

Set up automated evaluation pipelines, monitoring, and integration with your development workflow.

What You'll Learn

By completing these tutorials, you'll be able to:

✅ Set up TrainLoop Evals for any LLM application
✅ Write custom metrics to evaluate your specific use cases
✅ Use LLM Judge for sophisticated quality evaluation
✅ Compare and benchmark different LLM providers
✅ Deploy evaluation pipelines in production
✅ Monitor and track LLM performance over time

Prerequisites

Basic familiarity with Python or your chosen programming language
An LLM application or the desire to build one
API keys for at least one LLM provider (OpenAI, Anthropic, etc.)

Need Help?

Stuck on a tutorial? Check our guides
Want to go deeper? Explore our guides
Need reference information? Check our reference documentation
Want to join the community? Ask questions on Discord

Ready to start? Begin with the Quick Start Guide!
