Python Examples

Complete working example demonstrating TrainLoop LLM evaluation with Python.

Overview

The Python examples demonstrate two core evaluation scenarios:

Code Generation: Testing LLM ability to write valid Python code
Letter Counting: Testing basic counting accuracy

Prerequisites

Python 3.8+
OpenAI API key (or other supported LLM provider)

Quick Setup

# Navigate to Python examples
cd examples/python

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Create .env file with API keys
cp .env.example .env

Run Examples

# Code generation example (evaluates if LLM can write valid code)
python writes_valid_code.py

# Letter counting example (evaluates counting accuracy)
python counter_agent.py

# Run each script 3-4 times to collect samples
# Check collected data in trainloop/data/events/

Evaluate Results

# Install TrainLoop CLI globally (recommended)
pipx install trainloop-cli

# Or install in virtual environment
pip install -e ../../cli

# Check that it installed correctly
trainloop --version

# Run evaluation
cd trainloop
trainloop eval

Key Components

AI Request Utility (`ai_request.py`)

from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()
client = OpenAI()

def make_ai_request(
    prompt: str,
    model: str = "gpt-4.1",
    max_tokens: int = 500,
    extra_headers: dict = {},
):
    # Makes request with TrainLoop instrumentation
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        extra_headers=extra_headers,
    )
    return response.choices[0].message.content

TrainLoop Integration

from trainloop_llm_logging import collect, trainloop_tag

# CRITICAL: Import and call collect BEFORE importing OpenAI
collect(flush_immediately=True)

from ai_request import make_ai_request

# Tag requests for evaluation suites
headers = trainloop_tag("code-generation")
response = make_ai_request(prompt, extra_headers=headers)

Expected Output

When you run the examples, you'll see:

[TrainLoop] Loading config...
[TrainLoop] Loaded TrainLoop config from trainloop/trainloop.config.yaml
AI Response: def factorial(n):
    if n < 0:
        raise ValueError("Input must be a non-negative integer")
    elif n == 0 or n == 1:
        return 1
    else:
        return n * factorial(n-1)

Overview​

Prerequisites​

Quick Setup​

Run Examples​

Evaluate Results​

Key Components​

AI Request Utility (ai_request.py)​

TrainLoop Integration​

Expected Output​

Next Steps​

Overview

Prerequisites

Quick Setup

Run Examples

Evaluate Results

Key Components

AI Request Utility (`ai_request.py`)

TrainLoop Integration

Expected Output

Next Steps