Python Examples
Complete working example demonstrating TrainLoop LLM evaluation with Python.
Overview​
The Python examples demonstrate two core evaluation scenarios:
- Code Generation: Testing LLM ability to write valid Python code
- Letter Counting: Testing basic counting accuracy
Prerequisites​
- Python 3.8+
- OpenAI API key (or other supported LLM provider)
Quick Setup​
# Navigate to Python examples
cd examples/python
# Create and activate virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Create .env file with API keys
cp .env.example .env
Run Examples​
# Code generation example (evaluates if LLM can write valid code)
python writes_valid_code.py
# Letter counting example (evaluates counting accuracy)
python counter_agent.py
# Run each script 3-4 times to collect samples
# Check collected data in trainloop/data/events/
Evaluate Results​
# Install TrainLoop CLI globally (recommended)
pipx install trainloop-cli
# Or install in virtual environment
pip install -e ../../cli
# Check that it installed correctly
trainloop --version
# Run evaluation
cd trainloop
trainloop eval
Key Components​
AI Request Utility (ai_request.py
)​
from openai import OpenAI
from dotenv import load_dotenv
load_dotenv()
client = OpenAI()
def make_ai_request(
prompt: str,
model: str = "gpt-4.1",
max_tokens: int = 500,
extra_headers: dict = {},
):
# Makes request with TrainLoop instrumentation
response = client.chat.completions.create(
model=model,
messages=[{"role": "user", "content": prompt}],
max_tokens=max_tokens,
extra_headers=extra_headers,
)
return response.choices[0].message.content
TrainLoop Integration​
from trainloop_llm_logging import collect, trainloop_tag
# CRITICAL: Import and call collect BEFORE importing OpenAI
collect(flush_immediately=True)
from ai_request import make_ai_request
# Tag requests for evaluation suites
headers = trainloop_tag("code-generation")
response = make_ai_request(prompt, extra_headers=headers)
Expected Output​
When you run the examples, you'll see:
[TrainLoop] Loading config...
[TrainLoop] Loaded TrainLoop config from trainloop/trainloop.config.yaml
AI Response: def factorial(n):
if n < 0:
raise ValueError("Input must be a non-negative integer")
elif n == 0 or n == 1:
return 1
else:
return n * factorial(n-1)