
Reference

Complete API reference and technical documentation for TrainLoop Evals.

Overview

This section provides technical documentation for every TrainLoop Evals component. Use it to find details on APIs, configuration options, and data formats.

Quick Navigation

  • Command Line Interface
  • SDKs
  • Data Formats

Component Overview

TrainLoop CLI

The command-line interface provides (an example session follows the list):

  • Project initialization and scaffolding
  • Evaluation execution and management
  • Studio UI launching and configuration
  • Registry component management
  • Model benchmarking and comparison
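
A typical session strings these together, using only the flags shown elsewhere in this reference:

# Scaffold a new project
trainloop init

# Run one evaluation suite against collected events
trainloop eval --suite my-suite

# Compare models on a capped sample count
trainloop benchmark --max-samples 50

# Inspect results in the Studio UI
trainloop studio --port 8080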

TrainLoop SDKs

Zero-touch instrumentation libraries (a Python usage sketch follows the list):

  • Python: trainloop-llm-logging package
  • TypeScript/JavaScript: trainloop-llm-logging npm package
  • Go: trainloop-llm-logging module
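
All three expose the same small surface shown under API Patterns below. A minimal Python sketch; note the module name trainloop_llm_logging is assumed from the package name, and the exact tagging mechanics may differ:

# Python - minimal instrumentation sketch. The import path is assumed
# from the PyPI package name and may differ in practice.
from trainloop_llm_logging import collect, trainloop_tag

collect("trainloop/trainloop.config.yaml")  # start zero-touch capture

# Make LLM calls as usual; tag the ones an eval suite should pick up.
trainloop_tag("greeting-flow")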

Data Pipeline

TrainLoop processes data through these stages:

  1. Collection - SDKs capture LLM interactions
  2. Storage - Events saved as JSONL files
  3. Evaluation - CLI applies metrics to events
  4. Analysis - Studio UI provides visualization
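
Because stage 2 writes plain JSONL, collected events can be inspected with nothing but the standard library. A sketch; the folder layout and field names here are illustrative assumptions, not the documented schema:

# Python - inspect collected events; paths and field names are
# illustrative assumptions, not the documented schema.
import json
from pathlib import Path

for event_file in Path("trainloop/data/events").glob("*.jsonl"):
    for line in event_file.read_text().splitlines():
        event = json.loads(line)
        print(event.get("tag"), event.get("model"))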

API Patterns

Consistent Interfaces

All TrainLoop components follow consistent patterns:

# Python SDK
collect(config_path)
trainloop_tag("tag-name")

# CLI commands
trainloop init
trainloop eval --suite my-suite
trainloop studio --port 8080

// TypeScript SDK
trainloopTag("tag-name")

Configuration

All components read the same configuration file:

# trainloop.config.yaml
trainloop:
  data_folder: "./data"
  log_level: "info"

  judge:
    models: ["openai/gpt-4o-mini"]

  benchmark:
    providers: ["openai/gpt-4o", "anthropic/claude-3-sonnet"]
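
Any component (or your own tooling) can read this file directly. A sketch using PyYAML, assuming the nesting shown above:

# Python - load the shared config; assumes PyYAML (pip install pyyaml)
# and the nesting shown above.
import yaml

with open("trainloop.config.yaml") as f:
    config = yaml.safe_load(f)["trainloop"]

print(config["data_folder"])      # "./data"
print(config["judge"]["models"])  # ["openai/gpt-4o-mini"]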

Error Handling

All components handle errors the same way: fail soft, so the host application keeps running:

# Python - graceful degradation
import logging

logger = logging.getLogger(__name__)

try:
    collect()  # collect() comes from the SDK (see API Patterns above)
except Exception as e:
    logger.warning(f"TrainLoop initialization failed: {e}")
    # Continue without instrumentation

Integration Examples

Basic Workflow

# 1. Initialize project
trainloop init

# 2. Set up environment
export TRAINLOOP_DATA_FOLDER="$(pwd)/trainloop/data"
export OPENAI_API_KEY="your-key"

# 3. Run instrumented application
python your_app.py

# 4. Run evaluation
trainloop eval

# 5. View results
trainloop studio

CI/CD Integration

# .github/workflows/eval.yml
- name: Run evaluations
  run: |
    trainloop eval --config ci.config.yaml
    trainloop benchmark --max-samples 50

Production Deployment

# Dockerfile
FROM python:3.11
RUN pip install trainloop-cli
CMD ["trainloop", "studio", "--host", "0.0.0.0"]
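
Built and run along these lines; the port mapping and data mount are assumptions to adapt to your deployment:

# Build and run; the published port and volume mount are assumptions,
# not documented defaults.
docker build -t trainloop-studio .
docker run -p 8080:8080 \
  -v "$(pwd)/trainloop/data:/data" \
  -e TRAINLOOP_DATA_FOLDER=/data \
  trainloop-studio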

Performance Considerations

SDK Performance

  • Buffering: Events are buffered for efficient I/O
  • Async logging: Non-blocking data collection
  • Memory usage: Configurable buffer sizes
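
Conceptually, the buffering works along these lines (an illustrative sketch, not the SDK's actual implementation):

# Illustrative only - not the SDK's internals. Events accumulate in
# memory and flush to JSONL in batches, keeping the caller's hot path cheap.
import json
import threading

class BufferedEventWriter:
    def __init__(self, path, buffer_size=100):
        self.path = path
        self.buffer_size = buffer_size
        self.buffer = []
        self.lock = threading.Lock()

    def log(self, event):
        # Cheap for the caller: append, and only occasionally touch disk.
        with self.lock:
            self.buffer.append(event)
            if len(self.buffer) >= self.buffer_size:
                self._flush()

    def _flush(self):
        with open(self.path, "a") as f:
            f.writelines(json.dumps(e) + "\n" for e in self.buffer)
        self.buffer.clear()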

CLI Performance

  • Parallel processing: Multiple evaluation processes
  • Caching: Results cached to avoid re-evaluation
  • Incremental processing: Only process new events
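
Incremental processing amounts to remembering which event files have already been scored. An illustrative sketch, not the CLI's actual bookkeeping:

# Illustrative sketch of incremental processing - not the CLI's actual
# bookkeeping. Only files unseen in prior runs get evaluated.
import json
from pathlib import Path

state_file = Path(".processed_files.json")  # hypothetical state file
seen = set(json.loads(state_file.read_text())) if state_file.exists() else set()

new_files = [p for p in Path("trainloop/data/events").glob("*.jsonl")
             if str(p) not in seen]
# ... apply metrics to new_files only ...
state_file.write_text(json.dumps(sorted(seen | {str(p) for p in new_files})))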

Data Volume

  • Event size: A typical event is 1-5 KB
  • Storage growth: Plan for roughly 1-5 MB per 1,000 events
  • Retention: Configure automatic cleanup to cap growth
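
A back-of-envelope check using these figures:

# Back-of-envelope storage estimate from the figures above.
events_per_day = 10_000  # example workload
avg_event_kb = 3         # middle of the 1-5 KB range
print(f"~{events_per_day * avg_event_kb / 1024:.0f} MB/day")  # ~29 MB/day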

Security Considerations

Data Protection

  • Encryption: Optional encryption at rest
  • Access control: File-based permissions
  • Audit logging: All operations logged

API Security

  • Key management: Secure API key storage
  • Rate limiting: Respect provider limits
  • Error handling: No sensitive data in logs
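
In practice this means the pattern the workflow example above already uses: keys come from the environment rather than source code, and failures never echo the key. For instance:

# Read provider keys from the environment; fail fast without echoing them.
import os

api_key = os.environ.get("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError("OPENAI_API_KEY is not set")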

Versioning and Compatibility

Semantic Versioning

TrainLoop Evals follows semantic versioning:

  • Major: Breaking changes
  • Minor: New features, backward compatible
  • Patch: Bug fixes

Compatibility

  • Data formats: Backward compatible
  • API interfaces: Deprecated features marked
  • Migration tools: Automatic data migration

Getting Help

Documentation

Community

See Also