Contributing Guide

Welcome to the TrainLoop Evals project! We're excited to have you contribute to our comprehensive LLM evaluation framework. This guide will help you get started with contributing to the project.

Quick Start

  1. Fork the repository on GitHub
  2. Clone your fork locally
  3. Set up your development environment (see Local Development)
  4. Make your changes following our Code Style
  5. Test your changes using our Testing Guide
  6. Submit a pull request following our Pull Request Process

Project Overview

TrainLoop Evals is a multi-component system consisting of:

  • CLI Tool (cli/) - Python-based evaluation engine with commands for initialization, evaluation, and studio launch
  • Studio UI (ui/) - Next.js web interface for data visualization and analysis
  • Multi-language SDKs - Python (sdk/python/), TypeScript (sdk/typescript/), and Go (sdk/go/) instrumentation libraries
  • Registry System (registry/) - Shareable metrics and evaluation suites with Python-based config discovery

For detailed architecture information, see our Architecture Guide.

Ways to Contribute

Code Contributions

Bug Fixes

  • Check GitHub Issues for an existing report before you start
  • Reproduce the bug and add a regression test alongside your fix
  • Reference the issue in your pull request

New Features

  • Open a feature request first
  • Discuss the approach with maintainers
  • Implement with comprehensive tests and documentation

SDK Enhancements

  • Add support for new LLM providers or HTTP libraries
  • Improve instrumentation accuracy or performance
  • Add new language SDK implementations

UI Improvements

  • Enhance data visualization components
  • Add new chart types or dashboard features
  • Improve user experience and accessibility

Documentation Contributions

  • Fix typos or improve clarity in existing documentation
  • Add new guides or examples
  • Update API documentation
  • Create video tutorials or blog posts

Testing Contributions

  • Add missing test coverage
  • Create integration tests for new scenarios
  • Improve test performance or reliability
  • Add load testing or benchmarking

Development Workflow

Getting Started

  1. Prerequisites

    • Python 3.9+ with Poetry
    • Node.js 20.0+ with npm
    • Go 1.21+ (for Go SDK development)
    • Git
  2. Setup

    git clone https://github.com/YOUR_USERNAME/evals.git
    cd evals
  3. Install dependencies

    # CLI dependencies
    cd cli && poetry install

    # SDK dependencies
    cd ../sdk/python && poetry install

    # UI dependencies
    cd ../../ui && npm install

    # Return to root
    cd ..

Making Changes

  1. Create a feature branch

    git checkout -b feature/your-feature-name
  2. Make your changes

    • Follow our Code Style guidelines
    • Write tests for new functionality
    • Update documentation as needed
  3. Test your changes

    # Run all tests
    task test

    # Run specific test suites
    task test:cli
    task test:sdk
    task test:ui
  4. Commit your changes

    git add .
    git commit -m "feat: add new evaluation metric for response quality"

    Use conventional commit messages:

    • feat: for new features
    • fix: for bug fixes
    • docs: for documentation changes
    • test: for test improvements
    • refactor: for code refactoring
    • perf: for performance improvements

Code Quality Standards

Testing Requirements

  • Unit tests for all new functions and methods
  • Integration tests for component interactions (SDK integration tests use a standalone runner)
  • End-to-end tests for user workflows
  • Performance tests for critical paths

Important for SDK Contributors: SDK integration tests must use the standalone test runner (python run_integration_tests.py) instead of pytest due to import order requirements. See the Testing Guide for details.
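
The constraint exists because the SDK's patches must be installed before the HTTP libraries under test are imported, which pytest's collection phase can violate. The sketch below shows one plausible shape for such a runner; the test path and the per-file subprocess strategy are assumptions for illustration, not a description of the actual script:

# Hypothetical sketch: run each integration test in a fresh interpreter so
# instrumentation can be applied before any HTTP library is imported.
import subprocess
import sys
from pathlib import Path

for test_file in sorted(Path("tests/integration").glob("test_*.py")):
    result = subprocess.run([sys.executable, str(test_file)], check=False)
    if result.returncode != 0:
        sys.exit(f"Integration test failed: {test_file}")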

Code Review Process

  1. All code changes require review from at least one maintainer
  2. Automated tests must pass
  3. Code style checks must pass
  4. Documentation must be updated for user-facing changes

Performance Considerations

  • Keep CLI commands responsive (< 2 seconds for common operations)
  • Minimize SDK overhead on instrumented applications
  • Optimize UI rendering for large datasets
  • Use appropriate caching strategies
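
As a concrete instance of the caching point, a pure helper that is called repeatedly during a CLI run can be memoized with functools.lru_cache. The helper below is hypothetical, not part of the codebase:

from functools import lru_cache
from pathlib import Path

import yaml  # PyYAML


@lru_cache(maxsize=None)
def load_config(path: str) -> dict:
    """Parse a YAML config once per path; later calls return the cached result."""
    # Note: callers share the returned dict, so treat it as read-only.
    return yaml.safe_load(Path(path).read_text())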

Contributing to Different Components

CLI Development

The CLI is built with Python using the Click framework; a sketch of a typical subcommand follows the list below:

  • Location: cli/trainloop_cli/
  • Commands: init, eval, studio, add, benchmark
  • Testing: Use pytest with markers for different test types
  • Key files:
    • cli/trainloop_cli/commands/ - Command implementations
    • cli/trainloop_cli/eval_core/ - Core evaluation logic
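
For orientation, a new subcommand generally looks like the following and would live in cli/trainloop_cli/commands/. This is a hedged sketch, not a copy of the project's code; the command name and option are invented for illustration:

import click


@click.command()
@click.option("--data-dir", default="trainloop/data", show_default=True,
              help="Directory containing collected events.")
def summarize(data_dir: str) -> None:
    """Illustrative command: print a summary of collected events."""
    click.echo(f"Summarizing events in {data_dir}...")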

SDK Development

Python SDK

  • Location: sdk/python/trainloop_llm_logging/
  • Focus: Zero-touch HTTP instrumentation (conceptual sketch below)
  • Key files:
    • instrumentation/ - HTTP library patches
    • store.py - Data persistence
    • register.py - Registration API
  • Integration Testing: Use python run_integration_tests.py (not pytest)
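
To orient new contributors, here is a conceptual sketch of how zero-touch instrumentation works. This is not the SDK's actual code (the real patches live in instrumentation/ and persist data via store.py); it only illustrates the general monkey-patching technique:

# Conceptual sketch of HTTP monkey-patching; not the SDK's real implementation.
import requests

_original_request = requests.Session.request


def _logged_request(self, method, url, **kwargs):
    response = _original_request(self, method, url, **kwargs)
    # The real SDK writes request/response data to disk instead of printing.
    print(f"[trainloop] {method} {url} -> {response.status_code}")
    return response


requests.Session.request = _logged_request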

TypeScript SDK

  • Location: sdk/typescript/src/
  • Focus: Node.js HTTP instrumentation
  • Key files:
    • instrumentation/ - HTTP and fetch patches
    • store.ts - Data persistence
    • config.ts - Configuration management

Go SDK

  • Location: sdk/go/trainloop-llm-logging/
  • Focus: Go HTTP instrumentation
  • Key files:
    • instrumentation/ - HTTP transport wrapping
    • internal/store/ - Data persistence
    • internal/config/ - Configuration

UI Development

The Studio UI is built with Next.js 15 and React 18:

  • Location: ui/
  • Framework: Next.js with App Router
  • Database: DuckDB for local data querying (example query below)
  • UI Components: shadcn/ui with Tailwind CSS
  • Key directories:
    • app/ - Next.js app routes
    • components/ - React components
    • database/ - DuckDB integration
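
To get a feel for the kind of queries the Studio runs, you can point DuckDB at collected data yourself. The sketch below uses DuckDB's Python API rather than the Node bindings the UI uses, and the event-file path and JSONL format are assumptions about a typical local setup:

import duckdb

# Query collected events directly; read_json_auto infers the schema.
rows = duckdb.sql(
    "SELECT * FROM read_json_auto('trainloop/data/events/*.jsonl') LIMIT 5"
).fetchall()
print(rows)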

Registry Development

The registry system enables sharing of metrics and suites:

  • Location: registry/
  • Configuration: Python-based config.py files (see the sketch below)
  • Discovery: Type-safe component discovery
  • Key files:
    • metrics/ - Shareable metrics
    • suites/ - Evaluation suites
    • config_types.py - Configuration types
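
Each shareable component carries a config.py describing it. The exact schema comes from config_types.py; the dataclass and fields below are illustrative stand-ins, not the real type definitions:

# metrics/response_length/config.py (illustrative layout, not the real schema)
from dataclasses import dataclass


@dataclass
class MetricConfig:  # stand-in for whatever config_types.py actually defines
    name: str
    description: str
    min_cli_version: str


config = MetricConfig(
    name="response_length",
    description="Checks that responses stay within a length budget.",
    min_cli_version="0.1.0",
)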

Documentation Guidelines

Writing Style

  • Use clear, concise language
  • Include practical examples
  • Provide step-by-step instructions
  • Add troubleshooting sections where relevant

Code Examples

# Good: Include context and explain the example
from trainloop_llm_logging import collect

# Initialize TrainLoop logging for your application
collect("./trainloop/trainloop.config.yaml")

# Your LLM calls will now be automatically logged

API Documentation

  • Document all public functions and classes
  • Include parameter types and return values
  • Provide usage examples
  • Note any breaking changes
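
For example, a docstring in this style (the function itself is invented for illustration):

def score_response(response: str, max_length: int = 500) -> int:
    """Score a model response against a length budget.

    Args:
        response: The raw model output to evaluate.
        max_length: Maximum allowed length in characters.

    Returns:
        1 if the response fits within max_length, 0 otherwise.

    Example:
        >>> score_response("short answer")
        1
    """
    return 1 if len(response) <= max_length else 0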

Community Guidelines

Communication

  • Be respectful and inclusive
  • Ask questions in GitHub Discussions
  • Use GitHub Issues for bugs and feature requests
  • Join our community channels for real-time discussion

Code of Conduct

We follow the Contributor Covenant. Please be respectful and professional in all interactions.

Getting Help

For Contributors

For Maintainers

Resources

License

By contributing to TrainLoop Evals, you agree that your contributions will be licensed under the MIT License.


Thank you for contributing to TrainLoop Evals! Your contributions help make LLM evaluation more accessible and effective for the entire community.