SDK Testing Guide

This document describes how to run tests for both Python and TypeScript SDKs in the TrainLoop Evals ecosystem.

Overview

Both SDKs have comprehensive test suites covering:

  • Unit tests: Test individual components in isolation
  • Integration tests: Test components working together
  • Edge case tests: Test boundary conditions and error scenarios

Python SDK Testing

Prerequisites

cd sdk/python
poetry install

Running Tests

# Run unit tests (recommended)
poetry run pytest -m unit

# Run with coverage
poetry run pytest --cov -m unit

# Run specific test categories by marker
poetry run pytest -m unit # Unit tests only
poetry run pytest -m edge_case # Edge case tests only

# Run tests in parallel
poetry run pytest -n auto -m unit

# Run specific test file
poetry run pytest tests/unit/test_config.py

# Run with verbose output
poetry run pytest -v -m unit

# Generate HTML coverage report
poetry run pytest --cov --cov-report=html -m unit

🚨 Important Note: Integration Tests

Integration tests cannot be run through pytest due to a fundamental limitation. The TrainLoop SDK requires initialization before any HTTP libraries (like requests, httpx, openai) are imported. However, pytest and its plugins import these libraries before the SDK can be initialized, preventing proper instrumentation.
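
In practice, any standalone script (including the runner below) must initialize the SDK before importing an HTTP client. A minimal sketch, with the exact collect() arguments treated as an assumption:

# Initialize TrainLoop before any HTTP client library is imported,
# so the SDK can patch the client's request functions.
from trainloop_llm_logging import collect

collect()  # assumption: the exact signature/arguments may differ in your setup

# Import instrumented libraries only after initialization.
import openai  # noqa: E402

client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "ping"}],
)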

For integration testing, use the standalone test runner instead:

# Run all integration tests
python run_integration_tests.py

# Run specific integration tests
python run_integration_tests.py --test openai
python run_integration_tests.py --test anthropic

# Run with verbose output
python run_integration_tests.py --verbose

# Run a specific test category
python run_integration_tests.py --test litellm --verbose

Available integration tests:

  • openai - Tests OpenAI SDK integration
  • anthropic - Tests Anthropic SDK integration
  • litellm - Tests LiteLLM integration
  • httpx - Tests raw httpx integration

Test Structure

sdk/python/tests/
├── conftest.py                  # Shared fixtures and configuration
├── unit/                        # Unit tests
│   ├── test_config.py
│   ├── test_exporter.py
│   ├── test_store.py
│   └── ...
├── integration/                 # Integration tests
│   ├── test_collection.py
│   ├── test_instrumentation.py
│   └── ...
└── edge_cases/                  # Edge case tests
    ├── test_config_edge_cases.py
    ├── test_network_edge_cases.py
    └── ...

Test Markers

  • @pytest.mark.unit: Unit tests
  • @pytest.mark.integration: Integration tests
  • @pytest.mark.slow: Slow tests (>1s)
  • @pytest.mark.edge_case: Edge case tests
  • @pytest.mark.requires_network: Tests requiring network access
  • @pytest.mark.requires_fs: Tests requiring filesystem access
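
Markers are applied as decorators and can be combined; for example, the test below is selected by -m unit but excluded by -m "unit and not slow":

import pytest

@pytest.mark.unit
@pytest.mark.slow
def test_export_large_payload():
    """Runs under -m unit but is excluded by -m 'unit and not slow'."""
    assert True  # placeholder body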

TypeScript SDK Testing

Prerequisites

cd sdk/typescript
npm install

Running Tests

# Run all tests
npm test

# Run with coverage
npm run test:coverage

# Run specific test categories
npm run test:unit # Unit tests only
npm run test:integration # Integration tests only
npm run test:edge # Edge case tests only

# Run in watch mode
npm run test:watch

# Run specific test file
npm test -- config.test.ts

# Generate coverage report
npm run test:coverage

Test Structure

sdk/typescript/tests/
├── setup.ts                     # Jest setup and configuration
├── test-utils.ts                # Shared test utilities
├── unit/                        # Unit tests
│   ├── config.test.ts
│   ├── exporter.test.ts
│   ├── store.test.ts
│   └── ...
├── integration/                 # Integration tests
│   ├── collection.test.ts
│   ├── instrumentation.test.ts
│   └── ...
├── edge-cases/                  # Edge case tests
│   ├── config-edge.test.ts
│   ├── network-edge.test.ts
│   └── ...
└── fixtures/                    # Test data and fixtures

Test Architecture

Clean Code Separation

The SDKs maintain clean separation between production and test code:

  • No test logic in production code: The main SDK files (e.g., index.ts) contain only production code
  • Test isolation via mocking: Tests mock the FileExporter to prevent background timers and file I/O (see the sketch after this list)
  • Graceful shutdown: The shutdown() function serves both production (orderly shutdown) and tests (cleanup)
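
On the Python side, the same isolation can be expressed as an autouse fixture along these lines (a sketch only; the exporter's module path and constructor are assumptions, not the SDK's confirmed layout):

import pytest
from unittest.mock import MagicMock

@pytest.fixture(autouse=True)
def mock_file_exporter(monkeypatch):
    fake = MagicMock()
    # Replace the exporter so no background timer or file I/O is started.
    # The dotted path below is hypothetical; adjust it to the real module.
    monkeypatch.setattr(
        "trainloop_llm_logging.exporter.FileExporter",
        lambda *args, **kwargs: fake,
    )
    return fake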

TypeScript Test Setup

The TypeScript test setup (tests/setup.ts) handles:

  1. Setting test environment variables
  2. Mocking the FileExporter to prevent background operations
  3. Suppressing console output during tests
  4. Cleaning up after all tests complete

Python Test Setup

The Python test setup (tests/conftest.py) provides:

  1. Fixtures for temporary directories and files
  2. Environment variable management
  3. Mock objects for testing without side effects
  4. Sample request/response data for different LLM providers
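
For example, the temp_data_dir fixture used later in this guide might look roughly like the sketch below (the real implementation in conftest.py may differ):

import pytest

@pytest.fixture
def temp_data_dir(tmp_path, monkeypatch):
    """Point the SDK at an isolated data folder so tests never touch real data."""
    data_dir = tmp_path / "trainloop-data"
    data_dir.mkdir()
    monkeypatch.setenv("TRAINLOOP_DATA_FOLDER", str(data_dir))
    return str(data_dir)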

Common Test Scenarios

Configuration Tests

  • Missing environment variables
  • Invalid YAML syntax
  • Missing config files
  • Path resolution (absolute/relative)
  • Environment variable precedence
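
A hedged sketch of the invalid-YAML scenario is shown below; the config filename and the TRAINLOOP_CONFIG_PATH variable are assumptions, and the SDK's exact failure mode (exception, warning, or fallback) is left for the real test to assert:

import pytest

@pytest.mark.unit
@pytest.mark.edge_case
def test_invalid_yaml_config(tmp_path, monkeypatch):
    # Write a deliberately broken config file.
    bad_config = tmp_path / "trainloop.config.yaml"  # hypothetical filename
    bad_config.write_text("data_folder: [unclosed")
    # Hypothetical env var pointing the SDK at the broken file.
    monkeypatch.setenv("TRAINLOOP_CONFIG_PATH", str(bad_config))
    # Assert on the SDK's documented failure mode here.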

HTTP Instrumentation Tests

  • Different HTTP libraries (requests, httpx, urllib)
  • Network failures
  • Timeouts
  • Invalid responses
  • Large payloads
  • Concurrent requests
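
Timeouts and network failures can be simulated without real network access by patching the client library; a minimal sketch, assuming the instrumentation re-raises client exceptions rather than swallowing them:

import pytest
import requests
from unittest.mock import patch

@pytest.mark.unit
@pytest.mark.edge_case
def test_timeout_propagates_to_caller():
    # Force every outgoing requests call to time out.
    with patch("requests.Session.send", side_effect=requests.exceptions.Timeout):
        # The caller should still see the timeout from the instrumented library.
        with pytest.raises(requests.exceptions.Timeout):
            requests.get("https://api.example.com/v1/models", timeout=0.01)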

Storage Tests

  • File system permissions
  • Disk full scenarios
  • Concurrent writes
  • Registry corruption
  • Invalid data formats

Parser Tests

  • OpenAI format parsing
  • Anthropic format parsing
  • Malformed JSON
  • Missing fields
  • Streaming responses
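
For the OpenAI case, a parser test might feed in a canned chat-completions payload such as the sketch below (field values are illustrative):

# Minimal OpenAI chat-completions response shape used as parser input.
OPENAI_CHAT_RESPONSE = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "model": "gpt-4o",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 5, "completion_tokens": 2, "total_tokens": 7},
}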

Exporter Tests

  • Buffer management
  • Timer cleanup
  • Export failures
  • Shutdown handling

Writing New Tests

Python Example

import os

import pytest
from trainloop_llm_logging import collect

@pytest.mark.unit
def test_my_feature(temp_data_dir, mock_env_vars):
    """Test description."""
    # Arrange
    os.environ["TRAINLOOP_DATA_FOLDER"] = temp_data_dir

    # Act
    result = my_function()

    # Assert
    assert result == expected_value

TypeScript Example

import { myFunction } from '../../src/myModule';
import { createTempDir, cleanupTempDir } from '../test-utils';

describe('My Feature', () => {
  let tempDir: string;

  beforeEach(() => {
    tempDir = createTempDir();
  });

  afterEach(() => {
    cleanupTempDir(tempDir);
  });

  it('should do something', () => {
    // Arrange
    process.env.TRAINLOOP_DATA_FOLDER = tempDir;

    // Act
    const result = myFunction();

    // Assert
    expect(result).toBe(expectedValue);
  });
});

Continuous Integration

Both test suites are designed to run in CI environments:

  • Tests run in isolated environments
  • No external dependencies required
  • Temporary files are cleaned up
  • Console output is suppressed

Debugging Tests

Python

# Run with debugging output
poetry run pytest -s

# Run with breakpoint
poetry run pytest --pdb

# Run specific test with verbose output
poetry run pytest -v -k "test_name"

TypeScript

# Run with debugging
node --inspect-brk ./node_modules/.bin/jest --runInBand

# Run specific test
npm test -- --testNamePattern="test name"

# Show console output
npm test -- --verbose

Coverage Goals

Both SDKs aim for:

  • 80%+ line coverage
  • 80%+ branch coverage
  • 80%+ function coverage
  • Critical paths at 100% coverage

View coverage reports:

  • Python: open htmlcov/index.html
  • TypeScript: open coverage/lcov-report/index.html

Integration with CLI Testing

SDK tests integrate with the broader CLI testing framework:

# Run all tests including CLI integration
pytest

# Run SDK-specific integration tests
pytest -m sdk

# Run full integration test suite
pytest -m integration

Performance Testing

Python SDK Performance

# Run performance benchmarks
poetry run pytest tests/performance/ -v

# Profile memory usage
poetry run pytest --profile tests/unit/test_exporter.py

TypeScript SDK Performance

# Run performance tests
npm run test:performance

# Profile memory and CPU usage
npm run test:profile

Mock LLM Providers

For testing without hitting real APIs:

Python

import pytest

from tests.helpers.mock_llm import MockOpenAI, MockAnthropic

@pytest.fixture
def mock_openai():
    return MockOpenAI(responses=["Test response"])

TypeScript

import { mockLLMProvider } from '../test-utils';

describe('LLM Integration', () => {
  beforeEach(() => {
    mockLLMProvider('openai', { response: 'Test response' });
  });
});

Best Practices

  1. Isolate tests: Each test should be independent
  2. Use meaningful names: Test names should describe what is being tested
  3. Test edge cases: Include boundary conditions and error scenarios
  4. Mock external dependencies: Don't rely on external services
  5. Keep tests fast: Unit tests should run in milliseconds
  6. Clean up resources: Ensure temporary files and connections are closed
  7. Use fixtures: Share common setup code via fixtures
  8. Test error paths: Verify error handling works correctly