
Event Data Format

TrainLoop SDKs collect LLM interaction data in a standardized JSONL format.

Overview

Event data is stored as newline-delimited JSON (JSONL) files in the data/events/ folder. Each line represents a single LLM interaction.

File Structure

data/
├── events/
│   ├── 2024-01-15.jsonl  # Events from January 15, 2024
│   ├── 2024-01-16.jsonl  # Events from January 16, 2024
│   └── ...
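The layout above suggests that an event's file can be derived from its timestamp. A minimal sketch, assuming files are named after the UTC date of the event (the page does not say which time zone the SDKs use); `event_file_for` is a hypothetical helper, not part of any TrainLoop SDK:

```python
from datetime import datetime, timezone
from pathlib import Path


def event_file_for(timestamp: str, data_folder: str = "data") -> Path:
    """Return the daily JSONL path that would hold an event with this timestamp.

    Assumption: files are named after the UTC date of the event.
    """
    # Older Python versions reject a trailing "Z" in fromisoformat(),
    # so rewrite it as an explicit UTC offset first.
    moment = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    day = moment.astimezone(timezone.utc).date()
    return Path(data_folder) / "events" / f"{day.isoformat()}.jsonl"


print(event_file_for("2024-01-15T14:30:25.123Z"))
```

If the SDKs turn out to use local time for file names, the `astimezone` call is the only line that would need to change.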

Event Schema

Basic Event Structure

{
  "timestamp": "2024-01-15T14:30:25.123Z",
  "input": { ... },
  "output": { ... },
  "metadata": { ... }
}

Complete Event Example

{
  "timestamp": "2024-01-15T14:30:25.123Z",
  "input": {
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ],
    "temperature": 0.7,
    "max_tokens": 1000,
    "top_p": 1.0,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0
  },
  "output": {
    "content": "The capital of France is Paris.",
    "role": "assistant",
    "usage": {
      "prompt_tokens": 25,
      "completion_tokens": 8,
      "total_tokens": 33
    },
    "finish_reason": "stop"
  },
  "metadata": {
    "provider": "openai",
    "model": "gpt-4o-mini",
    "tags": ["qa", "geography"],
    "duration_ms": 1250,
    "request_id": "req_abc123",
    "response_id": "resp_xyz789"
  }
}

Field Descriptions

Root Fields

| Field | Type | Description |
|---|---|---|
| timestamp | string | ISO 8601 timestamp when the request was made |
| input | object | LLM request parameters |
| output | object | LLM response data |
| metadata | object | Additional tracking information |

Input Fields

| Field | Type | Description |
|---|---|---|
| model | string | LLM model name (e.g., "gpt-4o-mini") |
| messages | array | Chat completion messages |
| temperature | number | Sampling temperature (0 to 2) |
| max_tokens | number | Maximum tokens to generate |
| top_p | number | Nucleus sampling parameter |
| frequency_penalty | number | Frequency penalty (-2 to 2) |
| presence_penalty | number | Presence penalty (-2 to 2) |
| stop | array | Stop sequences |
| stream | boolean | Whether response was streamed |

Output Fields

| Field | Type | Description |
|---|---|---|
| content | string | Generated text content |
| role | string | Response role (usually "assistant") |
| usage | object | Token usage information |
| finish_reason | string | Why generation stopped |
| tool_calls | array | Function/tool calls made |

Usage Fields

| Field | Type | Description |
|---|---|---|
| prompt_tokens | number | Tokens in the prompt |
| completion_tokens | number | Tokens in the completion |
| total_tokens | number | Total tokens used |

Metadata Fields

| Field | Type | Description |
|---|---|---|
| provider | string | LLM provider (openai, anthropic, etc.) |
| model | string | Model name |
| tags | array | Custom tags for filtering |
| duration_ms | number | Request duration in milliseconds |
| request_id | string | Unique request identifier |
| response_id | string | Unique response identifier |
| error | string | Error message if request failed |
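The field tables can be turned into a light structural check on decoded events. The sketch below is illustrative only: the page does not say which fields are required, so treating all four root fields as mandatory is an assumption, and `check_event` is a hypothetical helper rather than an SDK function.

```python
from typing import Any, Dict, List

# Assumption: every event carries all four root fields.
ROOT_FIELDS = ("timestamp", "input", "output", "metadata")


def check_event(event: Dict[str, Any]) -> List[str]:
    """Return a list of structural problems in one decoded event (empty if none)."""
    problems = [f"missing root field: {name}" for name in ROOT_FIELDS if name not in event]
    if "timestamp" in event and not isinstance(event["timestamp"], str):
        problems.append("timestamp should be a string")
    for name in ("input", "metadata"):
        if name in event and not isinstance(event[name], dict):
            problems.append(f"{name} should be an object")
    # Error events on this page use "output": null, so None is accepted here.
    output = event.get("output")
    if output is not None and not isinstance(output, dict):
        problems.append("output should be an object or null")
    return problems


sample = {
    "timestamp": "2024-01-15T14:30:25.123Z",
    "input": {"model": "gpt-4o-mini", "messages": []},
    "output": None,
    "metadata": {"provider": "openai", "error": "Rate limit exceeded"},
}
print(check_event(sample))
```

Returning a list of messages rather than raising lets a caller report every problem in a file in one pass.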

Provider-Specific Examples

OpenAI

{
  "timestamp": "2024-01-15T14:30:25.123Z",
  "input": {
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "temperature": 0.7,
    "max_tokens": 1000
  },
  "output": {
    "content": "Hello! How can I help you today?",
    "role": "assistant",
    "usage": {
      "prompt_tokens": 10,
      "completion_tokens": 8,
      "total_tokens": 18
    },
    "finish_reason": "stop"
  },
  "metadata": {
    "provider": "openai",
    "model": "gpt-4o-mini",
    "tags": ["greeting"],
    "duration_ms": 1250
  }
}

Anthropic

{
  "timestamp": "2024-01-15T14:30:25.123Z",
  "input": {
    "model": "claude-3-haiku-20240307",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 1000,
    "temperature": 0.7
  },
  "output": {
    "content": "Hello! How can I assist you today?",
    "role": "assistant",
    "usage": {
      "input_tokens": 10,
      "output_tokens": 8,
      "total_tokens": 18
    },
    "stop_reason": "end_turn"
  },
  "metadata": {
    "provider": "anthropic",
    "model": "claude-3-haiku-20240307",
    "tags": ["greeting"],
    "duration_ms": 1100
  }
}
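The two examples use different key names: the OpenAI event has `prompt_tokens`, `completion_tokens`, and `finish_reason`, while the Anthropic event has `input_tokens`, `output_tokens`, and `stop_reason`. Code that aggregates across providers may want to normalize them first. A minimal sketch based only on the keys shown in these two examples; `token_counts` and `stop_reason_of` are hypothetical helpers:

```python
from typing import Optional


def token_counts(usage: dict) -> dict:
    """Map either provider's usage object to prompt/completion/total counts."""
    prompt = usage.get("prompt_tokens", usage.get("input_tokens", 0))
    completion = usage.get("completion_tokens", usage.get("output_tokens", 0))
    # Fall back to the sum if a provider omits total_tokens.
    total = usage.get("total_tokens", prompt + completion)
    return {"prompt": prompt, "completion": completion, "total": total}


def stop_reason_of(output: dict) -> Optional[str]:
    """Return finish_reason (OpenAI example) or stop_reason (Anthropic example)."""
    return output.get("finish_reason", output.get("stop_reason"))


openai_usage = {"prompt_tokens": 10, "completion_tokens": 8, "total_tokens": 18}
anthropic_usage = {"input_tokens": 10, "output_tokens": 8, "total_tokens": 18}
print(token_counts(openai_usage) == token_counts(anthropic_usage))
```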

Error Events

When an LLM call fails, the event is still recorded: output is null and the error message is stored in metadata.error.

{
  "timestamp": "2024-01-15T14:30:25.123Z",
  "input": {
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  },
  "output": null,
  "metadata": {
    "provider": "openai",
    "model": "gpt-4o-mini",
    "tags": ["greeting"],
    "duration_ms": 5000,
    "error": "Rate limit exceeded"
  }
}
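Because failed calls have a null output, code that reads `output.content` directly will break on them. The sketch below separates failures from successes and counts them by message. It assumes, based on the example above, that a failure shows up as a null `output` or a non-empty `metadata.error`; `is_error_event` and `error_counts` are hypothetical helpers.

```python
from collections import Counter


def is_error_event(event: dict) -> bool:
    """True when the event has no output or carries metadata.error."""
    metadata = event.get("metadata") or {}
    return event.get("output") is None or bool(metadata.get("error"))


def error_counts(events: list) -> Counter:
    """Count failed events by error message."""
    return Counter(
        (event.get("metadata") or {}).get("error", "unknown")
        for event in events
        if is_error_event(event)
    )


events = [
    {"output": None, "metadata": {"error": "Rate limit exceeded"}},
    {"output": {"content": "Hi"}, "metadata": {}},
]
print(error_counts(events))
```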

Streaming Events

For streaming responses, the event is captured once the stream completes:

{
  "timestamp": "2024-01-15T14:30:25.123Z",
  "input": {
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Tell me a story"}
    ],
    "stream": true
  },
  "output": {
    "content": "Once upon a time...",
    "role": "assistant",
    "usage": {
      "prompt_tokens": 15,
      "completion_tokens": 250,
      "total_tokens": 265
    },
    "finish_reason": "stop"
  },
  "metadata": {
    "provider": "openai",
    "model": "gpt-4o-mini",
    "tags": ["story"],
    "duration_ms": 8500,
    "stream": true
  }
}
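Since a streamed event carries both `duration_ms` and `completion_tokens`, a rough generation throughput can be derived from it. This sketch assumes `duration_ms` covers the whole stream from request to final chunk, which the page implies but does not state; `tokens_per_second` is a hypothetical helper.

```python
from typing import Optional


def tokens_per_second(event: dict) -> Optional[float]:
    """Completion tokens per second, or None if the event lacks the needed fields."""
    usage = (event.get("output") or {}).get("usage") or {}
    duration_ms = (event.get("metadata") or {}).get("duration_ms")
    completion = usage.get("completion_tokens")
    if not duration_ms or completion is None:
        return None  # Error events and zero durations have no meaningful rate
    return completion / (duration_ms / 1000)


streamed = {
    "output": {"usage": {"completion_tokens": 250}},
    "metadata": {"duration_ms": 8500, "stream": True},
}
print(tokens_per_second(streamed))
```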

Custom Tags

Events can be tagged for targeted evaluation:

{
  "timestamp": "2024-01-15T14:30:25.123Z",
  "input": { ... },
  "output": { ... },
  "metadata": {
    "provider": "openai",
    "model": "gpt-4o-mini",
    "tags": [
      "customer-support",
      "priority-high",
      "v1.0"
    ],
    "duration_ms": 1250
  }
}
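Once events are loaded, selecting by tag is a set comparison on `metadata.tags`. A minimal sketch in which `with_tags` is a hypothetical helper that keeps events carrying all of the requested tags:

```python
from typing import Iterable, List


def with_tags(events: Iterable[dict], required: Iterable[str]) -> List[dict]:
    """Return the events whose metadata.tags include every tag in `required`."""
    wanted = set(required)
    selected = []
    for event in events:
        tags = (event.get("metadata") or {}).get("tags") or []
        if wanted <= set(tags):  # Subset test: all wanted tags are present
            selected.append(event)
    return selected


events = [
    {"metadata": {"tags": ["customer-support", "priority-high", "v1.0"]}},
    {"metadata": {"tags": ["customer-support"]}},
    {"metadata": {}},
]
print(len(with_tags(events, ["customer-support", "priority-high"])))
```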

Reading Event Data

Command Line

# View recent events
tail -f data/events/$(date +%Y-%m-%d).jsonl

# Count events
wc -l data/events/*.jsonl

# Filter by tag
grep '"customer-support"' data/events/*.jsonl

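Python

import json
from pathlib import Path

def read_events(data_folder):
    """Load every event from <data_folder>/events/*.jsonl into a list."""
    events = []
    events_dir = Path(data_folder) / "events"

    # Sort so that daily files are read in chronological order
    for event_file in sorted(events_dir.glob("*.jsonl")):
        with open(event_file, encoding="utf-8") as f:
            for line in f:
                if line.strip():  # Skip blank lines, which json.loads rejects
                    events.append(json.loads(line))

    return events

# Usage
events = read_events("trainloop/data")
print(f"Found {len(events)} events")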

SQL (via DuckDB)

-- Query events in Studio UI
SELECT
  timestamp,
  metadata.provider,
  metadata.model,
  metadata.tags,
  input.model,
  output.content
FROM 'data/events/*.jsonl'
WHERE 'customer-support' = ANY(metadata.tags)
ORDER BY timestamp DESC
LIMIT 10;

Best Practices

  1. Tagging: Use meaningful tags for targeted evaluation
  2. Retention: Regularly clean up old event files
  3. Monitoring: Track event volume and storage usage
  4. Security: Avoid logging sensitive information
  5. Compression: Compress old event files to save space
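The retention and compression items can be combined into a small maintenance script. The sketch below gzips daily files older than a cutoff and removes the originals. The function name, the 7-day default, and the `.jsonl.gz` naming are all assumptions; note also that compressed files no longer match `*.jsonl`, so any reader that globs for that pattern will skip them.

```python
import gzip
import shutil
from datetime import date, timedelta
from pathlib import Path
from typing import List, Optional


def compress_old_events(events_dir, keep_days: int = 7,
                        today: Optional[date] = None) -> List[Path]:
    """Gzip daily event files older than keep_days and delete the originals."""
    today = today or date.today()
    cutoff = today - timedelta(days=keep_days)
    compressed = []
    for path in sorted(Path(events_dir).glob("*.jsonl")):
        try:
            file_day = date.fromisoformat(path.stem)
        except ValueError:
            continue  # Skip files not named YYYY-MM-DD.jsonl
        if file_day >= cutoff:
            continue  # Recent enough to keep uncompressed
        target = path.with_name(path.name + ".gz")
        with open(path, "rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        path.unlink()  # Remove the original only after the copy succeeded
        compressed.append(target)
    return compressed
```

A call such as `compress_old_events("data/events", keep_days=30)` could run from a scheduled job. The `today` parameter exists mainly to make the function testable.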
