Semantic Conventions
This document describes how Agenta applies domain-specific OpenTelemetry conventions to capture and analyze traces from LLM applications.
How It Works
Agenta accepts any span that follows the OpenTelemetry specification. To unlock LLM-specific features—such as nicely formatted chat messages, per-request cost and latency, and links to prompt configurations or evaluators—we add attributes under the ag namespace.
The OTLP endpoint accepts batches up to 5 MB by default (after decompression). Larger requests return an HTTP 413 status code.
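To stay under that limit, a client can split large exports into smaller batches before sending. A minimal stdlib-only sketch (the `split_batch` helper and the JSON encoding are illustrative assumptions, not part of the Agenta SDK or the OTLP wire format):

```python
import json

MAX_BATCH_BYTES = 5 * 1024 * 1024  # 5 MB uncompressed, per the endpoint's default

def split_batch(spans, max_bytes=MAX_BATCH_BYTES):
    """Greedily split spans into chunks whose JSON encoding stays under max_bytes."""
    chunks, current, size = [], [], 2  # 2 bytes for the surrounding "[]"
    for span in spans:
        encoded = len(json.dumps(span).encode("utf-8")) + 1  # +1 for the comma
        if current and size + encoded > max_bytes:
            chunks.append(current)
            current, size = [], 2
        current.append(span)
        size += encoded
    if current:
        chunks.append(current)
    return chunks
```

Each resulting chunk can then be exported as its own request, keeping every payload below the 413 threshold.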
We support two primary instrumentation approaches:
- Agenta SDK: When using our SDK with decorators like @ag.instrument, we automatically handle the semantic conventions
- Auto-instrumentation libraries: We provide adapters that transform data from standard auto-instrumentation libraries (which follow their own conventions) into Agenta's format
Agenta Namespace
All Agenta-specific attributes are organized under the ag namespace to avoid conflicts with other OpenTelemetry conventions.
Core Semantic Conventions
ag.data
The ag.data namespace contains the core execution data for each span:
- ag.data.inputs: Input parameters for the span
- ag.data.outputs: Output results from the span
- ag.data.internals: Internal variables and intermediate values
Data Format
Inputs and Outputs: Stored in JSON format. For chat applications using LLM spans, these are formatted as a list of messages:
{
"data": {
"inputs": {
"prompt": [
{ "role": "system", "content": "System instruction" },
{ "role": "user", "content": "User query" }
],
"functions": [...],
"tools": [...]
},
"outputs": {
"completion": [
{ "role": "assistant", "content": "Assistant response" }
]
}
}
}
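Assembled in code, the same structure might look like the following stdlib-only sketch (the `build_chat_data` helper is illustrative, not an SDK call):

```python
def build_chat_data(system_prompt, user_query, assistant_reply):
    """Assemble an ag.data payload for a chat-style LLM span."""
    return {
        "data": {
            "inputs": {
                "prompt": [
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": user_query},
                ],
            },
            "outputs": {
                "completion": [
                    {"role": "assistant", "content": assistant_reply},
                ],
            },
        }
    }

payload = build_chat_data("You are terse.", "Capital of France?", "Paris.")
```

The nested message lists are what allow the UI to render inputs and outputs as a formatted chat transcript.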
Internals: User-provided internal information such as context variables, intermediate calculations, or evaluation data that aren't part of the primary inputs/outputs. These are set by the user in the SDK using ag.tracing.store_internals().
SDK Integration
When using the @ag.instrument decorator in Python:
@ag.instrument
def my_function(input_param):
    # Function inputs and outputs are automatically captured
    # unless explicitly masked
    return result
The decorator automatically captures function inputs and outputs in ag.data.inputs and ag.data.outputs unless you choose to mask sensitive data.
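If some inputs are sensitive, one approach is to redact them before they reach ag.data.inputs. A stdlib sketch of such a redaction step (the `mask_fields` helper is hypothetical, not the SDK's built-in masking):

```python
def mask_fields(inputs, sensitive_keys, placeholder="[REDACTED]"):
    """Return a copy of the inputs dict with sensitive values replaced."""
    return {
        key: placeholder if key in sensitive_keys else value
        for key, value in inputs.items()
    }

masked = mask_fields(
    {"prompt": "Summarize this", "api_key": "sk-123"},
    sensitive_keys={"api_key"},
)
```

Applying a step like this before instrumentation keeps secrets out of stored traces while preserving the rest of the span data.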
ag.meta
The ag.meta namespace stores metadata about the span execution:
{
"meta": {
"system": "openai",
"request": {
"base_url": "https://api.openai.com/v1",
"endpoint": "/chat/completions",
"headers": {...},
"streaming": false,
"model": "gpt-4",
"max_tokens": 1000,
"temperature": 0.7,
"top_p": 0.9,
"top_k": 50
},
"response": {
"model": "gpt-4-0613"
},
"configuration": {
"prompt": {
"input_keys": ["country"],
"messages": [...],
"template_format": "curly",
"llm_config": {
"model": "mistral/mistral-tiny"
}
}
}
}
}
Standard Metadata
Auto-instrumentation maps common semantic-convention keys—e.g. gen_ai.system, gen_ai.request.*—to the structure above.
- ag.meta.system: Identifies the LLM provider or system being used (e.g., "openai", "anthropic")
- ag.meta.request.base_url: The base URL for the API request
- ag.meta.request.endpoint: The specific API endpoint called
- ag.meta.request.headers: HTTP headers sent with the request
- ag.meta.request.streaming: Boolean indicating if streaming mode was used
- ag.meta.request.model: The model name requested
- ag.meta.request.max_tokens: Maximum token limit for the response
- ag.meta.request.temperature: Sampling temperature parameter
- ag.meta.request.top_p: Nucleus sampling parameter
- ag.meta.request.top_k: Top-k sampling parameter
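One way the nested meta object maps onto these flat, dot-joined attribute keys is a recursive flatten. A minimal sketch, assuming the mapping is a plain dot-join (the `flatten` helper is illustrative, not Agenta's internal implementation):

```python
def flatten(obj, prefix="ag.meta"):
    """Flatten a nested dict into dot-joined span attribute keys."""
    attributes = {}
    for key, value in obj.items():
        full_key = f"{prefix}.{key}"
        if isinstance(value, dict):
            attributes.update(flatten(value, prefix=full_key))
        else:
            attributes[full_key] = value
    return attributes

attrs = flatten({"system": "openai", "request": {"model": "gpt-4", "streaming": False}})
```

Here `attrs` contains keys such as "ag.meta.system" and "ag.meta.request.model", matching the list above.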
Metadata is displayed in the observability overview page as contextual information to help navigate and understand span execution.
References
Use the top-level references array to link spans to Agenta entities. Every entry represents one relationship and includes:
- attributes.key: the reference category (for example application, evaluator_variant)
- id, slug, or version: supply whichever identifiers you have; you can include more than one field if available
Example payload:
{
"references": [
{"id": "019a0159-82d3-7760-9868-4f8c7da8e9c0", "attributes": {"key": "application"}},
{"slug": "production", "attributes": {"key": "environment"}},
{"id": "019a0159-82d3-7760-9868-4f8c7da8e9c1", "version": "4", "attributes": {"key": "application_variant"}}
]
}
Supported categories:
- application, application_variant, application_revision
- environment, environment_variant, environment_revision
- evaluator, evaluator_variant, evaluator_revision
- testset, testset_variant, testset_revision, testcase
- query, query_variant, query_revision
- workflow, workflow_variant, workflow_revision
Consumers (UI, analytics, filtering) read from this array. Instrumentation libraries that cannot emit the array may still set the attribute form (ag.references.<category>.<field>); the ingestion service converts that dictionary into the same array before storage.
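The conversion from the attribute form into the array form can be sketched as grouping keys by category. A stdlib-only illustration (the `references_from_attributes` helper is an assumption about the shape of the transform, not the ingestion service's actual code):

```python
def references_from_attributes(attributes):
    """Group ag.references.<category>.<field> attributes into a references array."""
    grouped = {}
    for key, value in attributes.items():
        parts = key.split(".")
        if len(parts) == 4 and parts[:2] == ["ag", "references"]:
            category, field = parts[2], parts[3]
            grouped.setdefault(category, {})[field] = value
    return [
        {**fields, "attributes": {"key": category}}
        for category, fields in grouped.items()
    ]

refs = references_from_attributes({
    "ag.references.application.id": "019a0159-82d3-7760-9868-4f8c7da8e9c0",
    "ag.references.environment.slug": "production",
})
```

The result has the same shape as the example payload above, so consumers only ever see the array form.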
The legacy ag.refs.* namespace is deprecated and will be removed after existing SDKs migrate. Do not rely on it.
ag.metrics
The ag.metrics namespace tracks performance, cost, and error metrics:
{
"metrics": {
"costs": {
"cumulative": {
"total": 0.0070902,
"prompt": 0.00355,
"completion": 0.00354
},
"incremental": {
"total": 0.0070902
}
},
"tokens": {
"cumulative": {
"total": 992,
"prompt": 175,
"completion": 817
},
"incremental": {
"total": 992,
"prompt": 175,
"completion": 817
}
},
"duration": {
"cumulative": 19889.343
},
"errors": {}
}
}
Aggregation Types
Metrics are tracked at two levels:
- incremental: Metrics for this span only (excluding child spans)
- cumulative: Metrics for this span plus all child spans aggregated together
This dual tracking allows you to see both the cost of individual operations and the total cost of complex workflows.
Metric Categories
Costs
Tracks LLM API costs in USD with the following breakdown:
Cumulative (this span + children):
- ag.metrics.costs.cumulative.total: Total cost across all LLM calls in this span and its children
- ag.metrics.costs.cumulative.prompt: Cost attributed to input tokens
- ag.metrics.costs.cumulative.completion: Cost attributed to output/completion tokens
Incremental (this span only):
- ag.metrics.costs.incremental.total: Cost for this span's operations only
- ag.metrics.costs.incremental.prompt: Prompt cost for this span only
- ag.metrics.costs.incremental.completion: Completion cost for this span only
Cost calculation uses the latest pricing for each model provider. Costs are automatically calculated when using standard LLM integrations. Cumulative metrics are automatically calculated by the backend by aggregating incremental values.
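The aggregation of incremental values into cumulative ones amounts to summing over the span tree. A minimal sketch of that idea (the dict shape and `cumulative_cost` helper are illustrative assumptions, not the backend's implementation):

```python
def cumulative_cost(span):
    """Sum incremental costs for a span and all of its children, recursively."""
    total = span.get("incremental_cost", 0.0)
    for child in span.get("children", []):
        total += cumulative_cost(child)
    return total

trace = {
    "incremental_cost": 0.001,
    "children": [
        {"incremental_cost": 0.003},
        {"incremental_cost": 0.002, "children": [{"incremental_cost": 0.004}]},
    ],
}
```

For this tree, the root's cumulative cost is 0.010 while its incremental cost stays 0.001, which is exactly the distinction the two aggregation levels capture.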
Tokens
Tracks token usage at both aggregation levels:
Cumulative:
- ag.metrics.tokens.cumulative.total: Total tokens across all operations
- ag.metrics.tokens.cumulative.prompt: Input tokens across all operations
- ag.metrics.tokens.cumulative.completion: Output tokens across all operations
Incremental:
- ag.metrics.tokens.incremental.total: Tokens for this span only
- ag.metrics.tokens.incremental.prompt: Input tokens for this span only
- ag.metrics.tokens.incremental.completion: Output tokens for this span only
Duration
Tracks execution time in milliseconds:
ag.metrics.duration.cumulative: Total execution time including all child spans
Additional Agenta Attributes
ag.type
The ag.type namespace contains type information about the span:
ag.type.node can be workflow, task, tool, embedding, query, completion, chat, rerank.
ag.tags
The ag.tags namespace allows you to add custom tags to spans for categorization and filtering:
# Add custom tags using the SDK
ag.add_tags({"key":"value"})
ag.version
Tracks version information for the instrumented application or component.
ag.flags
Contains system-specific flags and metadata used internally by Agenta for span processing and display.
When using auto-instrumentation libraries, most attributes are stored twice: once in their original format and once normalized under the ag namespace.
Standard OpenTelemetry Attributes
In addition to Agenta-specific conventions, traces include standard OpenTelemetry attributes:
- Links: Relationships between spans
- Events: Timestamped events within spans
- Version: OpenTelemetry version information
- Status Code: Span completion status
- Start Time: Span initiation timestamp
- Span Name: Human-readable span identifier
- Span Kind: Type of span (server, client, internal, etc.)
Next steps
- Learn about distributed tracing
- Explore Python SDK tracing for easier instrumentation
- See integration guides for specific frameworks