
Semantic Conventions

This document describes how Agenta applies domain-specific OpenTelemetry conventions to capture and analyze traces from LLM applications.

How It Works

Agenta accepts any span that follows the OpenTelemetry specification. To unlock LLM-specific features, such as formatted chat messages, per-request cost and latency, and links to prompt configurations or evaluators, we add attributes under the ag namespace.

info

The OTLP endpoint accepts batches up to 5 MB by default (after decompression). Larger requests return an HTTP 413 status code.
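If you export large traces, you can avoid 413 responses by splitting spans into smaller batches on the client side. The sketch below is a hypothetical helper, not part of any exporter. It assumes "5 MB" means 5 × 1024 × 1024 bytes (the docs do not say MB or MiB), and it uses the JSON size of each span as a rough stand-in for the real OTLP payload size.

```python
import json

# Assumption: the 5 MB limit is treated as 5 MiB; adjust if your deployment differs.
MAX_BATCH_BYTES = 5 * 1024 * 1024


def split_into_batches(spans, max_bytes=MAX_BATCH_BYTES):
    """Group JSON-serializable span dicts into batches whose summed JSON size
    stays at or below max_bytes (a rough proxy for the uncompressed OTLP body)."""
    batches, current, current_size = [], [], 0
    for span in spans:
        size = len(json.dumps(span).encode("utf-8"))
        if size > max_bytes:
            # A single oversized span cannot be fixed by batching.
            raise ValueError("span exceeds the batch limit; the endpoint would return 413")
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(span)
        current_size += size
    if current:
        batches.append(current)
    return batches


# Usage: three small spans fit comfortably in one batch.
spans = [{"name": f"span-{i}", "attributes": {"ag.type.node": "task"}} for i in range(3)]
batches = split_into_batches(spans)
print(len(batches))  # 1
```

Because protobuf and JSON encodings differ in size, leave some headroom below the server limit rather than packing batches right up to it.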

We support two primary instrumentation approaches:

  • Agenta SDK: When using our SDK with decorators like @ag.instrument, we automatically handle the semantic conventions
  • Auto-instrumentation libraries: We provide adapters that transform data from standard auto-instrumentation libraries (which follow their own conventions) into Agenta's format

Agenta Namespace

All Agenta-specific attributes are organized under the ag namespace to avoid conflicts with other OpenTelemetry conventions.

Core Semantic Conventions

ag.data

The ag.data namespace contains the core execution data for each span:

  • ag.data.inputs: Input parameters for the span
  • ag.data.outputs: Output results from the span
  • ag.data.internals: Internal variables and intermediate values

Data Format

Inputs and Outputs: Stored as JSON. For chat applications using LLM spans, these are formatted as lists of messages:

{
  "data": {
    "inputs": {
      "prompt": [
        { "role": "system", "content": "System instruction" },
        { "role": "user", "content": "User query" }
      ],
      "functions": [...],
      "tools": [...]
    },
    "outputs": {
      "completion": [
        { "role": "assistant", "content": "Assistant response" }
      ]
    }
  }
}

Internals: User-provided internal information such as context variables, intermediate calculations, or evaluation data that aren't part of the primary inputs/outputs. These are set by the user in the SDK using ag.tracing.store_internals().
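OpenTelemetry span attributes are flat key/value pairs, while the examples in this document show the ag namespace as nested JSON. The sketch below is a hypothetical helper, not part of the Agenta SDK, that shows how dotted attribute names such as ag.data.inputs correspond to that nested view. It assumes attribute values have already been decoded into Python objects; how Agenta serializes nested values on the wire is not covered here.

```python
def unflatten(attributes, prefix="ag."):
    """Rebuild a nested dict from dotted attribute keys that start with prefix.
    Keys without the prefix (for example gen_ai.*) are ignored."""
    nested = {}
    for key, value in attributes.items():
        if not key.startswith(prefix):
            continue
        parts = key[len(prefix):].split(".")
        node = nested
        for part in parts[:-1]:
            # Create intermediate levels such as "data" on first use.
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


attrs = {
    "ag.data.inputs": {"prompt": [{"role": "user", "content": "User query"}]},
    "ag.data.outputs": {"completion": [{"role": "assistant", "content": "Assistant response"}]},
    "ag.data.internals": {"retrieved_docs": 3},
    "gen_ai.system": "openai",
}
nested = unflatten(attrs)
print(sorted(nested["data"]))  # ['inputs', 'internals', 'outputs']
```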

SDK Integration

When using the @ag.instrument decorator in Python:

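import agenta as ag

ag.init()  # configure the SDK before instrumented functions run

@ag.instrument()
def my_function(input_param):
    # Function inputs and outputs are automatically captured
    # unless explicitly masked
    result = f"processed: {input_param}"  # placeholder for the real work
    return result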

The decorator automatically captures function inputs and outputs in ag.data.inputs and ag.data.outputs unless you choose to mask sensitive data.

ag.meta

The ag.meta namespace stores metadata about the span execution:

{
  "meta": {
    "system": "openai",
    "request": {
      "base_url": "https://api.openai.com/v1",
      "endpoint": "/chat/completions",
      "headers": {...},
      "streaming": false,
      "model": "gpt-4",
      "max_tokens": 1000,
      "temperature": 0.7,
      "top_p": 0.9,
      "top_k": 50
    },
    "response": {
      "model": "gpt-4-0613"
    },
    "configuration": {
      "prompt": {
        "input_keys": ["country"],
        "messages": [...],
        "template_format": "curly",
        "llm_config": {
          "model": "mistral/mistral-tiny"
        }
      }
    }
  }
}

Standard Metadata

info

Auto-instrumentation maps common semantic-convention keys (for example, gen_ai.system and gen_ai.request.*) to the structure above.

  • ag.meta.system: Identifies the LLM provider or system being used (e.g., "openai", "anthropic")
  • ag.meta.request.base_url: The base URL for the API request
  • ag.meta.request.endpoint: The specific API endpoint called
  • ag.meta.request.headers: HTTP headers sent with the request
  • ag.meta.request.streaming: Boolean indicating if streaming mode was used
  • ag.meta.request.model: The model name requested
  • ag.meta.request.max_tokens: Maximum token limit for the response
  • ag.meta.request.temperature: Sampling temperature parameter
  • ag.meta.request.top_p: Nucleus sampling parameter
  • ag.meta.request.top_k: Top-k sampling parameter

Metadata is displayed in the observability overview page as contextual information to help navigate and understand span execution.
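The adapters that perform this mapping are internal to Agenta, so the sketch below only illustrates the idea for the request and response keys listed above. The gen_ai.* names come from the OpenTelemetry GenAI semantic conventions; the mapping table and the function map_gen_ai_to_ag_meta are hypothetical and do not reproduce Agenta's actual adapter.

```python
# Hypothetical mapping from OpenTelemetry GenAI attribute keys to ag.meta keys.
GEN_AI_TO_AG_META = {
    "gen_ai.system": "ag.meta.system",
    "gen_ai.request.model": "ag.meta.request.model",
    "gen_ai.request.max_tokens": "ag.meta.request.max_tokens",
    "gen_ai.request.temperature": "ag.meta.request.temperature",
    "gen_ai.request.top_p": "ag.meta.request.top_p",
    "gen_ai.request.top_k": "ag.meta.request.top_k",
    "gen_ai.response.model": "ag.meta.response.model",
}


def map_gen_ai_to_ag_meta(attributes):
    """Return a copy of attributes with ag.meta.* keys added for known gen_ai.* keys.
    The original keys are kept, mirroring how auto-instrumented attributes
    end up stored in both formats."""
    mapped = dict(attributes)
    for source, target in GEN_AI_TO_AG_META.items():
        if source in attributes:
            mapped[target] = attributes[source]
    return mapped


span_attributes = {
    "gen_ai.system": "openai",
    "gen_ai.request.model": "gpt-4",
    "gen_ai.request.temperature": 0.7,
}
meta = map_gen_ai_to_ag_meta(span_attributes)
print(meta["ag.meta.request.model"])  # gpt-4
```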

References

Use the top-level references array to link spans to Agenta entities. Each entry represents one relationship and includes:

  • attributes.key: the reference category (for example, application or evaluator_variant)
  • id, slug, or version: whichever identifiers you have; you can include more than one if available

Example payload:

{
  "references": [
    {"id": "019a0159-82d3-7760-9868-4f8c7da8e9c0", "attributes": {"key": "application"}},
    {"slug": "production", "attributes": {"key": "environment"}},
    {"id": "019a0159-82d3-7760-9868-4f8c7da8e9c1", "version": "4", "attributes": {"key": "application_variant"}}
  ]
}

Supported categories:

  • application, application_variant, application_revision
  • environment, environment_variant, environment_revision
  • evaluator, evaluator_variant, evaluator_revision
  • testset, testset_variant, testset_revision, testcase
  • query, query_variant, query_revision
  • workflow, workflow_variant, workflow_revision

Consumers (UI, analytics, filtering) read from this array. Instrumentation libraries that cannot emit the array may still set the attribute form (ag.references.<category>.<field>); the ingestion service converts that dictionary into the same array before storage.
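The conversion from the attribute form to the array can be sketched as follows. This illustrates the idea only and is not Agenta's ingestion code: the function references_from_attributes is hypothetical, it assumes keys follow exactly ag.references.<category>.<field> with field being id, slug, or version, and silently skipping unknown categories is a choice made for this sketch.

```python
SUPPORTED_CATEGORIES = {
    "application", "application_variant", "application_revision",
    "environment", "environment_variant", "environment_revision",
    "evaluator", "evaluator_variant", "evaluator_revision",
    "testset", "testset_variant", "testset_revision", "testcase",
    "query", "query_variant", "query_revision",
    "workflow", "workflow_variant", "workflow_revision",
}
REFERENCE_FIELDS = {"id", "slug", "version"}
PREFIX = "ag.references."


def references_from_attributes(attributes):
    """Collect ag.references.<category>.<field> attributes into a references array."""
    by_category = {}
    for key, value in attributes.items():
        if not key.startswith(PREFIX):
            continue
        category, _, field = key[len(PREFIX):].partition(".")
        if category not in SUPPORTED_CATEGORIES or field not in REFERENCE_FIELDS:
            continue  # this sketch ignores unknown categories and fields
        by_category.setdefault(category, {})[field] = value
    # One entry per category, in first-seen order.
    return [{**fields, "attributes": {"key": category}} for category, fields in by_category.items()]


attrs = {
    "ag.references.application.id": "019a0159-82d3-7760-9868-4f8c7da8e9c0",
    "ag.references.environment.slug": "production",
    "ag.references.application_variant.id": "019a0159-82d3-7760-9868-4f8c7da8e9c1",
    "ag.references.application_variant.version": "4",
}
print(references_from_attributes(attrs))
```

With these inputs the result has the same entries as the example payload shown earlier in this section.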

warning

The legacy ag.refs.* namespace is deprecated and will be removed after existing SDKs migrate. Do not rely on it.

ag.metrics

The ag.metrics namespace tracks performance, cost, and error metrics:

{
  "metrics": {
    "costs": {
      "cumulative": {
        "total": 0.0070902,
        "prompt": 0.00355,
        "completion": 0.00354
      },
      "incremental": {
        "total": 0.0070902
      }
    },
    "tokens": {
      "cumulative": {
        "total": 992,
        "prompt": 175,
        "completion": 817
      },
      "incremental": {
        "total": 992,
        "prompt": 175,
        "completion": 817
      }
    },
    "duration": {
      "cumulative": 19889.343
    },
    "errors": {}
  }
}

Aggregation Types

Metrics are tracked at two levels:

  • incremental: Metrics for this span only (excluding child spans)
  • cumulative: Metrics for this span plus all child spans aggregated together

This dual tracking allows you to see both the cost of individual operations and the total cost of complex workflows.
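The relationship between the two levels is a post-order sum over the span tree: a span's cumulative value is its own incremental value plus the cumulative values of its children. The sketch below shows this for token counts using a hypothetical in-memory Span class; Agenta's backend performs its own aggregation, and the per-span token split here is made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical minimal span holding incremental token counts."""
    name: str
    incremental: dict
    children: list = field(default_factory=list)
    cumulative: dict = field(default_factory=dict)


def aggregate(span):
    """Fill span.cumulative (and that of every descendant) and return it."""
    totals = dict(span.incremental)
    for child in span.children:
        for key, value in aggregate(child).items():
            totals[key] = totals.get(key, 0) + value
    span.cumulative = totals
    return totals


llm_a = Span("llm_a", {"prompt": 100, "completion": 400, "total": 500})
llm_b = Span("llm_b", {"prompt": 75, "completion": 417, "total": 492})
workflow = Span("workflow", {}, children=[llm_a, llm_b])

aggregate(workflow)
print(workflow.cumulative)  # {'prompt': 175, 'completion': 817, 'total': 992}
```

A leaf span's cumulative value equals its incremental value, while a parent that makes no LLM calls itself (like workflow above) has empty incremental metrics but non-zero cumulative ones.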

Metric Categories

Costs

Tracks LLM API costs in USD with the following breakdown:

Cumulative (this span + children):

  • ag.metrics.costs.cumulative.total: Total cost across all LLM calls in this span and its children
  • ag.metrics.costs.cumulative.prompt: Cost attributed to input tokens
  • ag.metrics.costs.cumulative.completion: Cost attributed to output/completion tokens

Incremental (this span only):

  • ag.metrics.costs.incremental.total: Cost for this span's operations only
  • ag.metrics.costs.incremental.prompt: Prompt cost for this span only
  • ag.metrics.costs.incremental.completion: Completion cost for this span only

info

Cost calculation uses the latest pricing for each model provider. Costs are automatically calculated when using standard LLM integrations. Cumulative metrics are automatically calculated by the backend by aggregating incremental values.
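Attributing cost to prompt and completion tokens amounts to multiplying each token count by the provider's price for that token type. The prices in this sketch are placeholders, not real pricing for any model, and span_cost is a hypothetical helper; Agenta looks up current pricing on its own.

```python
# Placeholder prices in USD per one million tokens (not real pricing).
PRICE_PER_MILLION = {"prompt": 10.0, "completion": 30.0}


def span_cost(prompt_tokens, completion_tokens, prices=PRICE_PER_MILLION):
    """Return incremental costs for one span, split by token type."""
    prompt = prompt_tokens * prices["prompt"] / 1_000_000
    completion = completion_tokens * prices["completion"] / 1_000_000
    return {"total": prompt + completion, "prompt": prompt, "completion": completion}


costs = span_cost(1_000, 500)
print(round(costs["total"], 6))  # 0.025
```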

Tokens

Tracks token usage at both aggregation levels:

Cumulative:

  • ag.metrics.tokens.cumulative.total: Total tokens across all operations
  • ag.metrics.tokens.cumulative.prompt: Input tokens across all operations
  • ag.metrics.tokens.cumulative.completion: Output tokens across all operations

Incremental:

  • ag.metrics.tokens.incremental.total: Tokens for this span only
  • ag.metrics.tokens.incremental.prompt: Input tokens for this span only
  • ag.metrics.tokens.incremental.completion: Output tokens for this span only

Duration

Tracks execution time in milliseconds:

  • ag.metrics.duration.cumulative: Total execution time including all child spans

Additional Agenta Attributes

ag.type

The ag.type namespace contains type information about the span:

ag.type.node can be workflow, task, tool, embedding, query, completion, chat, rerank.
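If you set ag.type.node yourself, checking the value against the list above catches typos early. The enum and helper below are hypothetical and not part of the Agenta SDK; they assume the attribute value is the lowercase string shown in the list.

```python
from enum import Enum


class NodeType(str, Enum):
    """Allowed values for ag.type.node, as listed in this document."""
    WORKFLOW = "workflow"
    TASK = "task"
    TOOL = "tool"
    EMBEDDING = "embedding"
    QUERY = "query"
    COMPLETION = "completion"
    CHAT = "chat"
    RERANK = "rerank"


def node_type_attribute(value):
    """Return {'ag.type.node': value}; raises ValueError for unknown node types."""
    return {"ag.type.node": NodeType(value).value}


print(node_type_attribute("chat"))  # {'ag.type.node': 'chat'}
```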

ag.tags

The ag.tags namespace allows you to add custom tags to spans for categorization and filtering:

# Add custom tags using the SDK
ag.add_tags({"key":"value"})

ag.version

Tracks version information for the instrumented application or component.

ag.flags

Contains system-specific flags and metadata used internally by Agenta for span processing and display.

note

When using auto-instrumentation libraries, most attributes are stored twice: once in their original format and once, after processing, under the ag namespace.

Standard OpenTelemetry Attributes

In addition to Agenta-specific conventions, traces include standard OpenTelemetry attributes:

  • Links: Relationships between spans
  • Events: Timestamped events within spans
  • Version: OpenTelemetry version information
  • Status Code: Span completion status
  • Start Time: Span initiation timestamp
  • Span Name: Human-readable span identifier
  • Span Kind: Type of span (internal, server, client, producer, or consumer)

Next steps