Semantic Conventions
This document describes how Agenta applies domain-specific OpenTelemetry conventions to capture and analyze traces from LLM applications.
How It Works
Agenta accepts any span that follows the OpenTelemetry specification. To unlock LLM-specific features—such as nicely formatted chat messages, per-request cost and latency, and links to prompt configurations or evaluators—we add attributes under the ag namespace.
The OTLP endpoint accepts batches up to 5 MB by default (after decompression). Larger requests return an HTTP 413 status code.
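To stay under that limit, a client can split large exports into smaller batches before sending. A minimal stdlib-only sketch (the `split_batch` helper and the JSON encoding are illustrative assumptions, not part of the Agenta SDK or the OTLP wire format):

```python
import json

MAX_BATCH_BYTES = 5 * 1024 * 1024  # 5 MB uncompressed, per the endpoint's default

def split_batch(spans, max_bytes=MAX_BATCH_BYTES):
    """Greedily split spans into chunks whose JSON encoding stays under max_bytes."""
    chunks, current, size = [], [], 2  # 2 bytes for the surrounding "[]"
    for span in spans:
        encoded = len(json.dumps(span).encode("utf-8")) + 1  # +1 for the comma
        if current and size + encoded > max_bytes:
            chunks.append(current)
            current, size = [], 2
        current.append(span)
        size += encoded
    if current:
        chunks.append(current)
    return chunks
```

Each resulting chunk can then be exported as its own request, keeping every payload below the 413 threshold.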
We support two primary instrumentation approaches:
- Agenta SDK: When using our SDK with decorators like @ag.instrument, we automatically handle the semantic conventions
- Auto-instrumentation libraries: We provide adapters that transform data from standard auto-instrumentation libraries (which follow their own conventions) into Agenta's format
Agenta Namespace
All Agenta-specific attributes are organized under the ag namespace to avoid conflicts with other OpenTelemetry conventions.
Core Semantic Conventions
ag.data
The ag.data namespace contains the core execution data for each span:
- ag.data.inputs: Input parameters for the span
- ag.data.outputs: Output results from the span
- ag.data.internals: Internal variables and intermediate values
Data Format
Inputs and Outputs: Stored in JSON format. For chat applications using LLM spans, these are formatted as a list of messages:
{
"data": {
"inputs": {
"prompt": [
{ "role": "system", "content": "System instruction" },
{ "role": "user", "content": "User query" }
],
"functions": [...],
"tools": [...]
},
"outputs": {
"completion": [
{ "role": "assistant", "content": "Assistant response" }
]
}
}
}
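Assembled in code, the same structure might look like the following stdlib-only sketch (the `build_chat_data` helper is illustrative, not an SDK call):

```python
def build_chat_data(system_prompt, user_query, assistant_reply):
    """Assemble an ag.data payload for a chat-style LLM span."""
    return {
        "data": {
            "inputs": {
                "prompt": [
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": user_query},
                ],
            },
            "outputs": {
                "completion": [
                    {"role": "assistant", "content": assistant_reply},
                ],
            },
        }
    }

payload = build_chat_data("You are terse.", "Capital of France?", "Paris.")
```

The nested message lists are what allow the UI to render inputs and outputs as a formatted chat transcript.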
Internals: User-provided internal information such as context variables, intermediate calculations, or evaluation data that aren't part of the primary inputs/outputs. These are set by the user in the SDK using ag.tracing.store_internals().
SDK Integration
When using the @ag.instrument decorator in Python:
@ag.instrument
def my_function(input_param):
    # Function inputs and outputs are automatically captured
    # unless explicitly masked
    return result
The decorator automatically captures function inputs and outputs in ag.data.inputs and ag.data.outputs unless you choose to mask sensitive data.
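If some inputs are sensitive, one approach is to redact them before they reach ag.data.inputs. A stdlib sketch of such a redaction step (the `mask_fields` helper is hypothetical, not the SDK's built-in masking):

```python
def mask_fields(inputs, sensitive_keys, placeholder="[REDACTED]"):
    """Return a copy of the inputs dict with sensitive values replaced."""
    return {
        key: placeholder if key in sensitive_keys else value
        for key, value in inputs.items()
    }

masked = mask_fields(
    {"prompt": "Summarize this", "api_key": "sk-123"},
    sensitive_keys={"api_key"},
)
```

Applying a step like this before instrumentation keeps secrets out of stored traces while preserving the rest of the span data.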
ag.meta
The ag.meta namespace stores metadata about the span execution:
{
"meta": {
"system": "openai",
"request": {
"base_url": "https://api.openai.com/v1",
"endpoint": "/chat/completions",
"headers": {...},
"streaming": false,
"model": "gpt-4",
"max_tokens": 1000,
"temperature": 0.7,
"top_p": 0.9,
"top_k": 50
},
"response": {
"model": "gpt-4-0613"
},
"configuration": {
"prompt": {
"input_keys": ["country"],
"messages": [...],
"template_format": "curly",
"llm_config": {
"model": "mistral/mistral-tiny"
}
}
}
}
}
Standard Metadata
Auto-instrumentation maps common semantic-convention keys—e.g. gen_ai.system, gen_ai.request.*—to the structure above.
- ag.meta.system: Identifies the LLM provider or system being used (e.g., "openai", "anthropic")
- ag.meta.request.base_url: The base URL for the API request
- ag.meta.request.endpoint: The specific API endpoint called
- ag.meta.request.headers: HTTP headers sent with the request
- ag.meta.request.streaming: Boolean indicating if streaming mode was used
- ag.meta.request.model: The model name requested
- ag.meta.request.max_tokens: Maximum token limit for the response
- ag.meta.request.temperature: Sampling temperature parameter
- ag.meta.request.top_p: Nucleus sampling parameter
- ag.meta.request.top_k: Top-k sampling parameter
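One way the nested meta object maps onto these flat, dot-joined attribute keys is a recursive flatten. A minimal sketch, assuming the mapping is a plain dot-join (the `flatten` helper is illustrative, not Agenta's internal implementation):

```python
def flatten(obj, prefix="ag.meta"):
    """Flatten a nested dict into dot-joined span attribute keys."""
    attributes = {}
    for key, value in obj.items():
        full_key = f"{prefix}.{key}"
        if isinstance(value, dict):
            attributes.update(flatten(value, prefix=full_key))
        else:
            attributes[full_key] = value
    return attributes

attrs = flatten({"system": "openai", "request": {"model": "gpt-4", "streaming": False}})
```

Here `attrs` contains keys such as "ag.meta.system" and "ag.meta.request.model", matching the list above.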
Metadata is displayed in the observability overview page as contextual information to help navigate and understand span execution.
References
Use the top-level references array to link spans to Agenta entities. Every entry represents one relationship and includes:
- attributes.key: the reference category (for example application, evaluator_variant)
- id, slug, or version: supply whichever identifiers you have; you can include more than one field if available
Example payload:
{
"references": [
{"id": "019a0159-82d3-7760-9868-4f8c7da8e9c0", "attributes": {"key": "application"}},
{"slug": "production", "attributes": {"key": "environment"}},
{"id": "019a0159-82d3-7760-9868-4f8c7da8e9c1", "version": "4", "attributes": {"key": "application_variant"}}
]
}
Supported categories:
- application, application_variant, application_revision
- environment, environment_variant, environment_revision
- evaluator, evaluator_variant, evaluator_revision
- testset, testset_variant, testset_revision, testcase
- query, query_variant, query_revision
- workflow, workflow_variant, workflow_revision
Consumers (UI, analytics, filtering) read from this array. Instrumentation libraries that cannot emit the array may still set the attribute form (ag.references.<category>.<field>); the ingestion service converts that dictionary into the same array before storage.
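The conversion from the attribute form into the array form can be sketched as grouping keys by category. A stdlib-only illustration (the `references_from_attributes` helper is an assumption about the shape of the transform, not the ingestion service's actual code):

```python
def references_from_attributes(attributes):
    """Group ag.references.<category>.<field> attributes into a references array."""
    grouped = {}
    for key, value in attributes.items():
        parts = key.split(".")
        if len(parts) == 4 and parts[:2] == ["ag", "references"]:
            category, field = parts[2], parts[3]
            grouped.setdefault(category, {})[field] = value
    return [
        {**fields, "attributes": {"key": category}}
        for category, fields in grouped.items()
    ]

refs = references_from_attributes({
    "ag.references.application.id": "019a0159-82d3-7760-9868-4f8c7da8e9c0",
    "ag.references.environment.slug": "production",
})
```

The result has the same shape as the example payload above, so consumers only ever see the array form.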
The legacy ag.refs.* namespace is deprecated and will be removed after existing SDKs migrate. Do not rely on it.
ag.metrics
The ag.metrics namespace tracks performance, cost, and error metrics:
{
"metrics": {
"costs": {
"cumulative": {
"total": 0.0070902,
"prompt": 0.00355,
"completion": 0.00354
},
"incremental": {
"total": 0.0070902
}
},
"tokens": {
"cumulative": {
"total": 992,
"prompt": 175,
"completion": 817
},
"incremental": {
"total": 992,
"prompt": 175,
"completion": 817
}
},
"duration": {
"cumulative": 19889.343
},
"errors": {}
}
}
Aggregation Types
Metrics are tracked at two levels:
- incremental: Metrics for this span only (excluding child spans)
- cumulative: Metrics for this span plus all child spans aggregated together
This dual tracking allows you to see both the cost of individual operations and the total cost of complex workflows.
Metric Categories
Costs
Tracks LLM API costs in USD with the following breakdown:
Cumulative (this span + children):
- ag.metrics.costs.cumulative.total: Total cost across all LLM calls in this span and its children
- ag.metrics.costs.cumulative.prompt: Cost attributed to input tokens
- ag.metrics.costs.cumulative.completion: Cost attributed to output/completion tokens
Incremental (this span only):
- ag.metrics.costs.incremental.total: Cost for this span's operations only
- ag.metrics.costs.incremental.prompt: Prompt cost for this span only
- ag.metrics.costs.incremental.completion: Completion cost for this span only
Cost calculation uses the latest pricing for each model provider. Costs are automatically calculated when using standard LLM integrations. Cumulative metrics are automatically calculated by the backend by aggregating incremental values.
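The aggregation of incremental values into cumulative ones amounts to summing over the span tree. A minimal sketch of that idea (the dict shape and `cumulative_cost` helper are illustrative assumptions, not the backend's implementation):

```python
def cumulative_cost(span):
    """Sum incremental costs for a span and all of its children, recursively."""
    total = span.get("incremental_cost", 0.0)
    for child in span.get("children", []):
        total += cumulative_cost(child)
    return total

trace = {
    "incremental_cost": 0.001,
    "children": [
        {"incremental_cost": 0.003},
        {"incremental_cost": 0.002, "children": [{"incremental_cost": 0.004}]},
    ],
}
```

For this tree, the root's cumulative cost is 0.010 while its incremental cost stays 0.001, which is exactly the distinction the two aggregation levels capture.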
Tokens
Tracks token usage at both aggregation levels:
Cumulative:
- ag.metrics.tokens.cumulative.total: Total tokens across all operations
- ag.metrics.tokens.cumulative.prompt: Input tokens across all operations
- ag.metrics.tokens.cumulative.completion: Output tokens across all operations
Incremental:
- ag.metrics.tokens.incremental.total: Tokens for this span only
- ag.metrics.tokens.incremental.prompt: Input tokens for this span only
- ag.metrics.tokens.incremental.completion: Output tokens for this span only
Duration
Tracks execution time in milliseconds:
ag.metrics.duration.cumulative: Total execution time including all child spans
Additional Agenta Attributes
ag.type
The ag.type namespace contains type information about the span:
ag.type.node can be workflow, task, tool, embedding, query, completion, chat, rerank.
ag.tags
The ag.tags namespace allows you to add custom tags to spans for categorization and filtering:
# Add custom tags using the SDK
ag.add_tags({"key":"value"})
ag.version
Tracks version information for the instrumented application or component.
ag.flags
Contains system-specific flags and metadata used internally by Agenta for span processing and display.
When using auto-instrumentation libraries, most attributes are stored twice: once in their original format and once normalized under the ag namespace.
Standard OpenTelemetry Attributes
In addition to Agenta-specific conventions, traces include standard OpenTelemetry attributes:
- Links: Relationships between spans
- Events: Timestamped events within spans
- Version: OpenTelemetry version information
- Status Code: Span completion status
- Start Time: Span initiation timestamp
- Span Name: Human-readable span identifier
- Span Kind: Type of span (server, client, internal, etc.)
Next steps
- Learn about distributed tracing
- Explore Python SDK tracing for easier instrumentation
- See integration guides for specific frameworks