Roadmap
What we have shipped, what is in progress, and what we plan to build.
Last Shipped
Vertex AI Provider Support
10/24/2025
Integration, Playground
Use Google Cloud's Vertex AI models including Gemini and partner models in the playground, Model Hub, and through Gateway endpoints.
Filtering Traces by Annotation
10/14/2025
Observability
Filter and search for traces based on their annotations. Quickly find traces with low scores or specific feedback using the rebuilt filtering system.
New Evaluation Results Dashboard
9/26/2025
Evaluation
Completely redesigned evaluation results dashboard with performance plots, side-by-side comparison, improved test cases view, focused detail view, configuration visibility, and run naming.
Deep URL Support for Shareable Links
9/24/2025
Misc
URLs now include workspace context, making them shareable between team members. We also fixed workspace bugs affecting page refresh and workspace selection.
Speed Improvements in the Playground
9/19/2025
Playground
We improved the speed of the playground (prompt creation, navigation, etc.), especially for prompts with hundreds of revisions.
Markdown Support
8/7/2025
Playground, Observability
You can view prompts and messages rendered as Markdown, both in the playground and in the observability drawer.
Image Support in the Playground
7/29/2025
Playground
You can now upload images to the playground and use them in your prompts.
In Progress
Structured Output and Multiple Outputs in LLM-as-a-Judge Evaluators
Evaluation
Use structured output formats and generate multiple output fields (explanation, confidence, suggestions, issue categories) in LLM-as-a-judge evaluators.
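For illustration, the multi-field verdict described above could be modeled with a schema like the one below. The field names mirror this roadmap item; the schema itself and the Pydantic types are assumptions, not the final design.

```python
# Sketch of a multi-field structured verdict for an LLM-as-a-judge
# evaluator. Field names come from the roadmap item above; the schema
# itself is an assumption, not Agenta's final design.
from pydantic import BaseModel, Field

class JudgeVerdict(BaseModel):
    score: float = Field(ge=0.0, le=1.0)   # overall quality score
    explanation: str                       # why the judge assigned this score
    confidence: float = Field(ge=0.0, le=1.0)
    suggestions: list[str]                 # concrete improvements to try
    issue_categories: list[str]            # e.g. ["hallucination", "formatting"]

# Validating a hypothetical raw judge response against the schema:
raw = (
    '{"score": 0.4, "explanation": "The answer skips the second question.",'
    ' "confidence": 0.8, "suggestions": ["Address both questions."],'
    ' "issue_categories": ["incompleteness"]}'
)
verdict = JudgeVerdict.model_validate_json(raw)
print(verdict.score, verdict.issue_categories)
```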
Jinja2 Template Support in the Playground
Playground
Add Jinja2 template support to enable conditional logic, filters, and template blocks in prompts. The prompt type will be stored in the schema, and the SDK will handle rendering.
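To show what this unlocks over plain variable substitution, here is a small standalone Jinja2 render with a conditional, a loop, and filters; the template text is illustrative rather than an Agenta-provided prompt.

```python
# Conditionals, loops, and filters in a prompt template via Jinja2.
from jinja2 import Template

prompt = Template(
    "You are a {{ role | default('helpful assistant') }}.\n"
    "{% if examples %}Here are some examples:\n"
    "{% for ex in examples %}- {{ ex | trim }}\n{% endfor %}"
    "{% endif %}"
    "Answer concisely: {{ question | capitalize }}"
)

# 'role' is omitted, so the default() filter kicks in.
print(prompt.render(question="what is retrieval-augmented generation?",
                    examples=["  Q: What is RAG? A: ...  "]))
```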
PDF Support in the Playground
Playground
Add PDF support for models that support it (OpenAI, Gemini, etc.) through base64 encoding, URLs, or file IDs. Support extends to human evaluation for reviewing model responses on PDF inputs.
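As a sketch of the base64 route, the snippet below reads a local PDF and embeds it as a data URL inside an OpenAI-style file content part; the exact per-provider payload shape Agenta will emit is an assumption here.

```python
# Encode a local PDF as base64 and attach it to a chat message.
# The content-part layout is an OpenAI-style assumption; other providers
# (e.g. Gemini) use different field names.
import base64

def pdf_to_data_url(path: str) -> str:
    """Read a PDF from disk and return it as a base64 data URL."""
    with open(path, "rb") as f:
        return "data:application/pdf;base64," + base64.b64encode(f.read()).decode()

message = {
    "role": "user",
    "content": [
        {"type": "file", "file": {"filename": "report.pdf",
                                  "file_data": pdf_to_data_url("report.pdf")}},
        {"type": "text", "text": "Summarize the attached report."},
    ],
}
```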
Prompt Snippets
Playground
Create reusable prompt snippets that can be referenced across multiple prompts. Reference specific versions or always use the latest version to maintain consistency across prompt variants.
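Purely to illustrate the two reference modes (pinned version vs. always-latest), a hypothetical snippet syntax could look like this; the actual syntax has not been published.

```python
# "{{> name@version }}" is invented here to illustrate pinned vs. latest
# snippet resolution; it is not real Agenta syntax.
support_prompt = "{{> brand_voice@v3 }}\n\nReply to this ticket: {{ ticket }}"      # pinned to v3
marketing_prompt = "{{> brand_voice@latest }}\n\nDraft a post about: {{ topic }}"   # follows latest
```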
Online Evaluation
Evaluation
We are adding the ability to configure evaluators (LLM-as-a-judge or custom) and run them automatically on new traces.
Programmatic Evaluation through the SDK
Evaluation
Until now, evaluations could only be run as managed evaluations inside Agenta. We are adding the ability to run them programmatically through the SDK.
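To make the workflow concrete, here is an SDK-agnostic sketch of what running an evaluation programmatically means: iterate a test set through your app and score each output with evaluator functions. The eventual SDK interface may look different.

```python
# Minimal programmatic evaluation loop: app + test set + evaluators.
# This illustrates the workflow only; it is not the Agenta SDK API.
from typing import Callable

def exact_match(output: str, expected: str) -> float:
    return 1.0 if output.strip() == expected.strip() else 0.0

def run_evaluation(app: Callable[[str], str],
                   test_set: list[dict],
                   evaluators: dict[str, Callable[[str, str], float]]) -> list[dict]:
    results = []
    for case in test_set:
        output = app(case["input"])
        scores = {name: fn(output, case["expected"]) for name, fn in evaluators.items()}
        results.append({"input": case["input"], "output": output, **scores})
    return results

results = run_evaluation(
    app=lambda q: "Paris" if "capital of France" in q else "unknown",
    test_set=[{"input": "What is the capital of France?", "expected": "Paris"}],
    evaluators={"exact_match": exact_match},
)
print(results)
```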
Date Range Filtering in Metrics Dashboard
Observability
We are adding the ability to filter traces by date range in the metrics dashboard.
Planned
Folders for Prompt Organization
Playground
Create folders and subfolders to organize prompts in the playground. Move prompts between folders and search within specific folders to structure prompt libraries.
Projects and Workspaces
Misc
Improve organization structure by adding projects and workspaces. Create projects for different products, set up workspaces for different environments or teams, and scope resources to specific workspaces.
AI-Powered Prompt Refinement in the Playground
Playground
Analyze prompts and suggest improvements based on best practices. Identify issues, propose refined versions, and allow users to accept, modify, or reject suggestions.
Open Observability Spans Directly in the Playground
Playground, Observability
Add a button in observability to open any chat span directly in the playground. Creates a stateless playground session pre-filled with the exact prompt, configuration, and inputs for immediate iteration.
Improving Navigation between Test Sets in the Playground
Playground
We are making it easier to use and navigate large test sets in the playground.
Appending Single Test Cases in the Playground
Playground
Right now the playground cannot use test cases from different test sets together. We are adding the ability to append a single test case to an existing test set.
Improving Test Set View
Evaluation
We are reworking the test set view to make it easier to visualize and edit test sets.
Prompt Caching in the SDK
SDK
We are adding the ability to cache prompts in the SDK.
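One plausible shape for this, shown only to illustrate the idea, is a TTL-bounded in-memory store in front of prompt fetches; the real SDK API may differ.

```python
# TTL-bounded in-memory cache around prompt-config fetches. Illustrative
# only; not the Agenta SDK's actual caching interface.
import time
from typing import Callable

class PromptCache:
    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, dict]] = {}

    def get(self, key: str, fetch: Callable[[], dict]) -> dict:
        """Return the cached config for key, refetching once the TTL expires."""
        now = time.monotonic()
        hit = self._store.get(key)
        if hit and now - hit[0] < self.ttl:
            return hit[1]                      # cache hit: skip the network
        value = fetch()                        # e.g. a call to the prompt registry
        self._store[key] = (now, value)
        return value

cache = PromptCache(ttl_seconds=30)
config = cache.get("my-prompt@production",
                   fetch=lambda: {"template": "Hello {name}"})
```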
Test Set Versioning
Evaluation
We are adding the ability to version test sets. This is useful for correctly comparing evaluation results.
Tagging Traces, Test Sets, Evaluations and Prompts
Evaluation
We are adding the ability to tag traces, test sets, evaluations, and prompts. This is useful for organizing and filtering your data.
Support for Built-in LLM Tools (e.g. Web Search) in the Playground
Playground
We are adding the ability to use built-in LLM tools (e.g. web search) in the playground.
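For context, this is roughly how a built-in tool is enabled when calling a provider directly, assuming the OpenAI Responses API and its hosted web search tool; the playground would surface an equivalent per-provider option.

```python
# Enabling a provider-hosted web search tool via the OpenAI Responses API.
# Assumes OPENAI_API_KEY is set; tool naming varies across providers.
from openai import OpenAI

client = OpenAI()
response = client.responses.create(
    model="gpt-4o",
    tools=[{"type": "web_search_preview"}],  # provider-built web search tool
    input="What changed in the latest Python release?",
)
print(response.output_text)
```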
Feature Requests
Upvote or comment on the features you care about or request a new feature.