Why CIOs can’t afford to miss observability in the age of Agentic AI

New Relic’s Ved Antani on why AI-native observability is essential to govern agentic AI, prevent silent failures, and ensure enterprise-wide accountability.

Aanchal Ghatak

Ved Antani, Senior VP of Engineering and MD of New Relic India


As organizations adopt and scale increasingly autonomous AI systems, they face greater risk from blind spots, silent failures, and unpredictable behaviour. Today’s AI is anything but deterministic!

Consider image generators that default to Studio Ghibli-style outputs, or AI agents that make unsupervised decisions in mission-critical workflows.

Many of these risks are overlooked in today's AI systems, and traditional observability tools, designed with simpler rule-based systems in mind, are not keeping up.

In an interview with Dataquest, Ved Antani, Senior Vice President of Engineering and Managing Director, New Relic India, discusses why AI-native observability is quickly becoming a must-have for CIOs and engineering leaders.

He explores how agentic AI will push organizations to rethink key performance indicators (KPIs), and how companies like New Relic will retool observability platforms to return transparency, traceability, and ultimately trust to the AI stack.

"New KPIs every CIO must track in an Agentic AI world"

The “Ghibli Trend” is a curious reference — could you decode it for us, and explain how it connects to AI unpredictability and business risk?

When OpenAI recently released its latest image-generation feature, it took an unpredictable turn as Studio Ghibli-style images took the internet by storm. The AI model unexpectedly defaulted to generating Ghibli-style images even when the prompt didn’t explicitly request it. This is AI unpredictability: the model exhibits a surprising bias learned from vast training data, leading to unexpected outputs. For brands, such unpredictability can introduce factual inaccuracies, brand inconsistencies, or even reputational damage.

This is exactly why observability becomes indispensable. It tells you why a particular model produced unexpected outcomes, in this case a ‘Ghibli-style’ output. Observability provides detailed traces, logs, events, and metrics, allowing brands to detect when the model behaves unpredictably and to diagnose which factors or data inputs caused it. This helps engineers mitigate such biases and add guardrails.

Agentic AI systems are no longer just theoretical — they’re starting to impact workflows. From a CIO’s lens, what makes these systems harder to monitor and govern?

Agentic AI systems introduce significant challenges to monitoring and governance compared to traditional applications or even older machine learning models. The difficulty lies in their autonomy and unpredictable nature. Unlike traditional ML models, which map a given input to an output, AI agents are equipped with decision-making powers so they can break down tasks and interact with various services based on their goals and environment. Their workflows become complex, branching decision trees.

Monitoring and governance require an understanding of why an AI agent took a specific action, how it interpreted a goal, which sub-tasks it carried out, and how it interacted with other systems. Traditional observability tools don’t cut it anymore. AI agents require observability that provides the detailed traces, contextual logs, and agent-specific metrics necessary to reconstruct an agent's decision-making process, understand failure points across complex interactions, and verify outcomes against governance policies.

Without such granular visibility into the internal state and execution path of an AI agent, monitoring would only be guesswork, and ensuring these systems are reliable and safe becomes incredibly difficult and risky.
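To make the idea of reconstructing an agent's decision-making process concrete, here is a minimal Python sketch of a per-task decision trace. It is an illustration only, not New Relic's implementation: the `TraceStep` and `DecisionTrace` names, the step kinds, and the sample task are all hypothetical.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class TraceStep:
    """One recorded step in an agent's run (hypothetical schema)."""
    kind: str      # e.g. "goal", "subtask", "tool_call", "decision"
    detail: str    # human-readable description of the step
    data: dict[str, Any] = field(default_factory=dict)
    timestamp: float = field(default_factory=time.time)


class DecisionTrace:
    """Append-only record of the steps an agent took for one task."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.steps: list[TraceStep] = []

    def record(self, kind: str, detail: str, **data: Any) -> None:
        """Append a step; keyword arguments become the step's context."""
        self.steps.append(TraceStep(kind, detail, dict(data)))

    def of_kind(self, kind: str) -> list[TraceStep]:
        """Return all steps of one kind, in the order they happened."""
        return [s for s in self.steps if s.kind == kind]

    def to_json(self) -> str:
        """Serialize the trace so it can be shipped to a log pipeline."""
        return json.dumps(
            {"task_id": self.task_id, "steps": [asdict(s) for s in self.steps]}
        )


# Hypothetical run: the agent reads a feed and decides not to reorder.
trace = DecisionTrace("restock-42")
trace.record("goal", "Keep SKU-1 in stock at DC-East")
trace.record("tool_call", "Read warehouse feed", source="dc-east", ok=True)
trace.record("decision", "Skip replenishment",
             reason="feed reports 900 units on hand")
```

With a record like this, an engineer can later ask which data the agent saw before a given decision, for example by inspecting `trace.of_kind("decision")`.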

What’s at stake for enterprises if observability doesn’t evolve in tandem with increasingly autonomous AI models? Can you cite any red-flag scenarios?

If observability doesn’t evolve at the speed of autonomous AI, the core risk would be the profound loss of control and visibility over critical business processes that are automated by AI agents. Autonomous AI models are self-directing and non-deterministic. If observability is static, it could raise many red flags, including:

● Unexpected and undetected costs: An AI agent may misinterpret a goal or enter a loop, driving up expensive cloud resource or external API usage, with no immediate way to detect the root cause.

● Process failures: Say an AI agent is automating a complex workflow and makes an incorrect decision that isn’t immediately obvious, leading to cascading errors in supply chains, customer interactions, or financial operations. This silent process failure is hard to debug quickly. For example, consider an AI agent tasked with monitoring inventory levels, predicting demand, and automatically triggering orders to move goods between warehouses or from suppliers. If there’s an internal logic flaw, or the agent misinterprets corrupted data from a warehouse feed, it may decide not to replenish a critical item in a major distribution center. The failure is silent because the system doesn’t crash, but as demand for the critical product rises, the distribution center runs out of stock, delaying shipments and leading to unfulfilled orders and customer frustration.

● Compliance blindsides: Without the capacity to trace an AI agent’s decision path, it’s difficult to say why an agent accessed specific data or made a particular decision, creating significant regulatory and security risks.

If observability doesn’t evolve to tackle autonomous AI, agents turn into black boxes, exposing the business to operational, financial, and reputational risks.
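The first red flag above, an agent that loops or quietly runs up costs, can be caught with a simple per-task guard. The Python sketch below is one possible approach under stated assumptions: the `RunawayGuard` class, its thresholds, and the call signature string are hypothetical, and a real system would take cost figures from its own billing or usage telemetry.

```python
class GuardTripped(Exception):
    """Base class for conditions that should stop or escalate a task."""


class BudgetExceeded(GuardTripped):
    pass


class LoopSuspected(GuardTripped):
    pass


class RunawayGuard:
    """Tracks spend and repeated identical calls for a single agent task."""

    def __init__(self, max_cost_usd: float, max_repeats: int) -> None:
        self.max_cost_usd = max_cost_usd
        self.max_repeats = max_repeats
        self.spent_usd = 0.0
        self.calls: dict[str, int] = {}

    def check(self, call_signature: str, cost_usd: float) -> None:
        """Register one external call; raise if a limit is crossed."""
        self.spent_usd += cost_usd
        self.calls[call_signature] = self.calls.get(call_signature, 0) + 1
        if self.spent_usd > self.max_cost_usd:
            raise BudgetExceeded(f"spent ${self.spent_usd:.2f}")
        if self.calls[call_signature] > self.max_repeats:
            raise LoopSuspected(f"{call_signature!r} repeated too often")


# Hypothetical agent stuck polling the same endpoint.
guard = RunawayGuard(max_cost_usd=1.0, max_repeats=3)
stopped_by = None
for _ in range(10):
    try:
        guard.check("GET /inventory?sku=1", cost_usd=0.25)
    except GuardTripped as exc:
        stopped_by = type(exc).__name__
        break
```

In this run the fourth identical call trips the repeat limit before the budget is exceeded, so the loop is surfaced as a suspected loop rather than discovered later on an invoice.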

As models like ChatGPT scale across business functions, how can observability help CIOs detect blind spots — before they turn into system-wide failures?

CIOs today encounter a fundamental challenge as powerful AI models are integrated across their enterprises: how to gain visibility into performance and reliability across business applications. Unlike an individual application, agentic AI operates across diverse functions and can develop blind spots. Blind spots are essentially specific areas where an AI agent behaves unpredictably or fails silently. This could be triggered by some unique context, data, or user interaction pattern.

For example, a prompt may work perfectly well for the marketing team but may exhibit bias when used for internal HR responses.

Observability offers granular insights on how the model performs within each specific application and workflow. Without this detailed context, it becomes difficult to trace requests through different business processes, capture prompt and response pairs specific to different user groups, and monitor performance tied to individual application functions. As a result, the business fails to detect subtle degradations or incorrect behaviors.

Observability shines a light on blind spots proactively. It instruments the model’s interaction within each application and catches anomalies. Such early detection prevents localized issues from spiralling into system-wide failures that adversely impact multiple departments or the entire customer base.

Agentic AI requires a different kind of telemetry. What new signals or KPIs should CIOs push their teams to track that go beyond traditional logs and traces?

Logs and traces are still necessary for observability, but they aren’t enough. Logs and traces tell businesses whether an AI Agent is running, but not why an AI agent made a particular decision or how it followed a particular goal. Telemetry must offer total visibility into the AI agent's behavior and decision-making.

This requires new KPIs such as Goal Achievement Rate, which measures whether the agent successfully completes its high-level task. CIOs must also track Decision Traceability, the practice of logging the internal steps and external calls the agent took to reach a specific conclusion. Think of it as a flight recorder for the AI agent’s logic. Additionally, monitoring Tool Interaction Success is vital to comprehending how reliably an agent uses external services. Cost Attribution per Task and logging of any Guardrail or Policy Violations are essential for financial control and governance.

Such evolved telemetry offers much-needed context to understand where the AI agent is going wrong, optimise its performance, and ensure it works within defined boundaries. With agentic AI, new KPIs are needed because observability moves beyond detecting system health to understanding agent intelligence and behavior.
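As a rough illustration of how the KPIs named above could be computed, the Python sketch below aggregates per-task records into a goal achievement rate, a tool interaction success rate, an average cost per task, and a guardrail violation count. The `TaskRun` schema and the sample numbers are assumptions made for this example, not a standard or a New Relic data model.

```python
from dataclasses import dataclass


@dataclass
class TaskRun:
    """Telemetry for one completed agent task (hypothetical schema)."""
    task_id: str
    goal_achieved: bool
    tool_calls: int
    tool_failures: int
    cost_usd: float
    guardrail_violations: int


def summarize(runs: list[TaskRun]) -> dict[str, float]:
    """Aggregate per-task telemetry into agent-level KPIs."""
    if not runs:
        raise ValueError("need at least one task run")
    total_calls = sum(r.tool_calls for r in runs)
    total_failures = sum(r.tool_failures for r in runs)
    return {
        # Share of tasks where the agent met its high-level goal.
        "goal_achievement_rate": sum(r.goal_achieved for r in runs) / len(runs),
        # Share of external tool calls that succeeded.
        "tool_success_rate": (
            (total_calls - total_failures) / total_calls if total_calls else 1.0
        ),
        # Cost attribution per task, averaged over the sample.
        "avg_cost_per_task_usd": sum(r.cost_usd for r in runs) / len(runs),
        # Total guardrail or policy violations logged.
        "guardrail_violations": float(sum(r.guardrail_violations for r in runs)),
    }


# Hypothetical sample of four task runs.
runs = [
    TaskRun("t1", True, 10, 2, 0.50, 0),
    TaskRun("t2", True, 5, 0, 0.25, 1),
    TaskRun("t3", False, 5, 3, 1.00, 0),
    TaskRun("t4", True, 0, 0, 0.25, 0),
]
kpis = summarize(runs)
```

Decision Traceability is not a number in this sketch; it depends on whether each task also has a stored step-by-step record that can be replayed.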

How is New Relic integrating AI-native observability into its platform — and what should CIOs look for in tools that claim to be ‘AI-ready’?

We integrate agentic AI capabilities directly into the platform and extend them into popular ITSM and SDLC workflow tools such as GitHub Copilot, Amazon Q Business, ServiceNow, and Gemini Code. This makes troubleshooting, incident prediction, and resolution faster and more efficient. The agent-to-agent orchestration allows automation of complex tasks through natural language APIs.

Today, workflow tools are rapidly evolving, driven by agentic AI innovation. This adds layers of complexity to monitoring applications and digital infrastructure, making it essential to move beyond traditional observability platforms. Platforms with agentic AI integrations should become the preferred choice for businesses, especially in this era of rapid technological advancement.

CIOs can look for tools that:

● Seamlessly integrate with popular workflow platforms, enabling easy embedding into existing systems.

● Offer intelligent orchestration capabilities that can prioritize the right agent for specific tasks.

● Provide a 360-degree view of the entire digital ecosystem for better visibility and control.

● Use technologies like retrieval-augmented generation (RAG), which leverages existing knowledge assets (like runbooks or internal docs) to improve decision-making.

In short, platforms with agentic AI integrations should make your systems smarter, teams faster, and decisions more precise. The platform shouldn’t be just about automating a few tasks.

There’s growing pressure for AI transparency and auditability. How can observability support governance frameworks — especially in regulated industries?

The widespread usage of AI is elevating the importance of effective observability. Observability platforms offer comprehensive visibility into the digital ecosystem for regulated industries like healthcare, finance, and government organizations. They capture metrics, logs, and traces throughout the AI lifecycle. This granular visibility helps monitor data quality, detect anomalies, trace decision pathways, and ensure adherence to the stringent regulatory requirements of these critical sectors.

With end-to-end visibility into critical business transactions and database operations, observability also helps reduce mean time to resolution (MTTR). This is especially crucial in these critical sectors, where downtime or errors can have serious consequences.

In addition, features like prediction and response intelligence are vital for early issue detection and root cause analysis, solving problems before they impact users. They also provide clear, contextual insights for faster remediation, enabling organizations to maintain transparency and accelerate risk mitigation.

Therefore, it’s safe to say that AI-powered, advanced observability is key to building robust governance frameworks, ensuring compliance, transparency, and operational excellence across regulated industries.

What role is India playing in shaping observability for next-gen AI architectures? Any breakthroughs worth highlighting?

Our innovation centres in India are peers to our U.S. and European centres, and enable us to tap into exceptional local talent and create products for customers in India and worldwide. In Hyderabad and Bengaluru, teams are independently building products with full autonomy and integrating them directly into the New Relic platform, including some of our core AI capabilities. At present, the teams are building state-of-the-art capabilities to take agentic AI to the next level. Watch this space!

For CIOs preparing to embed agentic AI into mission-critical systems, what are the non-negotiables in their observability strategy?

Observability is non-negotiable while embedding agentic AI into mission-critical systems. First, ensure end-to-end traceability across workflows to see the agent’s journey and impact across every system and process it touches. Second, gain deep visibility into the agent’s internal state. Understanding why an agent made a particular decision in a critical path is vital. This is done by capturing telemetry on the agent’s goals, sub-tasks, internal reasoning, and interactions at a granular level. Third, obtain context for the agent’s performance and monitor for errors.

Metrics and alerts must be specifically tied to outcomes of critical business processes that any AI agent performs, so degradation and anomalies are flagged immediately. Finally, automate compliance and guardrail monitoring. Ensure observability tools monitor mission-critical systems, proactively alerting on deviations from compliance and governance policies to ensure trust and safety.  
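One way to tie alerts to business outcomes, rather than only to system health, is to watch the success rate of a critical process over a rolling window. The Python sketch below is a simplified illustration: the `OutcomeMonitor` class, the window size, and the threshold are hypothetical choices, and a production setup would route the alert to whatever paging or incident tool the team already uses.

```python
from collections import deque


class OutcomeMonitor:
    """Flags when a business outcome's recent success rate drops too low."""

    def __init__(self, window: int, min_success_rate: float) -> None:
        # Only the most recent `window` outcomes are kept.
        self.results: deque[bool] = deque(maxlen=window)
        self.window = window
        self.min_success_rate = min_success_rate

    def observe(self, success: bool) -> bool:
        """Record one outcome; return True if an alert should fire."""
        self.results.append(success)
        if len(self.results) < self.window:
            return False  # not enough data yet to judge
        rate = sum(self.results) / len(self.results)
        return rate < self.min_success_rate


# Hypothetical outcome: "order fulfilled on time" for agent-placed orders.
monitor = OutcomeMonitor(window=4, min_success_rate=0.75)
alerts = [monitor.observe(ok) for ok in [True, True, True, False, False]]
```

Here the alert stays quiet while the window fills and while three of the last four outcomes succeed, and fires once only two of the last four do.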

Looking 12–18 months ahead, what’s your prediction: Will observability become a core pillar of AI strategy — or is it still being treated as an afterthought?

Businesses are operating in a multi-cloud, multi-platform world. The sheer volume of data being generated is pushing organisations beyond the bounds of what a human alone can manage. While AI can create great velocity and efficiency in these areas, it can also cause business disruptions if not monitored correctly. A lack of visibility and understanding across an organisation's AI, clouds, platforms, and data creates blind spots that can lead to IT outages and engineering inefficiencies (overspending on cloud storage, for example).

These factors can affect a business’s bottom line and are exactly why observability will continue to be a key factor in an organisation’s AI strategy. Businesses need a deep understanding of how their AI is performing, and observability is a core pillar that will provide engineers with unprecedented visibility and insights across the entire AI stack so they can build and run safe, secure, and robust AI applications with confidence.