All eyes are on affordable AI in 2025, and DeepSeek’s R1 model marks the beginning of this trend. With no moat around AI, and the need to drive tangible value from AI investments, there’s renewed focus on making inference cheaper, more accessible, more distributable, and more sustainable.
With newer models being launched and businesses rushing to leverage their potential, challenges abound, especially in monitoring the performance and reliability of AI systems and in running them in an energy-efficient, environmentally sustainable way. This is what will make intelligent observability even more mission-critical in 2025, as it enables businesses to proactively maintain the uptime of AI systems and improve energy efficiency to reduce their carbon footprint.
Preventive observability is at the heart of AI systems
AI models are highly dynamic, relying on large data sets with different types of data. Organizations also rely on a combination of AI models: they may use GPT-4 or Gemini, along with small language models. Traditional tools designed to monitor specific aspects of the infrastructure aren’t equipped to tackle complex AI systems and the data pipelines that come with their usage.
AI evolves continuously, learning from every token. Traditional monitoring tools, however, are not equipped to analyze this evolution or to perform predictive analysis on it.
For example, a machine learning model is being used by a manufacturer to predict when machinery is likely to fail based on data from sensors, historical performance and environmental conditions. The model may perform well initially and even accurately predict failures and schedule maintenance before the machine breaks down. Over time, if the machinery undergoes changes in design, environmental conditions evolve, and new wear and tear emerges, the model may begin to underperform. It may not predict failures effectively, which could lead to unplanned downtime.
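One simple way to notice this kind of degradation is to compare the model's recent prediction errors against the errors it produced when it was first deployed. The following is a minimal sketch under stated assumptions, not a description of any particular observability product: the function name `detect_drift`, the window sizes, and the `tolerance` multiplier are all hypothetical choices made for illustration.

```python
from statistics import mean


def detect_drift(errors, baseline_size=50, window_size=50, tolerance=1.5):
    """Flag suspected model drift from a chronological list of prediction errors.

    errors: absolute prediction errors, oldest first (hypothetical input).
    baseline_size: how many of the earliest errors form the baseline.
    window_size: how many of the latest errors form the recent window.
    tolerance: drift is flagged when the recent mean error is more than
        `tolerance` times the baseline mean error (an assumed threshold).
    """
    if len(errors) < baseline_size + window_size:
        return False  # not enough history to compare the two windows
    baseline = mean(errors[:baseline_size])
    recent = mean(errors[-window_size:])
    return recent > tolerance * baseline
```

In the manufacturing example, `errors` could be the gap between predicted and actual time to failure for each machine. A result of `True` would be a prompt to retrain or re-examine the model, not proof that it has failed.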
Model drift or model decay, where the performance of the model or of an application built on it, such as an AI agent, degrades over time, poses a real risk to operations. Intelligent observability monitors both the performance of the model and the applications built on it, providing the insights needed to ensure quality over time. It’s also predictive.
This is especially important in AI environments, where complex interactions involving probabilistic methods can lead to unpredictable variations in outcomes. Intelligent observability can predict performance bottlenecks, allowing businesses to optimize infrastructure before issues arise, and anticipate rising error rates, ensuring errors are fixed before they impact end users. For example, during periods of high market volatility or significant economic events, online trading platforms or investment apps might experience delays in processing real-time data, be it for asset pricing, risk assessment, or trade execution.
Preventive observability enables these businesses to detect potential bottlenecks or performance issues early on, ensuring they can scale their infrastructure and adjust resources well in advance. Such proactive monitoring prevents missed opportunities when the market experiences sudden shifts.
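To make the idea of anticipating a bottleneck concrete, the sketch below fits a straight line to recent latency samples and estimates how long it will take for the trend to cross a limit. It is an illustration only: the function name `minutes_until_breach`, the one-sample-per-minute spacing, and the use of a plain least-squares line are assumptions, and real platforms typically use richer forecasting models.

```python
def minutes_until_breach(latencies_ms, threshold_ms):
    """Estimate minutes until a rising latency trend crosses threshold_ms.

    latencies_ms: latency samples taken once per minute, oldest first
        (the sampling interval is an assumption of this sketch).
    Returns 0.0 if the latest sample is already at or over the threshold,
    None if there are too few samples or the trend is flat or falling.
    """
    n = len(latencies_ms)
    if n < 2:
        return None
    if latencies_ms[-1] >= threshold_ms:
        return 0.0

    # Ordinary least-squares fit of latency against sample index 0..n-1.
    x_mean = (n - 1) / 2
    y_mean = sum(latencies_ms) / n
    numerator = sum((x - x_mean) * (y - y_mean)
                    for x, y in enumerate(latencies_ms))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    slope = numerator / denominator
    if slope <= 0:
        return None  # latency is not trending upward

    intercept = y_mean - slope * x_mean
    crossing_index = (threshold_ms - intercept) / slope
    # Convert the crossing index into minutes after the latest sample.
    return max(0.0, crossing_index - (n - 1))
```

A trading platform could run a check like this on order-processing latency and add capacity when the estimate drops below the time it takes to scale out.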
Sustainable AI usage with intelligent observability
Training LLMs requires enormous computational power and vast amounts of energy, which naturally raises concerns about environmental sustainability, especially when the energy comes from non-renewable sources. Additionally, the physical infrastructure supporting AI, such as servers and data centers, along with its manufacturing, transportation, and disposal as electronic waste, adds to the carbon footprint.
According to the International Energy Agency, electricity consumption from data centers, artificial intelligence (AI), and cryptocurrency could double by 2026. After globally consuming an estimated 460 terawatt-hours (TWh) in 2022, the total electricity consumption of data centers and AI usage could reach more than 1,000 TWh in 2026. This is more or less the total amount of electricity Japan consumes annually.
Sustainable practices are important for climate action initiatives and for meeting net zero targets, and in 2025, sustainable IT will take center stage. Observability is a valuable tool for technological sustainability efforts, as it gives businesses a full picture of the number of machines and resources that are running applications and processes, and pinpoints where resources are overprovisioned or overutilized. For example, intelligent observability platforms highlight overprovisioned cloud resources that run AI-powered apps and suggest where to cut back, driving both cost and energy efficiencies.
Reducing the number of machines and overprovisioned resources cuts the amount of carbon emitted and helps organizations rightsize their services, including memory, storage, and other resources.
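As a rough illustration of what a rightsizing suggestion might look like, the sketch below flags instances with low average CPU utilization and proposes a smaller vCPU count. Everything here is hypothetical: the `Instance` record, the `rightsizing_suggestions` function, and the 20% and 60% utilization figures are assumptions for the example, not values taken from any vendor's tooling.

```python
import math
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    vcpus: int
    avg_cpu_utilization: float  # fraction between 0.0 and 1.0


def rightsizing_suggestions(instances, low_watermark=0.2, target=0.6):
    """Return {instance name: suggested vCPU count} for underused instances.

    An instance is considered overprovisioned when its average CPU
    utilization is below `low_watermark`. The suggestion is the smallest
    whole vCPU count (at least 1) that would run the same load at or
    below the `target` utilization.
    """
    suggestions = {}
    for inst in instances:
        if inst.avg_cpu_utilization < low_watermark:
            vcpus_in_use = inst.vcpus * inst.avg_cpu_utilization
            needed = max(1, math.ceil(vcpus_in_use / target))
            if needed < inst.vcpus:  # only report real reductions
                suggestions[inst.name] = needed
    return suggestions
```

The same pattern could be applied to memory or storage by swapping in the relevant utilization metric.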
Sustainable IT is also achieved when software applications are designed with energy efficiency in mind. Intelligent observability is essential to this process: proxy metrics can show which compute, network, and storage resources have been provisioned for a workload, so that teams provision only the resources needed for optimal performance. It can also help businesses create KPIs that measure how much energy each application or process uses relative to the business outcomes it achieves.
For example, an ecommerce business can calculate the amount of energy its system uses for each transaction made.
Once this is calculated, the business can use the figure to ask the right questions: How does the business’s energy usage compare with that of vendors and third parties? Are they using different architectures that are more efficient? Is the business following industry best practices for energy-efficient design? Comparing the business’s per-transaction energy usage with that of others makes it easier to spot areas for improvement.
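The per-transaction KPI described above comes down to a simple division, sketched here under stated assumptions. The function names and the idea of comparing against a single benchmark figure are hypothetical, and in practice the energy total would usually be an estimate derived from proxy metrics, not a direct meter reading.

```python
def energy_per_transaction_wh(total_energy_kwh, transactions):
    """Return watt-hours used per transaction over a reporting period.

    total_energy_kwh: estimated energy used by the system in the period.
    transactions: number of completed transactions in the same period.
    """
    if transactions <= 0:
        raise ValueError("transactions must be positive")
    return total_energy_kwh * 1000 / transactions


def relative_to_benchmark(own_wh, benchmark_wh):
    """Return own usage as a multiple of a benchmark figure.

    A value above 1.0 means the system uses more energy per transaction
    than the benchmark; below 1.0 means it uses less.
    """
    if benchmark_wh <= 0:
        raise ValueError("benchmark_wh must be positive")
    return own_wh / benchmark_wh
```

Tracking this ratio over time, not just as a one-off snapshot, shows whether architectural changes are actually improving efficiency.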
Establishing intelligent and sustainable observability practices that focus on prevention will be essential in 2025 and for long-term success, ensuring AI systems operate efficiently, sustainably, and with minimal negative impact on the planet.
-By Ved Antani, Senior Vice President of Engineering and Managing Director, India at New Relic