In a recent conversation, we sat down with Rohan Sheth, Head of Colocation, Data Centre Build and Global Expansion at Yotta Data Services, to discuss the transformative impact of AI and edge computing on the industry. Rohan shed light on the evolving role of data centres, the integration of AI in operations, and the future of cooling technologies.
The evolving role of data centres with AI and edge computing
When it comes to data centres, the first question is always how to design for overall efficiency: power usage, cooling, and everything that flows from them. The same is true for AI. Computing, training, and inferencing at scale will only be possible if the number of data centres, and the way data is stored and processed within them, becomes more advanced, voluminous, and robust. That, in turn, will enable faster adoption of AI.
AI's impact on day-to-day data centre operations
We are at an early stage; AI use cases are still largely in the model-training phase. While data centres are upgrading their infrastructure to host AI workloads by enhancing power and cooling capabilities, AI applications from the last 5-10 years have already helped data centres significantly with predictive operations.
Data centres are mission-critical facilities where downtime is unacceptable. The main USP of a good data centre is zero downtime throughout the year, which is possible through predictive maintenance. Instead of relying on manual AMCs (annual maintenance contracts) for equipment like gensets or chillers, AI-driven procedures now predict when maintenance is due, when oil needs changing, or when equipment needs servicing. These systems model the behaviour of the equipment and give an indication months before a potential breakdown. We see this not only in data centres but also in everyday appliances like home air conditioners and refrigerators. In a data centre with thousands of pieces of equipment, AI-powered predictive maintenance plays a crucial role in reducing manual control and reliance on operations teams.
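To illustrate the idea behind such predictions, here is a minimal sketch, not Yotta's actual tooling; the metric, threshold, and readings are hypothetical. It extrapolates a degrading sensor trend to estimate when a service threshold will be crossed:

```python
# Minimal sketch of trend-based predictive maintenance (illustrative only;
# the equipment metric, readings, and threshold below are hypothetical).
from datetime import date, timedelta
import numpy as np

def days_until_threshold(readings, threshold):
    """Fit a linear trend to daily readings and estimate how many days
    remain before the metric crosses the maintenance threshold."""
    days = np.arange(len(readings))
    slope, intercept = np.polyfit(days, readings, 1)
    if slope <= 0:
        return None  # metric is not degrading; no maintenance predicted
    crossing_day = (threshold - intercept) / slope
    return max(0.0, crossing_day - days[-1])

# Hypothetical chiller vibration readings (mm/s), one per day
vibration = [2.1, 2.2, 2.2, 2.4, 2.5, 2.7, 2.8, 3.0]
remaining = days_until_threshold(vibration, threshold=4.5)
if remaining is not None:
    due = date.today() + timedelta(days=round(remaining))
    print(f"Chiller service projected around {due} (~{remaining:.0f} days out)")
```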
The shift towards smart data centres
Yes, we are moving towards smart data centres. Major vendors of IT and non-IT equipment have adopted AI, driven by market demand from customers and service providers. If you want to save money, you might still opt for manually operated equipment, but the basic specifications of all major IT and non-IT equipment now include AI-driven predictive control. Smarter data centres, on both the non-IT and IT sides, include these features as standard, and AI plays a significant role in them. They predict faults that are about to happen or may happen in the future, automatically preventing equipment breakdowns and avoiding considerable downtime.
Automation in data centre operations and management
Within data centres, the major shift is in the equipment itself, primarily through predictive and preventive maintenance. For most other companies, AI usage is still largely in the training phase. Since the advent of ChatGPT, many major companies have started training their models and workloads on GPUs across data centres, and those training runs are currently in progress.
Actual enterprise usage or direct public use, where people pay for and use AI in their companies or personal lives, has not fully begun. While free applications for image generation and voice notes are available, enterprise usage, such as a pharma company using AI for drug testing or a media company using AI for voice editing, will likely mature once this training phase of AI concludes. Then, enterprises will start deploying their workloads and data into data centres, leading to the actual widespread use of AI in both private and public enterprises.
Designing future-ready data centres for the GPU era
The major difference between a CPU-based data centre (for cloud workloads) and a GPU-based data centre (for AI workloads) is the power and cooling density per rack. For a CPU or cloud data centre, rack density ranges from 5-6 kilowatts per rack to a maximum of 15-20 kilowatts for hyperscale customers. For a GPU environment, with AI workloads, it starts from around 40-50 kilowatts and can go up to 150-200 kilowatts per rack with the latest technologies. This variation in rack density changes the entire equation.
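To see why the equation changes, here is a small, hypothetical Python calculation using the per-rack densities quoted above against an assumed 6 MW floor budget; the figures are illustrative, not a Yotta floor plan:

```python
# Back-of-the-envelope rack counts for the densities quoted above. Purely
# illustrative arithmetic; real floor plans also budget for redundancy,
# cooling overhead, and white-space layout.
FLOOR_IT_BUDGET_KW = 6_000  # assumed ~6 MW IT floor

densities = {
    "Cloud/CPU rack (10 kW)": 10,
    "Hyperscale CPU rack (20 kW)": 20,
    "GPU rack (50 kW)": 50,
    "Dense GPU rack (150 kW)": 150,
}
for label, kw_per_rack in densities.items():
    racks = FLOOR_IT_BUDGET_KW // kw_per_rack
    print(f"{label:28s} -> ~{racks} racks within the same floor budget")
```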
From our experience of installing around 16,000 GPUs in our data centres over the last year and a half, conventional cooling methods do not work for a GPU environment. Advanced cooling methods like RDHX, in-row cooling, or direct liquid-to-chip cooling are necessary. The overall data centre design, including the chillers, generator sets, and the power distribution from the ground floor where the generators and transformers are located, doesn't change much. The significant changes are at the floor level: power distribution, cooling, and increased density.
For example, our original NM1 data centre in Panvel was designed for around six megawatts per floor. With GPUs, the same floor now needs nearly double that, around 10-12 megawatts, because rack density has increased. This requires flexibility in the piping diameter for cold and hot water and more space for power cables. We also need space for additional generators and transformers on campus; a constrained site would limit our ability to scale the power and cooling infrastructure, even when customer demand for GPUs is there.
At the floor level, if the piping and cabling for the entire building have already been done as per the original design for a cloud or enterprise data centre, redoing them is practically impossible; it is like building a new data centre. However, if the upper floors are still in a raw, shell-and-core condition, you have the flexibility to design a 10-megawatt floor instead of a five-megawatt floor to deploy GPUs.
Liquid cooling: Cost and scalability
If you calculate the final cost per kilowatt for cooling, there isn't a major difference between normal air cooling and liquid cooling methodologies. The perception that liquid cooling is more expensive stems from comparing the total cost of cooling a 5-megawatt IT load with air cooling versus a 10-megawatt IT load with liquid cooling in the same space. On a per-kilowatt basis, whether it's conventional air cooling, immersion cooling, direct liquid-to-chip cooling, RDHX, or in-row cooling, there isn't a major difference in the capital expenditure (capex).
The other advantage of liquid cooling, especially direct liquid cooling (DLC) or immersion cooling, is a significant reduction in Power Usage Effectiveness (PUE). In air cooling, PUE is typically around 1.4 to 1.5, whereas with immersion cooling or DLC, PUE can be around 1.2. This means you are operationally much better off in the long run, and the capex is not significantly higher.
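A rough worked example of what that PUE gap means in energy terms, assuming a hypothetical 10 MW IT load (the load figure is an assumption, not a site measurement):

```python
# Annual energy comparison for the PUE figures quoted above
# (PUE = total facility energy / IT energy).
IT_LOAD_MW = 10          # assumed GPU hall IT load
HOURS_PER_YEAR = 8_760

def annual_facility_mwh(it_load_mw: float, pue: float) -> float:
    return it_load_mw * pue * HOURS_PER_YEAR

air = annual_facility_mwh(IT_LOAD_MW, pue=1.45)     # conventional air cooling
liquid = annual_facility_mwh(IT_LOAD_MW, pue=1.20)  # DLC / immersion cooling
print(f"Air-cooled facility:    {air:,.0f} MWh/yr")
print(f"Liquid-cooled facility: {liquid:,.0f} MWh/yr")
print(f"Difference:             {air - liquid:,.0f} MWh/yr "
      f"(~{(air - liquid) / air:.0%} of facility energy)")
```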
AI optimisation of cooling systems
AI for cooling optimisation works along similar lines to predictive maintenance. Cooling technologies, whether air-based or liquid-based, often come with sensors. We use adiabatic cooling in our Greater Noida campus, for example. Based on the outside ambient temperature, the chiller, which supplies cold air or cools room-temperature water to preset design temperatures, uses only as much motor power as is required.
In winter, when outside temperatures are low (e.g., 10-12 degrees Celsius in Noida/Delhi NCR) and the target inlet temperature for the server hall is around 17 degrees, the pump doesn't need to run at full load; it can simply circulate the water and return it to the server hall. This is adiabatic cooling, where the system senses the outside temperature, saving power because the chiller pump doesn't run at full load. In summer, when outside temperatures are high (e.g., 30-40 degrees Celsius), the pump has to run and the chiller's refrigeration cycle kicks in to cool the water to the required 17-18 degrees before it is sent to the server hall.
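A simplified control sketch of that ambient-driven behaviour; the setpoints and the 3-degree approach margin are assumptions, and a real BMS or chiller controller has far more states (hysteresis, partial free cooling, and so on):

```python
# Pick a cooling mode from outside air temperature: when the ambient is well
# below the supply setpoint, circulate water with pumps alone; otherwise run
# the chiller's compressors. Setpoints here are hypothetical.
def cooling_mode(ambient_c: float, supply_setpoint_c: float = 17.0,
                 approach_c: float = 3.0) -> str:
    if ambient_c <= supply_setpoint_c - approach_c:
        return "free cooling: pumps at partial load, compressors off"
    return "mechanical cooling: compressors on to reach the setpoint"

for t in (11, 16, 25, 38):  # e.g. winter vs. summer ambients in Delhi NCR
    print(f"{t:>2} degC outside -> {cooling_mode(t)}")
```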
So, just as with maintenance, sensors and AI tools enable predictive optimisation of cooling. Chillers, pumps, and air handling units are all equipped with AI-enabled chips or sensors that optimise the use of their internal motors and pumps, allowing for savings in power costs.
Net energy savings from AI optimisation
The benefits that AI brings in terms of cooling efficiency and predictive maintenance are achieved through small, centrally controlled chips. If a vendor supplies a hundred chillers across India, their software is centrally controlled and a small chip sits in each piece of equipment. The power consumed by these small sensors for predictive maintenance or efficient cooling is insignificant compared to the overall power savings they bring to the data centre. The energy savings from this AI-enabled equipment far outweigh the minor additional power it consumes.
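As a hypothetical order-of-magnitude check on that claim, every figure below is an assumption for illustration rather than Yotta data: 100 monitored chillers, roughly 10 W of continuous controller draw each, and a modest 1% efficiency gain on a 10 MW cooling and overhead load:

```python
# Order-of-magnitude comparison of controller/sensor draw vs. energy saved.
# All inputs are illustrative assumptions.
HOURS_PER_YEAR = 8_760
sensor_mwh = 100 * 10 * HOURS_PER_YEAR / 1_000_000   # 100 units x 10 W, in MWh/yr
savings_mwh = 0.01 * 10 * HOURS_PER_YEAR             # 1% of a 10 MW load, in MWh/yr
print(f"Controller draw: ~{sensor_mwh:.1f} MWh/yr")
print(f"Assumed savings: ~{savings_mwh:,.0f} MWh/yr "
      f"({savings_mwh / sensor_mwh:,.0f}x the controllers' own consumption)")
```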
Evolving role of human expertise in DC management
However much AI becomes part of overall operations management, especially in data centres, it is not a replacement for physical manpower or the required operations expertise. There are multiple Standard Operating Procedures (SOPs), Emergency Operating Procedures (EOPs), and customer-centric tasks that still require human intervention. From a safety and security point of view, there are access control systems, Building Management Systems (BMS), and now, with AI, various cameras and predictive systems.
However, the operations team, the L1, L2, and L3 managers who are present 24/7 in data centres, remains essential for such eventualities. Ninety percent of our design, construction, and operations are geared towards preventing or responding to that "once in a lifetime" or "once in every two or five years" blackout, equipment failure, or fire. In such eventualities, customer data must continue running uninterrupted, and the humans on the operations team are crucial for rapid response.
The change for the operations team is that their learning and training will evolve. It will no longer be restricted to manual procedures like changing a generator set or servicing a chiller. With the advent of predictive AI and predictive maintenance techniques, the narrative has shifted. Customers now expect more, and equipment manuals are changing. There is a greater emphasis on AI training and on preventing incidents rather than just responding to them. The operations team's mindset is continuously adapting to the expectation that customers will not accept an incident happening and then being responded to. Instead, the expectation is that for the entire duration their data is hosted in the data centre, incidents are predicted and prevented through the adoption of AI-enabled equipment and tools.