Edited and select excerpts from a detailed discussion…
US Goliath model
Manoj Payardha: You have the hyperscalers in America: AWS, Google Cloud, and Azure. No matter where the technology gets built, if it scales up tomorrow and acquires more customers, it needs more compute. Everyone ends up paying more and more money to the mega giants in the United States, and that fed a US boom on Nasdaq. The same logic was applied to AI-ML. The VC ecosystem in the United States came up with the idea of an open research project, which is what OpenAI started as, and put resources into it. Elon Musk and a lot of other people came together and invested in that idea. Then came the breakthroughs: neural networks can actually give you real intelligence, and we can crunch large amounts of data. They leveraged the sheer amount of data on the Internet and all the new algorithms that had come in. The success of GPT was so big that billions and billions of dollars started going into it. Then the Nvidia GPUs came in.
One thing that is not appreciated by the world, and not known to many, is that an absurdly large number of AI-ML researchers come from China. It's not even close. Chinese universities are producing these researchers in large numbers, and Chinese students at home and abroad are a big part of the field; most of OpenAI's contributors are Chinese. However, even this talent ends up in the United States, because that is where the smartest people gather and the best research happens. That's why the best professionals of the world, including from India, end up in the US.
The rival DeepSeek David
Piyush Goel: Adversity brings a lot of innovation. As US-China tensions escalated, Chinese companies did not have access to the most powerful, new-age chips. The scientists at DeepSeek had to go back to the drawing board and ask what could be done fundamentally differently, and they still achieved a similar level of quality and depth in their models. They used something called a mixture-of-experts architecture, which means you train several parts of your model on very vertical, specific, nuanced datasets. In computer science there is something called a floating-point operation, and the precision of those floating-point operations is what makes a difference when you are talking about billions and billions of training examples. ChatGPT-class models were originally trained with 32 bits of floating-point precision. What DeepSeek did was try a lower precision, for instance 8 bits. That translates into one-fourth the size for every weight, every knob, of your model. With that they were able to achieve similar levels of quality with a 75 per cent reduction in compute. Then they trained their models on phrases rather than individual tokens. What ChatGPT does is look at every word as a token and pass through every token; if you are trawling the Internet, that is trillions of tokens being passed through, each one arriving at a weighted decision over parameters stored at 32-bit precision.
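To make the precision arithmetic concrete, here is a minimal Python sketch of the memory side of that claim. It is not DeepSeek's code; the parameter count is a made-up illustration, and it only shows why storing every weight in 8 bits instead of 32 bits cuts the per-weight footprint by 75 per cent.

```python
# Illustrative only: back-of-the-envelope memory cost of model weights
# at different floating-point precisions. The parameter count below is
# a hypothetical example, not any real model's size.

def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Memory needed to store the weights, in gigabytes."""
    return num_params * bits_per_param / 8 / 1e9

params = 70_000_000_000  # hypothetical 70-billion-parameter model

fp32 = weight_memory_gb(params, 32)
fp8 = weight_memory_gb(params, 8)

print(f"FP32 weights: {fp32:.0f} GB")
print(f"FP8  weights: {fp8:.0f} GB")
print(f"Reduction per weight: {(1 - fp8 / fp32):.0%}")  # 75% smaller
```

The same 4-to-1 ratio applies to the data moved through the GPU on every operation, which is where the compute savings the speaker mentions come from.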
Reinforcement learning model
Manoj Payardha: The reasoning models that everyone was building had external evaluators or critics. These were human-based heuristics loaded on top of the model to make reasoning efficient, and that itself became a big industry: there were companies that would observe how humans reason and supply some sort of data structure for it to the large-scale AI-ML companies. DeepSeek came up with reasoning via reinforcement learning, completely eliminating these external evaluators. Why didn't others think of this? Because structuring it all isn't easy. It is something like the AlphaZero paper, where the model essentially learns by playing against itself how to become a better player and, within six hours of training, became the best Go player ever without ever seeing a human game. But structuring and quantizing human reasoning was tough. DeepSeek achieved that.
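As a rough illustration of what "reasoning via reinforcement learning without an external evaluator" can look like, here is a minimal Python sketch. It is not DeepSeek's training code; it only shows the idea of scoring a model's sampled answers with an automatic, rule-based reward (does the final answer match a known result?) instead of a human critic. The `sample_answers` function is a hypothetical stand-in for the model.

```python
import random

# Hypothetical stand-in for a language model sampling several candidate
# reasoning traces for the same problem.
def sample_answers(question: str, n: int = 4) -> list[str]:
    return [f"... therefore the answer is {random.randint(1, 20)}" for _ in range(n)]

def rule_based_reward(answer: str, expected: str) -> float:
    """Automatic, verifiable reward: 1.0 if the final answer is correct,
    0.0 otherwise. No human critic or learned reward model is involved."""
    return 1.0 if answer.strip().endswith(expected) else 0.0

question, expected = "What is 7 + 6?", "13"
candidates = sample_answers(question)
rewards = [rule_based_reward(c, expected) for c in candidates]

# In a real RL setup these rewards (often normalized against the group
# average, as in group-relative policy optimization) would drive gradient
# updates; here we just report which samples would be reinforced.
baseline = sum(rewards) / len(rewards)
for cand, reward in zip(candidates, rewards):
    print(f"advantage={reward - baseline:+.2f}  {cand}")
```

The point of the sketch is the shape of the loop: the signal comes from a cheap, automatic check rather than from the human-annotation industry the speaker describes.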
Can DeepSeek be emulated?
Piyush Goel: It has actually shown the world that you don't need thousands and thousands of GPUs, massive clusters, to train your own AI model. DeepSeek has pretty much made it open source, and a level playing field has been created. Now the idea is that with a small team of engineers you get your hands on some cutting-edge GPUs, pick up an open model like DeepSeek, and fine-tune and train it on your own domain-specific knowledge sets. You will then have a model that outperforms some of these general-purpose models, if I may use that term. I don't think the likes of ChatGPT will have a monopoly unless they have something up their sleeve. I feel the AI arena has been completely disrupted.
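As a rough sketch of the workflow Goel describes, the outline below adapts an open-weights model to a private, domain-specific corpus using the Hugging Face transformers, datasets, and peft libraries. The model name, file name, and hyperparameters are illustrative assumptions, not anything recommended by the panel.

```python
# Illustrative outline only: fine-tune an open-weights model on your own
# domain text with LoRA (parameter-efficient fine-tuning).
# Assumes `pip install transformers datasets peft` and a single GPU.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base_model = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # example open model
tokenizer = AutoTokenizer.from_pretrained(base_model)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

# LoRA trains small adapter matrices instead of all weights, so a small
# model like this fits on one modern GPU.
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16,
                                         task_type="CAUSAL_LM"))

# "domain_corpus.txt" is a hypothetical file of your own domain text.
dataset = load_dataset("text", data_files="domain_corpus.txt")["train"]
dataset = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True,
                                           max_length=512), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="domain-tuned", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("domain-tuned")
```

The resulting adapter, not the base model, carries the domain knowledge, which is what makes this feasible for a small team without a massive cluster.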
Where India stands
Manoj Payardha: We can be humble and accept where we are as a nation. Invest in ground-level research and, in parallel, realize what we can do best: let these guys (the US and China) fight it out. We don't have to be in the middle of a war between two countries that are big elephants. Whatever benefits come out of it, let's take them. Let's leverage and build, build, build.
Piyush Goel: As a country we have to build some form of symbiotic relationship with our cutting-edge companies. We have to partner with them and give them cover from external threats, and at the same time give them the right funding and support. We also have to provide them with the right distribution model, one that allows them to take their products to Indian consumers.
Catch the complete discussion on Spotify…