How about having a storage area that is less of a dank-creepy basement and more of a colorful modular wardrobe? Better still, less of a boring museum and more of a Disneyland; less of concrete and more of Lego; or less of rigid storage and more of S3? Well, that’s exactly what a team at AWS might have aimed for. This year S3 celebrates its 15th birthday. In an interview with Dataquest, Mai-Lan Bukovec, Global Vice President, Block and Object Storage, AWS, talks about redundancy, consistency, data movement, and also how much S3 has evolved and what keeps it still young.
It’s been 15 years for Amazon Simple Storage Service (Amazon S3). What highlights stay in your mind – even today?
I can recall the first press release very vividly, even today. It was not the typical press statement. We actively put our core principles in it. The release read: “Amazon used the following principles of distributed system design to meet Amazon S3 requirements.” In fact, before writing a line of code, the team developed a set of design principles that formed the backbone. These were, mainly: decentralisation, asynchrony, autonomy, controlled concurrency, failure tolerance, controlled parallelism, decomposability into small, well-understood blocks, symmetry, and simplicity. Interestingly, S3 was engineered to do one thing and do it well – provide highly scalable, reliable, and low-cost storage for customers.
Were all of these core principles well-preserved? Did they work?
They worked. S3 was scalable, reliable, and fast – all at very low costs. Over these years, AWS has introduced numerous new storage classes and features to help customers get more value out of S3 – and has also driven down the cost of storage. We started with microservices. We captured the fundamentals of distributed, secure and cost-effective storage. We believed in these fundamentals. Fast forward to today – and these core principles are still intact and alive. When AWS architected S3, we designed it to hold 20 billion objects. With 100 trillion objects now stored in S3, we still remain true to the core. That’s why our growth has been great over all these years. No matter how much elasticity you need, and whether you use machine learning (ML) or artificial intelligence (AI), it all has to be built on the strength of your data. That’s what S3 brings – at any scale, from terabytes to exabytes of storage.
Redundancy is the core strength for S3. How much does it matter and is there a flip side to it?
I would say that S3’s distributed, decentralised design principles have stood the test of time. From day 1, the availability and durability of our customers’ data were crucial for us. AWS began by building fault tolerance into S3. Hence, we built redundancy on top of redundancy to make it work seamlessly for customers, operating on the assumption that hardware components can, and will, fail. The goal was, and is, to make sure that customers do not have to accept a trade-off in return for redundancy. Customers want the security of data but no trade-offs when it comes to costs or access. It is important that scaling of architecture does not lead to over-provisioning. S3 has helped many customers ride out business ups and downs and scale their data very easily. Some companies need deep redundancy, which is not practical in a typical data center. But we ensure that our redundancy advantage is also more cost-effective than traditional storage. So, the answer is that we make sure we avoid any trade-offs.
Tell us something about consistency – the promise and the challenges around it? Does it have any conflicts with latency or availability, especially as one aims for eventual consistency?
When S3 was first launched, it had an eventual consistency model. In rare cases, updated metadata would take some time to show up. That worked fine for most backup use cases because customers simply needed the data to be there. As the needs of business changed, a lot of real-time processing of data emerged with complex analytics and ML models. When we saw this trend pick up, we started to realise that the consistency model had to be updated. Customers had started building workarounds in their own application code. We started thinking about strong consistency but we didn’t want to make any cost or performance trade-offs. We set a higher bar. We figured out how to build consistency with no additional compromise.
Was it easy?
Our core pillars – as mentioned in the very first press release – of performance and security had to be kept intact. We started aiming for it by making the metadata cache strongly consistent, by introducing new witness components and new replication logic. Consistency also had to be executed correctly – we did not want any scope for edge cases that break consistency. We achieved this through rigorous testing and verification techniques. Now everything in S3 is strongly consistent. This is because we stayed true to our core principles. We used all our engineering expertise and evolved well in the last 15 years. The way we engineer it makes sure that there are no conflicts with latency or availability. So there is no compromise on our core principles of cost and performance today – as was intended since the first day.
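To make the idea of a strongly consistent metadata cache concrete, here is a toy Python sketch. It is not S3's implementation, which the interview does not describe in detail. It assumes a simplified model in which a hypothetical `Witness` object records the latest version written for each key, and a read checks the cache entry against that record before trusting it. The names `Witness`, `ToyStore`, and `second_read` are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Witness:
    """Hypothetical component: remembers the latest version written per key."""
    latest: dict = field(default_factory=dict)

    def record_write(self, key):
        version = self.latest.get(key, 0) + 1
        self.latest[key] = version
        return version

    def is_current(self, key, version):
        return self.latest.get(key, 0) == version


class ToyStore:
    """A durable map fronted by a cache that is allowed to lag behind."""

    def __init__(self, use_witness):
        self.use_witness = use_witness
        self.witness = Witness()
        self.durable = {}  # key -> (version, value): the source of truth
        self.cache = {}    # key -> (version, value): may be stale

    def put(self, key, value):
        version = self.witness.record_write(key)
        self.durable[key] = (version, value)
        # The cache is deliberately NOT updated here, to model propagation lag.

    def get(self, key):
        cached = self.cache.get(key)
        if cached is not None:
            version, value = cached
            # Without a witness the cached entry is trusted blindly.
            if not self.use_witness or self.witness.is_current(key, version):
                return value
        # Cache miss, or the witness says the entry is stale: read the truth.
        entry = self.durable.get(key)
        if entry is None:
            return None
        self.cache[key] = entry
        return entry[1]


def second_read(use_witness):
    """Write, read (which warms the cache), overwrite, then read again."""
    store = ToyStore(use_witness)
    store.put("report.csv", "v1")
    store.get("report.csv")
    store.put("report.csv", "v2")
    return store.get("report.csv")


print(second_read(use_witness=False))  # v1 (stale read, eventual-style)
print(second_read(use_witness=True))   # v2 (read-after-write)
```

In this sketch the witness check costs one extra dictionary lookup on each cached read. A real system has to keep such a check fast and highly available, which is the engineering challenge the answer above alludes to.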
How does the distributed, cloud-based nature of storage work where compliance and regulatory constraints can limit the movement of data?
In the USA, for instance, we have many customers and we comply with all applicable regulations. We are aware of regulatory requirements. Many customers use S3 successfully without any impact on their compliance needs.
How relevant is cloud-based storage for today’s scenarios?
S3 was built as an evolvable system architecture. We didn’t have preconceived notions of what existing legacy hardware or data center infrastructure ought to look like. We weren’t constrained by inflexible on-premises appliances either. Today, we are in the world of modern data architecture. Companies that are evolving fast are building next-generation applications which need a data model so that they can take advantage of a shared model. If data sits in silos, they cannot tap the same level of advantages.
What role can the expansion of availability zones play in the data strategy that enterprises need today?
It is one of our core badges for S3. S3 has been architected to build on three or more availability zones, and with S3 we use a given region’s capabilities at the best possible levels. S3 is ultimately an architecture that can withstand the loss of a data center, and this resilience is strengthened with availability zones.
What are you most proud of when you look at S3’s evolution?
S3 is the foundation of the business and growth goals of many of our customers. This is visible all over the world. We are proud of the growth trajectories of these customers.
Can you share information on the Moderna project?
We are happy we contributed to the acceleration of the vaccine. Moderna Therapeutics focuses on using messenger RNA (mRNA) science to create novel medicines like the Moderna COVID vaccine. Moderna was able to develop their COVID-19 vaccine very quickly because they run their Drug Design Studio on AWS’s compute and storage infrastructure. That helped Moderna quickly design mRNA sequences for protein targets, and then use analytics and ML on their data lake on S3 to optimise those sequences for production so that the company’s automated manufacturing platform can successfully convert them into physical mRNA for testing.
Keeping in view the fast-changing technology, what advice would you give to customers?
I would say ‘Move to Data – Now.’ That’s the next big paradigm. You may not be using ML today, but soon you will. The same trends would be seen with predictive analytics or AI. The core of all these shifts is data. When you start moving this data to S3, you are not merely sharpening and simplifying your data strategy but are also laying the ground for predictive analytics and ML. In India, I see so much innovation and entrepreneurship that it’s exciting to watch applications that change the world. That spirit cannot blossom without data innovations.
Mai-Lan Bukovec, Global Vice President, Block & Object Storage, AWS
By Pratima Harigunani