The Unstructured Data Revolution: How Businesses Must Adapt to Survive

Amy Fowler talks about the implications and challenges organizations face from the staggering explosion in unstructured data.

Minu Sirsalewala

To make the most of unstructured data, organizations need to rethink how they store, manage, and extract value from this data. A fresh approach is needed that takes into account the unique characteristics of unstructured data, such as its volume, variety, and velocity.

In today’s data-driven world, organizations are faced with an unprecedented amount of data. The rise of mobile technologies, cloud computing, machine learning, and IoT, along with emerging technologies such as AI- and VR-based applications, is driving exponential growth in unstructured data. According to Gartner, unstructured data growth rates have hit 30% per year, which means total unstructured data volumes will almost quadruple by 2027.
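As a quick sanity check of that compounding (assuming the 30% annual rate holds and a 2022 baseline, since the 2027 horizon implies roughly five years of growth): 1.30^5 ≈ 3.7, which is indeed just short of a quadrupling of total volume.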

On the one hand, unstructured data can provide valuable insights that can help businesses innovate and adapt to changing market conditions. On the other hand, unstructured data is inherently more difficult to manage than structured data, which has traditionally been the focus of data management efforts.

Businesses that can successfully leverage unstructured data will be able to innovate and adapt to future conditions with greater agility and fewer wasted resources.

Fortunately, there are advanced data storage technologies available that can help organizations manage unstructured data more effectively. For example, flash storage arrays and cloud-based storage solutions provide high-performance, scalable storage options that can accommodate the large volumes of unstructured data organizations are dealing with.

Amy Fowler, VP and General Manager, FlashBlade, Pure Storage, in conversation with Minu Sirsalewala, Executive Editor – Special Projects, Dataquest.

Amy talks about the implications and challenges of the staggering explosion in unstructured data, and how organizations can leverage unstructured data for continuous innovation and successful digital transformation.

The data explosion driven by digital transformation is leading to huge growth in unstructured data. What are the data storage challenges enterprises face due to this growth?

There are many challenges that enterprises face with data storage. According to Gartner, in just the next three years unstructured data is going to grow 3x. That’s a tremendous amount of data, and by 2025 this will create a huge set of challenges for all types of organizations, but especially for enterprises. These challenges start with scale: can enterprises continue to grow while staying with their current IT teams, as opposed to needing to grow their teams in line with the amount of data? That’s pretty much impossible. So it’s important to ensure that managing the data they have is intuitive.

Another challenge is that enterprises have a variety of data types. Unstructured data comprises both file and object data. How do we make it easy for enterprises to manage the different data types? For example, if they are stuck with needing a discrete silo for each type of data, each silo requires a completely different setup and a different set of operational and feature characteristics. That introduces a ton of complexity into an already complex environment. However, this challenge can be solved with a platform that can unify both fast file and fast object data and do it in a way that delivers what we refer to as multi-dimensional performance: a platform that can handle both small and large files, whether sequential or random, and whether the throughput requirement is moderate or super high.

The challenge of scale can also manifest in how many billions of objects and files enterprises have. And because this data is super valuable, in some cases coming off edge devices, enterprises want to act on it. A decade ago, fast object was kind of an oxymoron: why would you need speed to insight with your object data? But today, that’s something people look for. To take advantage of the simplicity of object data, they need a lot of performance to mine that data and get insights from the metadata associated with the objects.

These are the things that are top of mind for enterprises.

What technical storage challenges does unstructured data present, and what storage technologies are required to overcome them?

We are heading into an era of flash storage technologies. Flash is becoming the go-to and is helping address the problems of scale and variety of data. Another very important set of challenges we’re seeing organizations face today has to do with the environmental burden that comes with all this data, especially in a legacy disk-based environment, when you consider the power, space, and cooling required to store so much data. Flash requires much less power and space, and this has become a key selling point for many of our customers.


But even if you look at the flash storage offerings in the market, Pure has been able to stand out. We have made huge investments to ensure we are delivering the best feature functionality and getting the most out of our hardware platform design. Designing our own hardware enables us to do things you can’t if you just rely on off-the-shelf hardware from flash manufacturers. What Pure has done is to create what we call DirectFlash Modules, which are significantly denser and therefore enable much more power-efficient systems that can store these petabytes of unstructured data. At the most fundamental level, flash is going to be a key technology.

We expect to continue to see an uptick in people leveraging object as a protocol for their unstructured data. Object has fewer layers of complexity than traditional file systems, and that’s advantageous for a lot of applications. Flash and the growth of object are two of the key technologies that I think are going to continue to be extremely relevant in the next few years.

How are organizations leveraging unstructured data for continuous innovation and successful digital transformation, especially in healthcare, financial and retail sectors?

Advertisment

The healthcare sector is one that has to deal with a lot of unstructured data, from patients’ medical records to genomics and gene sequencing to X-rays and MRI imaging. Healthcare organizations have to be able to reconcile and analyze all that data from a huge patient population and correlate it to genomic data. This can generate faster and more accurate diagnoses that can save lives. That’s the area where we’ll continue to see AI advancements. Physicians and radiologists are critical, of course, but AI can help them speed up early diagnosis. To do that, you can’t look at a single patient’s medical record; you need to be able to go back and look at every image associated with lung cancer for thousands, if not millions, of patients, train the system to recognize it, and then correlate that to genomic data in a specific patient’s case. I think we’re at the tip of the iceberg in terms of the applications for this in improving patient care.

In financial services, there’s so much modeling that happens to make predictions about what’s going to happen in the markets. That modeling comes from billions of data points that you need to be able to pull together, and then generate very close to, if not actual, real-time analyses. We’re going to continue to see the FinTech sector take advantage of all that unstructured data, all that historical data, to do simulations and modeling. Simulations really are a big use of unstructured data: historical data informs the simulations so that we can get better predictability about what’s going to happen from the data we’re already collecting. It’s hard to simulate things if you can’t analyze that data in real time, if you can’t analyze a huge amount of potentially billions of objects or files quickly. So ensuring that our architectures can deliver that kind of performance capability is a huge focus for Pure.

How are organizations looking to leverage the public cloud for high performance?

For a lot of customers there’s a compelling reason to do this. Let me give you an example of something Pure has built, because there are still a lot of cases where organizations want the elasticity of the compute layer in the cloud, but at the same time need to connect back to an extremely high-performance storage layer. We’ve built solutions where a customer can run a cloud compute workload with a direct connection back into a hosted data center running Pure Storage. We demonstrated that we can get the type of outcomes they need from a performance perspective with that hybrid approach, where the compute is still in the cloud but all the storage is on-prem, in this case in a hosted data center. In simulations, such as in the chip design world, it is very hard for a public cloud architecture to deliver the SLAs required on the storage side while ensuring teams can do their development at the pace they want.
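To make that pattern concrete, here is a minimal sketch of how one might validate such a setup from the cloud side. It assumes an NFS export from the hosted array has already been mounted on the cloud instance over the direct connection; the mount point and file below are hypothetical placeholders. It simply times a large sequential read to check whether the storage side meets the required SLA.

```python
# Minimal sketch: measure read throughput from a cloud compute instance
# against storage hosted in a nearby data center. Assumes the NFS export
# was mounted beforehand, e.g.:
#   mount -t nfs <datacenter-ip>:/export /mnt/flashblade
import time

DATA_PATH = "/mnt/flashblade/dataset.bin"  # hypothetical NFS-backed file
CHUNK = 64 * 1024 * 1024                   # read in 64 MiB chunks

start = time.monotonic()
total = 0
with open(DATA_PATH, "rb") as f:
    while True:
        chunk = f.read(CHUNK)
        if not chunk:
            break
        total += len(chunk)
elapsed = time.monotonic() - start

print(f"Read {total / 1e9:.1f} GB in {elapsed:.1f} s "
      f"({total / 1e9 / elapsed:.2f} GB/s)")
```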

What are the benefits of flash storage in data centers?

From an environmental perspective, flash is better for data centers. Our customer organizations have a power envelope and a performance envelope: they need a certain number of gigabytes per second per watt, and that’s a tough target to hit. We’ve been extremely focused on being able to deliver that type of performance per gigabyte inside a fixed power envelope.

Flash is much more efficient than disk. For example, the DirectFlash Modules we support in FlashBlade today are 24 and 48 terabytes, while the biggest off-the-shelf flash drive today is about 30 terabytes. So we’re more than 50% more efficient than the flash drives we are seeing from OEMs, and that’s what all our competition uses at best. We also work to ensure that our flash modules are highly scalable and reliable and that we can put a lot more into a smaller footprint. On average, our products deliver up to 80% less energy usage than competitive flash arrays.
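Reading “efficient” here as capacity per drive slot, the arithmetic checks out: 48 TB / 30 TB = 1.6, so the largest module holds 60% more data than the biggest off-the-shelf drive cited, comfortably more than 50%.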

Could you talk about your latest innovation in Unified Fast File and Object (UFFO) - FlashBlade//S and how it addresses the demands of unstructured data and modern application growth?

A couple of things here. The first is that we believe we need to support both file and object, a myriad of customer unstructured data requirements, on a single consolidated platform for the sake of efficiency and ease of use. So that’s the first thing and the reason why we unified fast file and fast object, and we were the first to bring that to market. The other, from an innovation perspective, is our approach to gateways: some organizations have built an object-based stack and then added a file gateway, while others offer file with an object gateway as an option. We’ve done all-native file and all-native object on the same platform, and we do that because we don’t want our customers to have to compromise. If you do it the other way, either way you end up with compromises relative to the simplicity and the scale-out of the system.
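As a rough illustration of what unified fast file and object means for an application team, here is a minimal sketch in which the same platform serves an S3-compatible bucket and an NFS export side by side. The endpoint, credentials, bucket, and mount point are all hypothetical placeholders, not actual product configuration.

```python
# Minimal sketch: one platform serving both object (S3) and file (NFS)
# workloads. All names and endpoints below are hypothetical.
import boto3

# Object path: speak S3 to the array's data endpoint.
s3 = boto3.client(
    "s3",
    endpoint_url="https://flashblade.example.internal",  # hypothetical
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)
s3.put_object(Bucket="analytics", Key="events/2024/01.parquet",
              Body=b"example payload")

# File path: the same array also exports a filesystem over NFS,
# mounted here at /mnt/fb (hypothetical).
with open("/mnt/fb/logs/app.log", "a") as log:
    log.write("both protocols, one platform\n")
```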

What about data lakes? Are data lakes becoming a dumping ground for datasets?

Right now, we’re more focused on streaming analytics and log analytics. We haven’t seen flash applied quite as broadly in data lakes yet, but I do think that in the next few years we’re going to see a significant change as we get closer to a total-cost-of-ownership crossover point between flash and disk, which will allow flash to address more workloads.

Why is the ability not just to back up data fast, but to restore rapidly and at scale, critical during a ransomware attack?

I am glad you asked this; it’s been such an interesting thing, and it’s especially close to me personally, having been in the storage and infrastructure sector for so many years. In the past, customers weren’t so concerned about how quickly they could restore their data. Unfortunately, as we all know, ransomware has created a scenario where organizations need the ability not just to back up their data and have it sitting there, perhaps restoring some little piece of it, but to do a wholesale multi-terabyte or petabyte restore, which requires a huge amount of throughput to feed back into target systems. The scale-out architecture of FlashBlade is extremely well suited for that. And we have taken it to the next level: several years ago, we introduced a feature we call SafeMode that takes snapshots of every bit of data on FlashBlade.

SafeMode prevents anybody, even a rogue internal employee, from deleting those snapshots and compromising that data. Once you know you’ve got that clean set of data, you are adding a critical additional layer of protection to your environment relative to your ability to restore. Everyone is looking to protect themselves at multiple points in their perimeter. However, one of the things organizations absolutely need to be looking at is the ability to rapidly restore a safe copy of their backup in a very short period. It’s just a completely different world: backup, and the speed of backups, was not a major concern until a few years ago. A trend we’re seeing is more dialogue and more recognition that the ability to restore quickly is a critical business imperative.
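The back-of-the-envelope arithmetic below shows why restore throughput, not just backup capacity, dominates at this scale; the throughput figures are illustrative, not vendor specifications.

```python
# Back-of-the-envelope: time to restore at petabyte scale.
# Throughput figures are illustrative, not vendor specs.
def restore_hours(data_tb: float, throughput_gb_s: float) -> float:
    """Hours to restore data_tb terabytes at throughput_gb_s GB/s."""
    return data_tb * 1000 / throughput_gb_s / 3600

for tput in (1, 10, 50):
    print(f"1 PB at {tput:>2} GB/s -> {restore_hours(1000, tput):6.1f} hours")
# 1 PB at  1 GB/s -> ~277.8 hours (over 11 days)
# 1 PB at 10 GB/s -> ~27.8 hours
# 1 PB at 50 GB/s -> ~5.6 hours
```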

Conclusion:

Overall, effective management of unstructured data is critical for organizations that want to innovate and adapt to future conditions with greater agility and fewer wasted resources. By adopting a fresh approach to unstructured data management and leveraging advanced data storage technologies, organizations can turn unstructured data into a strategic asset that helps drive business growth and success.

Amy Fowler, VP and General Manager, FlashBlade, Pure Storage

minus@cybermedia.co.in