
Data as code: An opportunity to create autonomous data management

Infrastructure as Code (IaC) simplifies developing, operating, and maintaining applications, but there is an obstacle lurking: data. Today there is no standard for managing data in an IaC environment, so we will describe three commonly used data management models to help you apply the right one to your IaC application. More importantly, you can lay the foundation for the next generation of data management, Data as Code, to unlock the full potential of Infrastructure as Code.

What is Infrastructure as Code?

IaC brings source control to your compute infrastructure. Teams use tools such as AWS CloudFormation templates or Terraform configurations to encapsulate their environment: servers, networking configuration, and so on. They then check those templates into source control, so that anyone on the team can deploy the application stack anywhere. Abstracting the application environment this way simplifies distributed development, testing, and disaster recovery. IaC makes DevOps a reality.
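To make the idea concrete, here is a minimal sketch of an environment captured as data: a small AWS CloudFormation template built in Python and serialized to JSON. The logical resource names (`AppDataBucket`, `JobQueue`), the stack name, and the file name are hypothetical examples; a real stack would describe far more, such as networking and compute.

```python
import json

# A minimal AWS CloudFormation template expressed as plain data.
# The logical IDs "AppDataBucket" and "JobQueue" are hypothetical examples;
# neither resource type requires any properties, so CloudFormation
# generates physical names when the stack is created.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Example application stack",
    "Resources": {
        "AppDataBucket": {"Type": "AWS::S3::Bucket"},
        "JobQueue": {"Type": "AWS::SQS::Queue"},
    },
}

# Serialize the template; saved as template.json and committed to source
# control, it lets anyone recreate the same stack, for example with:
#   aws cloudformation deploy --template-file template.json --stack-name example-app
template_json = json.dumps(template, indent=2)
print(template_json)
```

Because the template is just text in a repository, it can be reviewed, versioned, and redeployed in another account or region like any other code. Note that nothing in it describes the data that will eventually live in the bucket, which is the gap the rest of this article addresses.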

Why is data the linchpin of IaC DevOps?

Applications depend on data. Whether it is recording a transaction, analysing trends, or designing a medical device, applications create, store, and retrieve data. Therefore, it is critical to protect that data from errors and to secure it against internal and external threats.

DevOps needs data protection and security more than traditional development approaches do, for three reasons:

  1. Increased security threats – Cyber attackers target development accounts because they are less protected than production environments, and in a DevOps environment it is easy to move from a development account into production. Therefore, the entire pipeline must be secured.
  2. Rapid testing – Once quality assurance and performance teams can bring up an application environment near-instantly, generating a realistic test dataset becomes the bottleneck. They need swift access to production-scale datasets.
  3. Reproducibility – Developers expect to recreate the environment, including data, to debug issues. Meanwhile, regulators expect teams to reproduce AI and machine learning results with the historical models and the training data. 

DevOps reduces the dependence on infrastructure, but increases the reliance on data. 

Current IaC data management models

Today, we see three models for managing data in IaC environments. 

  1. Deep analytics – store data in the container. For static analytics, this “all in one” approach lets IaC tooling manage the data along with the application. Unfortunately, containers that carry their data end up being treated like VMs, which makes them difficult and expensive to distribute and preserve.
  2. Cloud-native applications – external storage. Many new applications store persistent data in an external object store or managed database. This siloed approach leverages existing data management, but complicates coordination between the IaC and data silos.
  3. Modernised applications – Kubernetes Container Storage Interface (CSI) volumes. When containerising applications, teams dynamically attach external storage through the Kubernetes CSI. This is more streamlined than storing the data in the container, and it avoids the data silos of the second approach.
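As an illustration of the third model, the sketch below builds a minimal Kubernetes PersistentVolumeClaim in Python and prints it as JSON, a format `kubectl apply -f` accepts as well as YAML. Because the claim is plain data, it can live in the same repository as the rest of the IaC templates. The claim name, the size, and the `example-csi-ssd` StorageClass are hypothetical; in a real cluster the StorageClass would point at a provisioner supplied by an installed CSI driver.

```python
import json


def csi_volume_claim(name: str, size: str, storage_class: str) -> dict:
    """Build a Kubernetes PersistentVolumeClaim manifest as a plain dict.

    storage_class is assumed to name a StorageClass backed by a CSI driver;
    when the claim is applied, that driver provisions and attaches the volume.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


# Hypothetical claim for an application's database volume.
claim = csi_volume_claim("orders-db-data", "20Gi", "example-csi-ssd")
claim_json = json.dumps(claim, indent=2)
print(claim_json)
```

A pod then mounts the storage by referring to the claim by name (`persistentVolumeClaim.claimName` in its volume list). This keeps the data itself outside the container image while the request for storage stays under source control with the application.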

The complexity of selecting and managing the different models leads to chaos, expense, and errors. 

The future – Data as Code

Data as Code is data management that is as flexible, agile and simple as Infrastructure as Code. 

Data as Code will offer:

  1. Instant data access – All versions of all datasets will be accessible programmatically. 
  2. Global accessibility – Near-instant dataset access across regions to enable distributed development, scalability, and disaster recovery. 
  3. Ability to automatically meet regulations – Datasets packaged with metadata will help manage data access, residency, and privacy.
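Since Data as Code is a future state, there is no standard interface to show. The sketch below is a hypothetical, in-memory model of the first and third capabilities: every version of a dataset is addressable from code, and each version carries residency and privacy metadata that an access check can evaluate automatically. The class names, fields, and storage URIs are illustrative assumptions, not an existing API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class DatasetVersion:
    """One immutable dataset version plus the metadata that travels with it (hypothetical)."""
    name: str
    version: int
    location: str       # where this copy lives; an illustrative URI
    residency: str      # region the data must stay in, e.g. "eu"
    contains_pii: bool  # whether privacy controls apply


@dataclass
class DatasetCatalog:
    """Hypothetical catalog in which every version of every dataset is addressable in code."""
    _versions: Dict[Tuple[str, int], DatasetVersion] = field(default_factory=dict)

    def register(self, ds: DatasetVersion) -> None:
        self._versions[(ds.name, ds.version)] = ds

    def get(self, name: str, version: Optional[int] = None) -> DatasetVersion:
        """Return a specific version, or the latest registered one if version is None."""
        if version is None:
            version = max(v for (n, v) in self._versions if n == name)
        return self._versions[(name, version)]

    def can_access(self, ds: DatasetVersion, region: str, pii_cleared: bool) -> bool:
        """Evaluate the metadata: enforce residency first, then privacy clearance."""
        if ds.residency != region:
            return False
        if ds.contains_pii and not pii_cleared:
            return False
        return True


catalog = DatasetCatalog()
catalog.register(DatasetVersion("claims", 1, "s3://example-bucket/claims/v1", "eu", True))
catalog.register(DatasetVersion("claims", 2, "s3://example-bucket/claims/v2", "eu", True))

latest = catalog.get("claims")
print(latest.version, catalog.can_access(latest, "us", pii_cleared=True))  # 2 False
```

The design choice worth noting is that the compliance rules read only the metadata packaged with each version, so the same check works wherever the dataset copy is accessed.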

While Data as Code is a future state, it is critical to start laying the foundation:

  1. Multi-cloud data management pane. No data silos. You will need one central data copy hub, whether the original dataset was born on-premises, in the cloud, or in SaaS applications. 
  2. Cloud data efficiency. No regional silos. Global data access requires storage and network efficiency: global deduplication, on-demand data access, and high-performance cloud networking. 
  3. Modern data operations support. No functionality silos. One platform should address all data management challenges: protection, security, governance, and accessibility.
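Global deduplication is what makes a single central copy hub affordable: identical data is stored once, no matter how many datasets, versions, or regions reference it. The sketch below shows the common content-addressed approach using fixed-size chunks and SHA-256 digests. It is a simplified illustration under stated assumptions (tiny 4-byte chunks and an in-memory dict as the store), not a description of any particular product's implementation.

```python
import hashlib
from typing import Dict, List

# Tiny chunk size so the example is easy to follow; real systems use
# chunks of kilobytes to megabytes.
CHUNK_SIZE = 4


class DedupStore:
    """Content-addressed store: each unique chunk is kept once, keyed by its SHA-256 digest."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}

    def put(self, data: bytes) -> List[str]:
        """Split data into fixed-size chunks, store unseen ones, and return the recipe (digest list)."""
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # already-stored chunks are not copied again
            recipe.append(digest)
        return recipe

    def get(self, recipe: List[str]) -> bytes:
        """Reassemble the original bytes from a recipe."""
        return b"".join(self.chunks[d] for d in recipe)


store = DedupStore()
r1 = store.put(b"AAAABBBBCCCC")
r2 = store.put(b"AAAABBBBDDDD")  # shares its first two chunks with r1
print(len(store.chunks))  # 4 unique chunks stored for 6 chunk references
```

Fixed-size chunking keeps the sketch short, but a single inserted byte shifts every later chunk boundary; production systems therefore often use variable-size, content-defined chunking so that small edits leave most chunks unchanged.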


IaC can rapidly accelerate your business and make DevOps a reality, but you need to plan for data. As you protect the data in your IaC environment, lay the groundwork for the future. Data as Code is an opportunity to create autonomous data management. Today, though, you can start searching for a solution that works across clouds with global efficiency, manages your data protection, and enhances your resiliency and security. Take the first step to the future today.

The article has been written by Stephen Manley, CTO, Druva
