Business Technologies

Cloud: New home to data warehouse

Cloud data warehouses, such as those offered by Snowflake, AWS, Microsoft Azure and Google Cloud are gobbling up market share

DQINDIA Online

23 Feb 2021 06:48 IST

New Update

COVID-19 has intensified the need for the cloud as an enabler of increasingly critical e-commerce, remote sales, and flexible cost structures. During the lockdowns, ten years of e-commerce growth took place in just a few months.

Advertisment

The global health crisis is increasing economic uncertainty, making it more important than ever for businesses to have flexible operations and channels, as well as infinitely variable technology costs.

Today it feels like everything is moving to the cloud. Even large enterprise data warehouses that many people thought would never leave the on-premise data centers are moving to the cloud to take advantage of today’s cloud technologies.

Cloud data warehouses, such as those offered by Snowflake, AWS, Microsoft Azure and Google Cloud are gobbling up market share from their on-premises competitors.

Advertisment

Migrating a data warehouse from a legacy environment requires a massive upfront investment in resources and time. There is a lot to consider before and during migration.

Advantages to move on to cloud

Scalability

Advertisment

On cloud, scaling up your system can sometimes be as quick as dropping down a selector box and picking the new size. This ability to scale without the need of waiting for a new server as well as a decent amount of employee time makes cloud a very viable option.

Elasticity

Many modern cloud data warehouses can be set up to be elastic. This means, data warehouse dynamically allocates computing resources depending on what is required. By only using the number of computing resources required teams can reduce their cost while still benefiting from high-performing data warehouses.

Advertisment

This wasn’t the case with on-premise servers where you were often stuck with the limited memory resources your company had. Also, companies would often have to buy more computing resources than required leading to a lot of wasted compute.

Reduced Database Management

Having on-premise servers means you need to have server administrators and database administrators. When servers go down or need to be upgraded, this can cause a lot of downtimes. Also, it requires server admins, and database administrators to spend time focused on upgrading systems vs. doing other work.

Advertisment

Much of these expenses are put on the cloud providers.

Disaster Recovery

Servers are not impenetrable and there are many ways they can fail. Whether it be due to a natural disaster like hurricanes or general wear and tear. Servers are constantly failing and needing to be backed up.

Advertisment

The cloud makes this easy. Our team doesn’t need to do anything other than deciding how we will keep backups of our data warehouse. We can take snapshots of the data warehouses, make duplicates, or store backups in the various systems provided by the cloud providers.

Migration approach

A common misconception about cloud migration is that it will be a one-time trip. But the reality is that the process of migrating data infrastructure to the cloud should happen gradually. A successful migration should feel as seamless as possible to the organization, so work isn't disrupted.

Advertisment

Divide and conquer

Possibilities of failure are high in large-sized migration, as migrating and modernizing too much in haste brings the complexity of deliverables. A migration strategy with multiple or segmented phases is less risky and accelerates achieving modernization objectives for organizations.

Pilot Phase

The first deliverable should be easy but useful. For example, successful data migration projects should focus the first phase on a data subset or use case that is both easy to construct and in high demand by the business. The complicated stages of the data migration must be kept for later phases.

Contingency plans

For preventing the possibilities of failure, organizations must segment their work phases and be ready for recovering from failures. Don't be too eager to unplug the old platforms because you may need them for rollback. It is inevitable that old and new data warehouse platforms (both on-premises and on clouds) will run in parallel for few months.

Lift and shift not as easy as it looks

Organizations must consider that lift and shift of the data and old applications can bring inadequacies, due to differences between the old and new platforms in terms of performance, functionality, and interfaces. Developments are essential to platform-specific components, which include hand-coded routines, user-defined functions, macros and stored procedures.

Team with right skills

Before migrating to a cloud-based data warehouse platform, consider hiring consultants or new employees who have cloud experience, not just data management experience. It’s important to train existing employees on the new cloud platform.

Change management

Data migrations affect many types of people. The plan should accommodate them all. A mature data warehouse will serve a long list of end-users who consume reports, dashboards, metrics, analyses, data extracts, etc. Affected parties should be involved in planning a data warehouse modernization and migration to the cloud. First, their input should affect the whole project from the beginning, so they get what they need to be successful with the new cloud data warehouse. Second, the new platform roll-out should take into consideration the productivity and process need of all affected parties.

Migration Challenges

Migrating data from an on-premise warehouse to a cloud-based environment creates several challenges. You may need to re-plan your data model, use a separate platform for scheduling tasks, connect custom data applications, etc.

Data Model changes

Cloud data warehouses support different schema and data types. There is also a wide set of data types that are not supported at all (BLOBs, geographical coordinates, etc.). Also, it encourages a schema denormalization approach for increased performance. The increased storage required to hold de-normalized data is fairly cheap. However, running JOINS on tables stored on distributed servers is very expensive, and does not lead to the desired performance improvements. Therefore, you must keep the cloud and the on-premise data models synchronized during migration.

Connecting custom data applications to your data warehouse

Another obstacle is adjusting the interfaces your custom data applications use to connect to your data warehouse. In other words, though the ODBC/JDBC drivers are supported and actively maintained, they hardly behave identically to one another. Changing an application’s database driver might require various query adjustments. Some changes will be obvious upon first use, as the SQL statements may result in visible errors. Other changes are less obvious, as different ODBC drivers may make slight data conversions. Examples of such conversions may be the precision of floating-point numbers, the timezone format of timestamps, and the way NULL values are handled. These changes will only appear as data discrepancies, and will take more rigorous testing to detect.

Stored procedures

A very common on-premises data warehouse feature that is often overlooked and is missing from their cloud competitors is the ability to write and use stored procedures. The leading cloud data warehouses, Snowflake, Redshift, and BigQuery, all support user-defined functions (defined in Python, SQL, or JavaScript), but for many, this is not enough. The layer of stored procedures on a data warehouse is like a repository of miniature data applications, which save a lot of work and help in preserving organization-specific knowledge. The common alternative is to use a separate platform for scheduling parametrized queries or task orchestration.

Legacy ETL processes

Once all legacy data has been migrated to your cloud, the key step is to repoint your ETL processes to your cloud data warehouse. In some cases, it will be just a change of configuration, and in others, it may require a complete re-write which could be time-consuming and costly.

To Conclude

The migration of data warehouses to the cloud needs patience and forethought to realize significant outcomes. A typical data warehouse contains a large amount of data covering many business areas. Migrating all the data at once would almost guarantee failure. You need to take incremental steps to successfully migrate your data warehouse to the cloud, especially when undertaking significant design changes.

An incremental approach enables you to keep operating your on-premise data warehouse, while your cloud data warehouse goes live.

By Nitin Srivastava, Director of Data and Analytics India Teams, Advance Auto Parts India