THE CLOUD CONSPIRACY: The commoditization of storage and compute resources, plus the advent of virtualization, was the perfect storm for the cloud.
The cloud materialized like a conspiracy. Of course, there wasn’t a sinister master plan or any secretive plotters. But an array of factors combined to create just the right atmosphere for the cloud to take shape.
First, storage became inexpensive and commoditized. As profit margins for hard disk drives diminished, storage vendors found it increasingly difficult to invest in innovation.
Second, computing became similarly commoditized. It became easy to sell large quantities of powerful processors. Semiconductor vendors continued their pursuit of proving Moore’s law, and processor buyers often purchased more computing power than they needed. While some technical research customers benefited from the increased number of operations per second, most CPUs were used for email, database workloads, shared document management, and other relatively mundane applications.
Virtualization—the third primary factor in the development of the cloud—appeared as an answer for what to do with underutilized compute horsepower. The ability to disaggregate software and hardware, and thus run multiple “virtual machines” on a single server, allowed organizations to squeeze every drop of computational goodness out of each system.
The combination of inexpensive storage, ubiquitous compute power, and virtualization capabilities was the perfect storm for the development of the cloud. Organizations transformed their in-house data centres into on-site private cloud environments.
Meanwhile, as new public cloud providers emerged, other existing off-site service providers recast themselves as cloud providers that could help clients trade large capital expenditures for modest operating expenses.
MATCHING STORAGE CHARACTERISTICS TO WORKFLOW STAGES
Each type of storage offers a different balance of performance, cost, protection, and capacity.
In the media industry, there isn’t one type of storage that can address all requirements. Each storage type has its own advantages and optimal use cases. Even within a post-production workflow, one might need multiple types of storage to accommodate a variety of requirements. For example, one might want flash-based solid-state drives (SSDs) for animation and visual effects, high-performance hard disk drives (HDDs) for editing, object storage for keeping infrequently used media readily accessible, and data tape for longer-term archiving.
When deciding which type of storage to use and when to use it, one needs to find the right balance among performance, cost, protection, and capacity.
PERFORMANCE: OPTIMIZING FOR WORK IN PROGRESS
High performance is critical for ingest and a wide variety of post-production tasks, from editing and colour correction to animation and visual effects. One needs to provide rapid access to large high-resolution media files, facilitate collaboration among numerous team members, and deliver a responsive experience without any productivity-sapping lags.
Unfortunately, fast drives are often expensive and do not typically offer large capacity. The higher the performance, the more one will spend and the less capacity one will get for the money. For example, flash-based SSDs can deliver exceptional performance, but at the expense of cost and capacity.
Typically, one would choose to protect high-performance systems with RAID storage, which is designed to overcome disk hardware failures. The RAID scheme one chooses will come with its own trade-offs in terms of protection, performance, and capacity.
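The RAID trade-offs can be made concrete with a small illustration. The sketch below (a generic model, not tied to any specific storage vendor or product) shows how common RAID schemes trade usable capacity against the number of drive failures they can tolerate:

```python
# Illustrative sketch: usable capacity and fault tolerance for common
# RAID schemes across n equal-sized drives. Real arrays add overheads
# (hot spares, filesystem metadata) not modeled here.

def raid_tradeoffs(n_drives: int, drive_tb: float) -> dict:
    """Return (usable TB, tolerated drive failures) per RAID scheme."""
    return {
        "RAID 0 (striping)":       (n_drives * drive_tb, 0),
        "RAID 5 (single parity)":  ((n_drives - 1) * drive_tb, 1),
        "RAID 6 (double parity)":  ((n_drives - 2) * drive_tb, 2),
        "RAID 10 (mirror+stripe)": ((n_drives / 2) * drive_tb, 1),
    }

for scheme, (usable, failures) in raid_tradeoffs(8, 10.0).items():
    print(f"{scheme}: {usable:.0f} TB usable, survives {failures} failure(s)")
```

The pattern is visible immediately: the schemes that survive more failures give back less of the raw capacity one paid for.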
There are use cases in which optimizing for the highest possible performance is completely appropriate. For example, compositing systems that require multiple streams of high-resolution content for short-format projects can benefit from the added performance of an all-flash system. These workflows don’t require high-capacity disks. In addition, editorial departments with large numbers of editors working on the same content need a lot of performance to support the random I/O generated by multiple seeks.
COST: REDUCING EXPENSES AND ENSURING LONG-TERM PRESERVATION
At the opposite end of the spectrum from primary storage, one’ll likely use a different type of storage for archiving and long-term retention. In the media industry, an “archive” often conjures up images of videotapes piled haphazardly in a closet. For this reason, producers have often found it less expensive to reshoot something than to search for it in their tape library.
Today, however, there are much better options for archiving content and preserving completed projects. In a managed multi-tier storage environment, one can implement a cost-effective archive that still offers clear visibility and fast access to content.
Why is archiving important? First, the right archival solution can help one preserve valuable content for years. Second, it enables one to remonetize that content: reusing media that has already been captured and updating completed projects saves time and money.
There’s also a hidden benefit to archiving content—every terabyte of content that one archives to a lower-cost tier is another terabyte that one frees up in the more expensive, higher-performance primary storage environment. As one’s content production workflow grows, one can use the valuable primary storage space one already owns without having to buy more.
For this archive storage tier, adopting managed archive libraries that use data tape with automated tape handling capabilities can save thousands of dollars in data storage and data management compared with other storage methods. Even as HDD prices continue to fall and cloud services offer seemingly inexpensive storage plans, LTO tape remains the most cost-effective method for preserving and accessing retained content.
PROTECTION: OPTIMIZING FOR COST, RELIABILITY, AND DURABILITY WHILE MAINTAINING FAST ACCESS
There is an important middle ground between the expensive, high-performance, fast-access storage of primary systems and archival systems that use data tape.
When the time is right, one will need to access this content immediately. Until then, it must remain online and available to whoever needs it.
Object storage can occupy that unique middle ground between primary storage and the archive tier. It uses lower-cost hardware than the primary system. While it doesn’t deliver the performance of primary systems, it provides greater flexibility. Object storage enables one to support a variety of drive types, eliminates the time required for RAID rebuilds, can help transition smoothly across generations of storage, and helps avoid platform and file system incompatibilities.
It’s not surprising, then, that object storage is the basis for almost all public cloud services. When an object storage system is implemented in a facility, it essentially creates a private cloud.
While primary storage systems usually rely on some kind of RAID or clustered disk array configuration, object storage uses a distributed data recovery encoding or replication scheme to spread content over many separate devices and disks. Because pieces of content are copied or coded to numerous disks, object storage systems become more reliable as disks are added, whereas RAID systems become less reliable.
Object storage systems gradually recover missing data and become healthier over time. By contrast, RAID systems have a period of vulnerability until a failed disk is completely rebuilt. During that period, another disk failure could bring down an array.
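The reliability contrast can be sketched with a toy model. Assuming independent per-disk failure probabilities (a simplification; real systems have correlated failures and ongoing rebuilds), the probability that data remains readable is the probability that enough fragments survive:

```python
import math

# Toy durability model: data encoded into n fragments survives as long
# as at least k fragments remain readable. A RAID 6 set of 8 disks
# behaves like k=6 of n=8; a hypothetical 12-of-18 erasure code
# tolerates 6 failures. p is an assumed per-disk failure probability.

def survival_probability(n: int, k: int, p: float) -> float:
    """Probability that at least k of n fragments survive."""
    return sum(
        math.comb(n, f) * (p ** f) * ((1 - p) ** (n - f))
        for f in range(0, n - k + 1)  # f = failed fragments we can absorb
    )

p = 0.02  # assumed failure probability per disk over the period
print(f"RAID 6, 8 disks:        {survival_probability(8, 6, p):.8f}")
print(f"12-of-18 erasure code:  {survival_probability(18, 12, p):.8f}")
```

Under this model, the wider erasure-coded layout survives more simultaneous failures than the RAID set, which is the intuition behind object storage growing more durable as devices are added.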
Object storage also streamlines upgrades and service. All the component levels—disks, nodes, and racks—are managed as independent items. A larger-capacity disk or newer-generation storage device can be installed easily. Failed devices do not need to be replaced at all if desired; the only cost to the system is capacity. These capabilities help enhance durability. Object storage can span generations of hardware types and operating systems.
Object storage systems are ideal for repositories where content is stored and accessed for delivery, but the content itself changes very rarely. New versions may be added, but the content of the objects remains unchanged.
There is, however, a challenge with object storage for media production workflows. Almost all applications in the media ecosystem rely on files as their means of accessing and modifying content. Object storage relies on “objects,” which can’t be read by a file system. Files must be turned into objects on submission and back again into files on retrieval, which requires a system of file-to-object management in the workflow and support for the linking of files and objects. Most systems, such as media asset management (MAM) or file management systems, maintain a database that keeps track of the relationship between specific files and their associated objects.
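The bookkeeping such a system performs can be sketched as follows. The class and method names here (`FileObjectCatalog`, `put`, `get`) are hypothetical, not a real MAM or file management API; the sketch only illustrates the path-to-object mapping the text describes:

```python
import hashlib

# Hypothetical sketch of file-to-object management: a catalog records
# which immutable object backs each file path, so files can be turned
# into objects on submission and back into files on retrieval.

class FileObjectCatalog:
    def __init__(self):
        self._index = {}  # file path -> object key
        self._store = {}  # object key -> object bytes (written once)

    def put(self, path: str, data: bytes) -> str:
        """Store file content as an object and link the path to its key."""
        key = hashlib.sha256(data).hexdigest()  # content-addressed key
        self._store[key] = data                 # objects are immutable
        self._index[path] = key                 # a new version re-points the path
        return key

    def get(self, path: str) -> bytes:
        """Resolve a path to its object and return the file content."""
        return self._store[self._index[path]]

catalog = FileObjectCatalog()
catalog.put("/projects/ep01/master.mov", b"media essence bytes")
restored = catalog.get("/projects/ep01/master.mov")
```

A real MAM keeps this index in a persistent database rather than in memory, but the core relationship it maintains is the same.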
CAPITALIZING ON THE CLOUD AT DIFFERENT WORKFLOW STAGES
Cloud storage can offer benefits at multiple stages of the post-production workflow.
When and how should one use the cloud for media storage?
There are opportunities to capitalize on the cloud at several workflow stages.
Ingest: Ingesting new content is an excellent time to leverage the protection attributes of the cloud. As new content comes in, it can be copied to a public cloud service for protection while work is done on-site with the original files. If something happens in the production environment that makes a file unusable, the copy can be retrieved from the cloud and work can continue.
Still, there are limitations to using the cloud for ingest. One can certainly recover a file or two on short notice, but recovering all project content from the cloud—because of a catastrophic event that affects the production environment—would be very costly, since most cloud services charge escalating retrieval fees for data in deep archive.
Editing and finishing: The cloud is not a good fit for work-in-progress editing and finishing. During these phases of the post-production workflow, a team needs high-performance storage connected over high-speed, deterministic networks. Craft editing, colour correction, compositing, and other tasks that rely on multiple streams of high-quality content can’t be supported over current Internet connections reliably or affordably. Some tasks, such as logging or captioning, can be supported through proxy-quality content, but they will still have to be conformed with higher-resolution content before finalizing the project.
Extended online editing, transcoding, and delivery: Some stages of the workflow are well-supported by lower-cost, lower-performance storage—especially jobs in which content creators rely on small files, such as captioning, applying sound effects and voice-overs, and transcoding versions prior to content delivery. Object storage can act as a good solution for this type of work when the translation between objects and files can be navigated transparently.
On-site object storage creates a private cloud of immutable content that automatically prevents inadvertent changes. As a result, object storage is a good solution for content delivery repositories because once an object is written it can only be versioned and not directly modified. Combining an on-site private cloud with public cloud services creates a hybrid cloud where content can be moved smoothly to the service that best suits the current requirements.
Archive and vault: Archiving (long-term preservation of infrequently used content) and vaulting (very long-term preservation of rarely used content) are similar processes. The only differences are the anticipated length of time content is stored and the expected need to retrieve that content. For both archiving and vaulting, the cloud can be a very efficient and effective solution— particularly for content that can’t be thrown away, but may never be needed again.
It’s important to understand retrieval requirements and fees when evaluating public cloud storage options for archiving. Retrieval charges are separate from download fees and can be significant depending on how fast one needs content. Standard retrieval typically provides access to content for download within hours, while expedited retrieval enables access within minutes; the expedited option might be worth the extra cost when time is critical. The true cost of retrieval must be considered when comparing solutions.
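A back-of-the-envelope calculation shows how these separate charges add up. All rates below are assumptions chosen for illustration, not any provider’s actual prices:

```python
# Hedged cost sketch: total cost of storing a project in a deep-archive
# tier and later restoring all of it. Rates are illustrative assumptions.

def archive_restore_cost(stored_tb, months, retrieved_tb,
                         storage_per_tb_month=4.0,  # assumed $/TB/month
                         retrieval_per_tb=20.0,     # assumed retrieval fee, $/TB
                         egress_per_tb=90.0):       # assumed download fee, $/TB
    storage = stored_tb * months * storage_per_tb_month
    retrieval = retrieved_tb * retrieval_per_tb  # separate from download fees
    egress = retrieved_tb * egress_per_tb
    return {"storage": storage, "retrieval": retrieval,
            "egress": egress, "total": storage + retrieval + egress}

# Restoring a full 50 TB project after three years in the archive:
costs = archive_restore_cost(stored_tb=50, months=36, retrieved_tb=50)
print(costs)
```

Even with these modest assumed rates, the retrieval and egress charges for a full-project restore rival years of storage fees, which is why a catastrophic recovery from deep archive is far more expensive than recovering a file or two.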
The data protection and disaster recovery benefits of public cloud archives and vaulting might also be worth the recurring storage costs and retrieval fees. With a public cloud, archived content wouldn’t be affected by any interruption at the primary facility. In addition, most services create multiple copies of data in a single site or across multiple sites to enhance protection.
Another benefit of outsourcing archive storage is that the costs of keeping equipment current can be avoided. Cloud storage providers store data on the most up-to-date, and most reliable, media. If one stored this content oneself, one would likely have to pay for new equipment every five years or so to stay current.
By Jim Simon, Vice President of Global Field and Channel Marketing, Quantum Corp.