Managing Archival Data

DQI Bureau

In today's business environment, information and data have become the most
important corporate assets. This has given a huge impetus to the storage
solutions market, and storage vendors have come up with many innovative products
and solutions that help organizations store and manage corporate data. However,
with data growing exponentially, storing and managing archival data is becoming
increasingly difficult. The problem is compounded because both structured and
unstructured data have become integral parts of the business.


Structured data is traditionally defined as data organized by the well-defined
structure that databases provide. Databases are growing so fast that their size
is impeding application performance, stretching backup windows and artificially
inflating the total cost of operations.

However, the growth of unstructured data has far surpassed that of structured
data, owing to the inherent nature of unstructured data. Unstructured data
typically comprises documents, spreadsheets, graphics, still and motion images
and various other formats. Messages and e-mail can further be classified as
semi-structured data, since their structure can be used to build a framework
for further classifying unstructured data. According to industry estimates,
over 50 percent of the data residing in data centers falls into these
categories.

One simple yet troubling example, affecting almost everyone including CEOs and
CFOs, is the phenomenal growth of e-mail. Users routinely receive messages such
as "Mailbox size limit exceeded". Adding to the problem are new regulations that
force corporations to retain their e-mails for a specified period and to produce
them on demand. This is difficult because e-mails have routinely been spread
throughout the IT infrastructure and subjected to regular purging to limit the
size of e-mail stores.


Faced with this situation, organizations have to resort to techniques such as
Data Lifecycle Management. This means managing all the data considered a
corporate asset by matching its availability and retrieval time to its value,
which varies throughout the data lifecycle. By adopting Data Lifecycle
Management techniques, organizations can raise the efficiency and
responsiveness of the total storage environment and make optimal use of
available capacity.

While IT departments must continue to ensure that capacity requirements are met
for critical applications, there is a further demand to manage digital assets
more effectively by moving them to a different class of media according to
their current value. The idea is to take advantage of waning requirements for
retrieval time and availability by moving less valuable, less likely to be
accessed data to less expensive storage. Doing so requires greater intelligence
in managing storage devices and in automatically moving data within the overall
storage environment, from the time it is created until it expires.
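The article leaves the placement policy open. As a minimal sketch, assuming time since last access is used as the proxy for an asset's current value, and assuming three hypothetical tiers ("primary", "secondary", "tertiary") with illustrative 30-day and 365-day thresholds, the rule for picking the cheapest acceptable tier could look like this:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical policy: tier names and idle-time thresholds are illustrative
# assumptions; real values would come out of preservation planning.
TIER_POLICY = [
    (timedelta(days=30), "primary"),     # recently used: fast, expensive disk
    (timedelta(days=365), "secondary"),  # cooling off: cheaper disk
]
DEFAULT_TIER = "tertiary"                # rarely used: tape or optical media


@dataclass
class DataAsset:
    name: str
    last_accessed: datetime


def choose_tier(asset: DataAsset, now: datetime) -> str:
    """Return the tier the asset should live on, given how long it has been idle."""
    idle = now - asset.last_accessed
    for max_idle, tier in TIER_POLICY:
        if idle <= max_idle:
            return tier
    return DEFAULT_TIER


if __name__ == "__main__":
    now = datetime(2024, 1, 1)
    for days in (10, 100, 400):
        asset = DataAsset(f"report-{days}", now - timedelta(days=days))
        print(asset.name, choose_tier(asset, now))
```

Keeping the thresholds in a table rather than in code means the policy can be updated by administrators without touching the placement logic. Access age is only one possible value signal; business rules such as regulatory class could be substituted.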

Further, more and more of the information generated by business activities
falls outside structured formats and their retrieval mechanisms. This heightens
the need to quickly catalogue, search and retrieve this unstructured information
within the storage environment itself. At the same time, solutions must
encompass varying classes of storage devices and media arranged in tiers, in
order to balance the cost of storing any particular data asset against its
current value from creation to end-of-life.


Therefore, an archival platform should combine intelligent storage with an open,
collaborative approach to storage software. This combination is most effective
when used with ISO's Reference Model for an Open Archival Information System
(OAIS). OAIS is a proven foundation for archive systems, having served as the
underpinning of some of the largest data archives in existence.

Following the OAIS guidelines, the storage environment should be able to
deliver the following functions of the OAIS model:

Preservation Planning: This involves understanding the
business-specific issues related to data and how the value of that data varies
over its useful lifetime. An appropriate mix of consulting and technology is
required to draw up the archival policies that form the basic framework for
managing and retrieving archived data according to its value. This is the first
step in implementing the OAIS model.


Produce: This function involves handling all the data assets produced
by any industry or activity.

Ingest: Once data is produced, the ingest function prepares it for
storage and management within the archive store. The actions in the ingest
function include creating a digital signature that uniquely identifies the
object, indexing it, and moving the metadata describing it into the metadata
store. Metadata is information about the data; it is used to populate, maintain
and access both the descriptive information that identifies the archive's
holdings and the administrative data used to manage the archive.
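A minimal ingest step can be sketched as follows. It assumes a SHA-256 content digest plays the role of the "digital signature" identifier (strictly, a digital signature would also sign that digest with a private key), and it uses hypothetical in-memory dictionaries for the archive store and the metadata store. Indexing is reduced here to recording normalized keywords; the searchable index itself belongs to Data Management.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    object_id: str        # SHA-256 digest of the content (assumed unique ID)
    title: str            # descriptive metadata
    keywords: list[str]   # descriptive metadata, normalized for indexing
    location: str         # administrative metadata: where the bytes are kept


@dataclass
class Archive:
    # Hypothetical in-memory stand-ins for the archive and metadata stores.
    content_store: dict[str, bytes] = field(default_factory=dict)
    metadata_store: dict[str, MetadataRecord] = field(default_factory=dict)

    def ingest(self, data: bytes, title: str, keywords: list[str]) -> str:
        # 1. Fingerprint the object so it can be uniquely identified later.
        object_id = hashlib.sha256(data).hexdigest()
        # 2. Place the bytes in the archive store under a derived location.
        location = f"archive/{object_id}"
        self.content_store[location] = data
        # 3. Move the metadata describing the object into the metadata store.
        self.metadata_store[object_id] = MetadataRecord(
            object_id, title, [k.lower() for k in keywords], location
        )
        return object_id


if __name__ == "__main__":
    archive = Archive()
    oid = archive.ingest(b"Q3 board minutes", "Board minutes", ["Finance", "Board"])
    print(oid, archive.metadata_store[oid].location)
```

A side effect of content-derived identifiers is that ingesting identical bytes twice yields the same ID, which gives de-duplication for free.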

Data Management: Once the metadata is developed, data management
indexes it so that it is searchable and can be retrieved when required. A link
from the metadata store is used to determine where the data asset is maintained
in the storage archive infrastructure.
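Indexing metadata and following the link to storage can be illustrated with a small inverted index. This is a sketch under stated assumptions: the `MetadataIndex` class is hypothetical, descriptions are split on whitespace, and a query matches only objects whose metadata contains every query term. It returns storage locations, mirroring the link from the metadata store to the archive.

```python
from collections import defaultdict


class MetadataIndex:
    """Hypothetical in-memory keyword index over archive metadata."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = defaultdict(set)  # term -> object IDs
        self._locations: dict[str, str] = {}                     # object ID -> location

    def add(self, object_id: str, description: str, location: str) -> None:
        # Record each lowercase term of the description against the object.
        for term in description.lower().split():
            self._postings[term].add(object_id)
        # Keep the link from metadata to where the asset is stored.
        self._locations[object_id] = location

    def search(self, query: str) -> list[str]:
        """Return sorted locations of objects matching every query term."""
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return sorted(self._locations[oid] for oid in hits)


if __name__ == "__main__":
    idx = MetadataIndex()
    idx.add("id1", "Board minutes finance 2023", "tier2/vol7/id1")
    idx.add("id2", "Finance invoices 2023", "tier3/tape4/id2")
    print(idx.search("finance 2023"))
```

Because the search resolves to a location rather than to the data itself, the asset can move between tiers without re-indexing, as long as the location link is updated.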


Archival Storage: This function stores, maintains and retrieves data,
manages the storage hierarchy (including movement based on changes in data
value), and provides disaster recovery capabilities. It is further enhanced if
the archival storage solution allows seamless data movement in a heterogeneous
storage environment, which makes interoperability a key aspect of the archival
storage function.
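Moving an asset down the storage hierarchy has to be done without losing it or breaking the link from its metadata. The sketch below assumes that each tier is a hypothetical key/value store (plain dictionaries here) and that an object's ID is the SHA-256 digest of its content, so the copy can be verified before the original is released. The `migrate` function and its parameters are illustrative, not a product API.

```python
import hashlib


def migrate(object_id: str, stores: dict[str, dict[str, bytes]],
            locations: dict[str, str], target_tier: str) -> None:
    """Move one object to target_tier, verifying the copy before deleting the source.

    stores maps a tier name to a hypothetical key/value store for that tier;
    locations is the metadata link from object ID to its current tier.
    """
    source_tier = locations[object_id]
    if source_tier == target_tier:
        return
    # Copy the object to the target tier.
    stores[target_tier][object_id] = stores[source_tier][object_id]
    # Fixity check: the copy must hash to the object's ID (its SHA-256 digest).
    copied = stores[target_tier][object_id]
    if hashlib.sha256(copied).hexdigest() != object_id:
        del stores[target_tier][object_id]  # discard the bad copy, keep the source
        raise IOError(f"fixity check failed for {object_id}")
    locations[object_id] = target_tier      # repoint the metadata link
    del stores[source_tier][object_id]      # only now release the old copy


if __name__ == "__main__":
    data = b"2019 e-mail export"
    oid = hashlib.sha256(data).hexdigest()
    stores = {"primary": {oid: data}, "tertiary": {}}
    locations = {oid: "primary"}
    migrate(oid, stores, locations, "tertiary")
    print(locations[oid])
```

The order of operations is the design point: copy, verify, repoint, and only then delete, so a failure at any step leaves a good copy reachable through the metadata.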

Administration: This includes configuration management of system
hardware and software, system engineering functions to monitor and improve
archive operations, updating archival and hierarchical storage management (HSM)
policies, and customer support. Routine administration functions are handled
using management tools, while support services optimize the overall operation of
the archive system.

Access Control: This helps consumers find information, limits access
as required (for example, enforcing read-only access for mandated retention
periods), and delivers query responses to consumers. It should also be combined
with tamper-proof functionality, which can be achieved by locking disk volumes
as "read only".
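The read-only, tamper-proof behaviour can be sketched as a write-once-read-many (WORM) store with a per-object retention date. The `WormVolume` class and `RetentionError` are hypothetical names used for illustration: objects can be written once, never overwritten, and deleted only after their mandated retention period has ended.

```python
from datetime import date


class RetentionError(PermissionError):
    """Raised when an operation would violate write-once or retention rules."""


class WormVolume:
    """Hypothetical write-once-read-many volume with per-object retention dates."""

    def __init__(self) -> None:
        self._objects: dict[str, tuple[bytes, date]] = {}

    def write(self, key: str, data: bytes, retain_until: date) -> None:
        # Write once: an existing object can never be overwritten.
        if key in self._objects:
            raise RetentionError(f"{key} is read-only and cannot be overwritten")
        self._objects[key] = (data, retain_until)

    def read(self, key: str) -> bytes:
        return self._objects[key][0]

    def delete(self, key: str, today: date) -> None:
        # Deletion is refused until the retention period has expired.
        _, retain_until = self._objects[key]
        if today < retain_until:
            raise RetentionError(f"{key} is under retention until {retain_until}")
        del self._objects[key]


if __name__ == "__main__":
    vol = WormVolume()
    vol.write("mail-001", b"quarterly results", date(2030, 1, 1))
    print(vol.read("mail-001"))
```

A real implementation would enforce these rules in the storage device or volume itself, so that software running above it cannot bypass them.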


Consume: Just as with "produce", consumption has to be tailored to
the intended use of the data assets. Often this involves integrating the
archival system with the application ordinarily used to access the data. While
it is important to provide a general interface for archived data retrieval for
auditors and administrators, real value is added by enabling the application and
its users to continue working as always. Their standard application interface
and access approach should not change whether data is in primary, secondary or
tertiary storage.
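Keeping the application's access path unchanged amounts to giving it a single read call that hides which tier holds the data. A minimal sketch, assuming hypothetical dictionaries stand in for the tier stores and for the metadata link from object ID to tier; `TieredReader` is an illustrative name only.

```python
class TieredReader:
    """One read path for applications, whichever tier currently holds the object.

    tiers maps a tier name to a dict standing in for that tier's store;
    locations is the metadata link from object ID to tier name.
    """

    def __init__(self, tiers: dict[str, dict[str, bytes]],
                 locations: dict[str, str]) -> None:
        self.tiers = tiers
        self.locations = locations

    def read(self, object_id: str) -> bytes:
        tier = self.locations[object_id]         # consult the metadata link
        data = self.tiers[tier].get(object_id)   # fetch from whichever tier it is on
        if data is None:
            raise KeyError(f"{object_id} missing from tier {tier!r}")
        return data


if __name__ == "__main__":
    reader = TieredReader(
        {"primary": {"a": b"this quarter"}, "tertiary": {"b": b"five years ago"}},
        {"a": "primary", "b": "tertiary"},
    )
    # The caller uses the same call for current and archived data.
    print(reader.read("a"), reader.read("b"))
```

Slower tiers would return the same bytes with higher latency; the application code does not change.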

The archival storage architecture should be open and ISO-compliant, implementing
Data Lifecycle Management as a complement to mainstream storage and business
continuity practices. This allows enterprises to participate in an interoperable
environment where the right data is always available at the right time, without
the special-purpose storage management software and devices used by other
solutions.

Sudhakar Rao, technical
director, Hitachi Data Systems on Data Lifecycle Management


How to Manage?

Preservation Planning: Involves understanding business-specific issues
related to data

Produce: Involves handling all data assets produced by any industry or
activity

Ingest: Once data is produced, this function prepares it for storage and
management within the archive store

Data Management: Involves indexing the metadata so that it is made
searchable and can be retrieved whenever required

Archival Storage: Maintains and retrieves data, manages the storage
hierarchy, including movement based on changes in data value, and provides
disaster recovery capabilities

Administration: Includes configuration management of system hardware and
software, system engineering functions to monitor and improve archive
operations, and updating of archival policies