InfoWorld

Steve Pao

Data management: beyond just storing bits

Data is the new oil. As enterprises increasingly recognize the value of digital assets, infrastructure considerations have evolved well beyond simply storing data into the growing field of data management.

Traditional views on managing data typically involve block storage for database and application servers. But videos, images, sensor data, and other unstructured file data that can’t be easily stored in traditional, relational databases are bigger and much harder to manage.

Real opportunity lies in data management, and unstructured data requires a different approach. With unstructured data predicted to grow at a compound annual growth rate of 29.8 percent through 2021, according to IDC Research, data backup and archive must evolve.

Here are four higher-level functions of data management.

1. Data protection

Anyone who has lost their smartphone without backing up their data has felt the pain of data loss. For consumers, data protection is an issue of securing their privacy and digital assets of personal value, such as photos with family and friends.

For enterprises, data loss is not only inconvenient and painful, but can be devastating for business. Understandably, enterprises want to protect their valuable intellectual property and other critical data. After all, it’s what has helped them achieve their success.

With data protection in mind, organizations seek secondary storage that backs up and archives quickly – meaning no risk of missing backup windows, or no need for backup windows at all – and restores data on demand.

2. Data movement

Growing data sizes and more sophisticated computing paradigms have created a need for data movement.

For example, my iPhone integrates with a variety of cloud services, including iCloud and Dropbox. Data movement policy can determine which photos get replicated to iCloud and which are retained locally on the phone. Policy conditions, such as available capacity on the phone itself, can determine whether high-resolution or lower-resolution versions of photos are stored locally and what is stored in the cloud. Performing these functions depends on a central catalog that is aware of all the data across tiers of storage.

Similarly, enterprises in every industry – from media and entertainment to bio-IT – have large amounts of data, performance issues, and a desire to constrain local capacity growth, but at a scale much larger than that of individual consumers.
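To make the idea of a capacity-driven movement policy concrete, here is a minimal sketch in Python. It is not how iCloud or any particular product works. The `Asset` fields, the `plan_local_tier` function, and the "most recently opened first" rule are all assumptions made for illustration, and the sketch assumes a full-resolution copy of every asset already lives in the cloud tier.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    full_size_mb: float      # size of the high-resolution original
    preview_size_mb: float   # size of a lower-resolution version
    last_opened_days: int    # days since the asset was last opened


def plan_local_tier(assets, local_capacity_mb):
    """Decide which version of each asset stays on the local tier.

    Hypothetical policy: every asset keeps at least a preview locally, and
    the most recently opened assets are upgraded to full resolution while
    local capacity remains.
    """
    plan = {a.name: "preview" for a in assets}
    used = sum(a.preview_size_mb for a in assets)
    for a in sorted(assets, key=lambda a: a.last_opened_days):
        extra = a.full_size_mb - a.preview_size_mb
        if used + extra <= local_capacity_mb:
            plan[a.name] = "full"
            used += extra
    return plan, used


# Illustrative catalog entries; the names and sizes are made up.
assets = [
    Asset("beach.jpg", 12, 1, 2),
    Asset("recital.mov", 900, 20, 30),
    Asset("receipt.png", 3, 0.5, 400),
]
plan, used = plan_local_tier(assets, local_capacity_mb=100)
print(plan)  # {'beach.jpg': 'full', 'recital.mov': 'preview', 'receipt.png': 'full'}
```

The policy can only make these decisions because it sees every asset and its size in one place, which is the role the central catalog plays.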

3. Discovery

Data has become hard to find. That’s because it’s difficult to know what’s scattered across hundreds of file systems and hybrid cloud architectures.

As the vast majority of data is machine-generated, manual cataloging simply isn’t possible. Today, some applications to manage digital assets generate their own catalogs, but federating those silos requires an appropriate ecosystem.

In the consumer world, separate catalogs exist on the iPhone for Photos, Music, App Store, and other applications. However, Spotlight Search on iOS can search across all of those catalogs because the iPhone provides a search indexing ecosystem that every application can use.

As in the consumer world, separate applications for unstructured file data may exist in the enterprise in the form of document management systems, digital asset managers, laboratory information management systems, and other applications. In other cases, no application exists at all to manage file data for a workflow. Either way, no easy system exists to let users know what data is available or where it lives.

Opportunities exist to provide a common layer in which to store, catalog, and manage metadata for search and discovery of digital assets in the enterprise.
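As a rough illustration of what such a common layer might look like, the Python sketch below lets several applications register their assets in one shared index and then searches across all of them. The `CommonCatalog` class, the source names, and the storage locations are hypothetical, and a real system would need persistence, access control, and far richer metadata.

```python
from collections import defaultdict


class CommonCatalog:
    """A toy shared index that several applications register assets with."""

    def __init__(self):
        # keyword -> set of (source application, storage location)
        self._index = defaultdict(set)

    def register(self, source, location, keywords):
        """Record that `source` holds an asset at `location` with these keywords."""
        for word in keywords:
            self._index[word.lower()].add((source, location))

    def search(self, query):
        """Return sorted (source, location) pairs matching every word in the query."""
        words = query.lower().split()
        if not words:
            return []
        hits = set.intersection(*(self._index.get(w, set()) for w in words))
        return sorted(hits)


catalog = CommonCatalog()
# Hypothetical applications and locations, for illustration only.
catalog.register("dam", "s3://media/promo-2017.mov", ["promo", "video", "2017"])
catalog.register("lims", "nfs://lab/run-42.fastq", ["sequencing", "2017"])
catalog.register("docs", "smb://legal/contract.pdf", ["contract"])

for source, location in catalog.search("2017"):
    print(source, location)
```

A single query here returns results from both the digital asset manager and the laboratory system, which is the federation the separate silos cannot offer on their own.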

4. Learning

Every data management application needs a set of analytics to help users know what’s there, and application scenarios often involve analyzing data to create more useful metadata for classification.

Back in the consumer world, Photos for iOS catalogs photos based on EXIF data generated by the camera app, such as the time and GPS coordinates where the photo was taken. Beyond simple cataloging, Photos for iOS can also classify the photos to identify faces and places through machine learning. When you search your photo files for “London,” Photos parses GPS coordinates from the image files using this classification and pulls up all photos taken in London.

In the enterprise world, these same opportunities exist to “decorate” metadata in a common way across applications. Opportunity exists to move from a closed world, where decorated data often lives in proprietary applications or SQL databases off to the side, to an API-driven world, where metadata is stored in common object formats and index stores are accessible via common APIs. Making data more easily accessible to software paves the way for valuable applications, such as using machine learning for auto-classification of data.
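The sketch below shows, in simplified form, what "decorating" metadata could mean in practice: a derived tag is added to a plain metadata record, and the result is serialized as JSON so other software can read it. This is an assumption-laden illustration, not a description of Photos for iOS. A crude bounding-box lookup stands in for real classification, and the `PLACES` table, its approximate coordinates, the record layout, and the function names are all invented for the example.

```python
import json

# Rough, illustrative bounding boxes (south, west, north, east) in degrees.
PLACES = {
    "London": (51.28, -0.51, 51.69, 0.33),
    "Paris": (48.80, 2.22, 48.91, 2.47),
}


def place_for(lat, lon):
    """Return the first place whose bounding box contains the point, else None."""
    for name, (south, west, north, east) in PLACES.items():
        if south <= lat <= north and west <= lon <= east:
            return name
    return None


def decorate(record):
    """Return a copy of a metadata record with a derived 'place' tag added."""
    decorated = dict(record)
    gps = record.get("gps")
    if gps is not None:
        place = place_for(gps["lat"], gps["lon"])
        if place is not None:
            decorated["place"] = place
    return decorated


def search_by_place(records, place):
    """Return the names of records tagged with the given place."""
    return [r["name"] for r in records if r.get("place") == place]


# Made-up records in the shape an EXIF reader might produce.
photos = [
    {"name": "IMG_0001.jpg", "gps": {"lat": 51.5007, "lon": -0.1246}},
    {"name": "IMG_0002.jpg", "gps": {"lat": 48.8584, "lon": 2.2945}},
    {"name": "scan.png"},
]
decorated = [decorate(p) for p in photos]
print(search_by_place(decorated, "London"))  # ['IMG_0001.jpg']
print(json.dumps(decorated[0], sort_keys=True))
```

Because the decorated record is ordinary JSON rather than a row in a proprietary application, any other tool can consume the `place` tag without knowing how it was produced.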

While traditional views on enterprise data focus on storage and data protection, the real opportunity in today’s digital economy lies in the higher-level functions of data management.

In “3 reasons to embrace horizontal scaling for secondary storage,” I referenced a comment made by Taneja Group Founder Arun Taneja. “Data protection as a discipline has been sleeping for decades,” Arun wrote, “but data management may be the wakeup call it needs.”

When you have a common data management layer, you gain so much power. From an enterprise perspective, wouldn’t it be beneficial for different applications to rely on the same underlying data management functions?

Innovations in data management make it possible. Tying all these data management functions together and eliminating silos benefits your enterprise, your IT team, and your business users.

 

This article was written by Steve Pao from InfoWorld and was legally licensed through the NewsCred publisher network. Please direct all licensing questions to legal@newscred.com.