What is a data fabric?
The concept of a “data fabric” is emerging as an approach to help organizations better deal with fast-growing data, ever-changing application requirements and distributed processing needs.
The term references technology that creates a converged platform that supports the storage, processing, analysis and management of disparate data. Data that is currently maintained in files, database tables, data streams, objects, images, sensor data and even container-based applications can all be accessed using a number of different standard interfaces.
A data fabric makes it possible for applications and tools to access that data through many interfaces, such as NFS (Network File System), POSIX (Portable Operating System Interface), a REST (Representational State Transfer) API, HDFS (Hadoop Distributed File System), ODBC (Open Database Connectivity), and Apache Kafka for real-time streaming data. A data fabric must also be capable of being enhanced to support other standards as they grow in importance.
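To make the idea of several interfaces over one copy of the data more concrete, here is a minimal Python sketch. The class names (FabricStore, FileInterface, TableInterface) and the sample path are hypothetical and do not correspond to any supplier's API. The only point is that both views read the same stored bytes rather than separate copies.

```python
import csv
import io


class FabricStore:
    """Hypothetical store that keeps a single copy of each object's bytes."""

    def __init__(self):
        self._objects = {}

    def put(self, path, data):
        self._objects[path] = data

    def get(self, path):
        return self._objects[path]


class FileInterface:
    """File-style view: hands back raw bytes, much as a POSIX or NFS read would."""

    def __init__(self, store):
        self._store = store

    def read(self, path):
        return self._store.get(path)


class TableInterface:
    """Table-style view: parses the very same bytes as CSV rows."""

    def __init__(self, store):
        self._store = store

    def rows(self, path):
        text = self._store.get(path).decode("utf-8")
        return list(csv.DictReader(io.StringIO(text)))


store = FabricStore()
store.put("/sales/orders.csv", b"region,total\neast,100\nwest,250\n")

# Two different interfaces, one underlying object.
raw = FileInterface(store).read("/sales/orders.csv")
table = TableInterface(store).rows("/sales/orders.csv")
```

A real data fabric would expose these views over network protocols and handle security and consistency. This sketch leaves all of that out.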
There are several goals a data fabric must address, including the following:
- Combine data from established systems – bring that data together regardless of its size or future scalability requirements, and make it available to applications.
- Provide speed, scale and reliability – access to data maintained within the data fabric must meet business requirements for speed, scale and reliability across multiple computing environments without requiring trade-offs.
- Support multiple locations – allow access to data from systems residing at the edge of the network, in the enterprise data center, and even in cloud computing environments (such as Amazon Web Services, Microsoft Azure, and Google Cloud Platform).
- Create a unified data environment – the data fabric must create a global namespace that makes files easy to find and access, provide strong security, compress data to reduce overall storage requirements, take snapshots of the data for backups and application development, and support multi-tenant (multiple-company) computing environments.
- Provide high reliability and availability – the data fabric must provide a highly reliable environment that manages itself, heals itself when there is a problem, and provides the high availability that mission-critical needs demand.
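The global namespace goal can be illustrated with a short Python sketch. It assumes a simple design in which logical path prefixes are mapped to physical locations (edge, data center, cloud) and the longest matching prefix wins. The GlobalNamespace class, the mount points and the location labels are all hypothetical, chosen only for illustration.

```python
class GlobalNamespace:
    """Hypothetical mapping from logical path prefixes to physical locations."""

    def __init__(self):
        self._mounts = {}

    def mount(self, prefix, location):
        # Store prefixes without a trailing slash so matching is uniform.
        self._mounts[prefix.rstrip("/")] = location

    def resolve(self, path):
        # Keep only prefixes that match on a path boundary, so "/data"
        # covers "/data/x" but not "/database/x".
        matches = [
            p for p in self._mounts
            if path == p or path.startswith(p + "/")
        ]
        if not matches:
            raise KeyError(f"no mount covers {path}")
        best = max(matches, key=len)  # the longest prefix wins
        return self._mounts[best], path[len(best):] or "/"


ns = GlobalNamespace()
ns.mount("/data", "datacenter")
ns.mount("/data/sensors", "edge")
ns.mount("/archive", "cloud")

sensor_location = ns.resolve("/data/sensors/turbine7.log")
orders_location = ns.resolve("/data/orders.csv")
```

An application asks for a single logical path and never needs to know which environment actually holds the file.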
Why should you care?
The reason the concept of a data fabric has become important to major enterprises is that businesses are facing significant challenges today. Their IT systems are becoming more complex than ever before. They need the ability to work across complex disparate environments while supporting existing applications and also new microservice-based applications.
In the past, each application development team chose its own approach to storing and retrieving data. If we examined what is executing in the typical enterprise data center, we would find data stored in flat files, in relational (SQL) databases, in non-relational (NoSQL) databases, and even in big data repositories that have their own content storage approaches. The resulting distribution of data into separate silos is one of the major challenges facing organizations today.
Why is unification such a problem?
Unifying all of this data can be quite a problem. Applications store the same data in different formats. Data is stored in many places, in different application silos, and this means the unification process would require “de-duplicating” the duplicated data. Getting data to the right application at the right time and in the right way isn’t an easy problem to address.
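As a rough illustration of why this is hard, the Python sketch below unifies customer records from two silos that store the same person in different formats. The record layouts, field names and sample values are assumptions invented for this example. It also assumes the e-mail address is a reliable matching key, which real data often does not guarantee.

```python
def normalize_crm(record):
    # Assumed CRM layout: {"Email": ..., "FullName": ...}
    return {
        "email": record["Email"].strip().lower(),
        "name": record["FullName"].strip(),
    }


def normalize_billing(record):
    # Assumed billing layout: {"email_address": ..., "first": ..., "last": ...}
    return {
        "email": record["email_address"].strip().lower(),
        "name": f'{record["first"]} {record["last"]}'.strip(),
    }


def unify(crm_records, billing_records):
    """Merge both silos, keeping the first record seen for each e-mail."""
    unified = {}
    for rec in map(normalize_crm, crm_records):
        unified.setdefault(rec["email"], rec)
    for rec in map(normalize_billing, billing_records):
        unified.setdefault(rec["email"], rec)
    return list(unified.values())


crm = [{"Email": "Ana@Example.com ", "FullName": "Ana Diaz"}]
billing = [
    {"email_address": "ana@example.com", "first": "Ana", "last": "Diaz"},
    {"email_address": "bo@example.com", "first": "Bo", "last": "Lee"},
]

customers = unify(crm, billing)
```

Even in this toy case, each silo needs its own normalization step before duplicates can be detected, and every additional silo adds another format to reconcile.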
Another challenge is that work is increasingly done at the edge of the network rather than in the enterprise data center. Customers and staff now have applications that access data from their smartphones, machines, and a range of new sources driven by the Internet of Things (IoT). This means that organizations need to efficiently process data generated at the edge and to share and learn from this data and push the intelligence back out to these edge devices.
Enterprises are learning that making a minor change, such as updating an application or development tool to comply with new regulations, meet changing business requirements, or adopt new technology, can create incompatibilities that are felt throughout the company. Incompatibilities quickly translate into problems, and problems cost time and money.
Moving data into a data fabric can address the need for an agile, global data environment that optimizes cost and performance, meets new government regulations, and minimizes future problems as new technology, such as servers based on newer processor architectures from suppliers such as ARM or NVIDIA, becomes more prevalent.
How do you learn more?
If your company is currently feeling the effects of these challenges, or you’d just like to get a leg up on your competition, a number of suppliers are offering tools that address some or all of these requirements. MapR Technologies already offers a converged data platform that addresses all of the requirements I mentioned earlier in this article. NetApp also uses the term “data fabric,” but is largely focused on a lower level of solutions: backup/DR, syncing data with cloud storage, and providing fast connectors to Hadoop clusters and MongoDB data stores. Talend uses the term “data fabric” as well, but in this supplier’s case the focus is on generating optimized native code (Java/Spark/SQL) designed to access cloud-based storage.
The vision of a data fabric offers many opportunities to help enterprises address the business requirements to unify data and to both simplify and accelerate today’s complex computing solutions. I would recommend looking at suppliers that demonstrate the broadest view of what this vision can mean and that already have a track record of success implementing it.