Current data architecture is going through a revolution. Enterprises are starting to shift away from the monolithic data lake towards something less centralized: data mesh. It's a relatively new concept, first coined in 2019, that addresses potential issues with data warehouses and data lakes that can cause businesses to be slow, unresponsive, or even suffer from data silos. What is a data mesh, and how could it benefit your business?
Sign up for our free 14-day hosted trial to learn how.
Data mesh goes beyond using a single destination for data storage and analysis to a distributed system that links all the data sources a company might need to access. The core concept is that the way we think about data distribution is reversed: Business domains no longer push information through data pipelines into a single destination like a central data lake. Instead, they host and serve datasets in a universally consumable way. Data is available quickly to any data product with access, whether that's a business's CRM or a suite of business intelligence tools.
Domain-driven decentralization, or DDD, is a key term associated with data mesh. Under DDD, data ownership is central: the domain that produces the data is the best party to handle it. That domain continues to own the data, providing access to the rest of the organization when required.
Zhamak Dehghani, previously of Thoughtworks, is widely credited with founding and promoting the concept of data mesh. She has written the book on the subject, titled Data Mesh: Delivering Data-Driven Value at Scale, published in early 2022. This timeframe highlights how new the concept still is, but also how much hunger exists for better data management techniques and how quickly data mesh has been adopted by data scientists.
Zhamak recently spoke at Big Data LDN: Mission to the Dataverse, on a panel titled “Data Mesh – What You Need To Know.” A highlight of the panel was Zhamak pointing out that data mesh, at its inception, challenged assumptions surrounding data architecture and called out pain points that had always existed without anyone ever questioning why.
Data mesh is a movement towards decentralized architecture, which may be anathema to data engineering specialists who have always worked with centralized data pools such as the data warehouse or data lake. Data mesh promotes a shift away from monolithic data structures with the potential to increase data accessibility and organization.
APIs are a critical component of the mesh ecosystem because a data mesh is, essentially, a network of data products. “Product thinking” refers to the mindset here, which views every source of data, such as a third-party app or SaaS platform, as a separate data product within the mesh. Connecting to those data products is often easiest by generating APIs: application programming interfaces that provide communication between disparate entities.
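To make "product thinking" concrete, here is a minimal sketch of a domain dataset wrapped as a data product with a uniform read interface. All names (`DataProduct`, `read`, the `crm-contacts` dataset) are illustrative assumptions, not part of any real mesh framework:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class DataProduct:
    """Hypothetical data product: one domain-owned dataset exposed
    through a uniform, consumable interface."""
    name: str
    owner: str  # the domain team that owns and serves this data
    _records: list[dict[str, Any]] = field(default_factory=list)

    def read(self, **filters: Any) -> list[dict[str, Any]]:
        """The product's 'API': consumers read data without knowing
        how the owning domain stores it internally."""
        return [r for r in self._records
                if all(r.get(k) == v for k, v in filters.items())]

# A sales-domain product serving CRM contacts to the rest of the mesh
crm = DataProduct(name="crm-contacts", owner="sales",
                  _records=[{"id": 1, "region": "EMEA"},
                            {"id": 2, "region": "APAC"}])
print(crm.read(region="EMEA"))  # any consumer filters via the same interface
```

In a real deployment the `read` method would sit behind an HTTP API rather than an in-process call, but the ownership boundary is the same: the domain serves its data; consumers never reach into its storage.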
The basic construction of a data mesh starts with APIs that connect to enterprise data sources that a business needs to access. APIs might even be used to connect to specific datasets within the data product. Every data product carries a catalog of its own data. When changes are made or data is accessed or transferred, every event log is captured, aiding in data governance and security. Change Data Capture (CDC) technology is often used for logging those changes and distributing data accordingly.
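The catalog-plus-event-log idea above can be sketched in a few lines. This is an illustrative toy, not a real CDC implementation; the class and method names are assumptions:

```python
import datetime

class DataProductLog:
    """Hypothetical sketch: a data product keeps a catalog of its
    datasets and an append-only event log of registrations and
    changes, in the spirit of Change Data Capture (CDC)."""
    def __init__(self) -> None:
        self.catalog: dict[str, dict] = {}  # dataset name -> schema metadata
        self.events: list[dict] = []        # append-only change/access log

    def register(self, dataset: str, schema: dict) -> None:
        self.catalog[dataset] = schema
        self._log("register", dataset)

    def write(self, dataset: str, row: dict) -> None:
        self._log("change", dataset, row=row)  # a CDC-style change event

    def _log(self, kind: str, dataset: str, **detail) -> None:
        self.events.append({"kind": kind, "dataset": dataset,
                            "at": datetime.datetime.utcnow().isoformat(),
                            **detail})

log = DataProductLog()
log.register("orders", {"id": "int", "total": "float"})
log.write("orders", {"id": 1, "total": 9.99})
print([e["kind"] for e in log.events])  # every event is captured, in order
```

Production CDC tools (Debezium, for example) capture these change events directly from database transaction logs rather than at the application layer, but the governance benefit is the same: every change leaves an auditable trace.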
Data products within a mesh can subscribe to each other’s data, so any relevant change in one product is communicated to the others in near real time. An event streaming backbone distributes API calls and events and manages the topics to which different products subscribe. Data products publish events to these topics, and the backbone uses subscription information to deliver each event to the subscribed products. This constant back-and-forth pattern increases the speed at which businesses gain insights and decision-critical information.
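The publish/subscribe flow just described can be sketched as a minimal in-memory backbone. Real meshes typically use a streaming platform such as Apache Kafka for this; the `EventBackbone` class and topic name below are illustrative assumptions:

```python
from collections import defaultdict
from typing import Callable

class EventBackbone:
    """Minimal sketch of an event streaming backbone: products publish
    events to named topics, and the backbone fans each event out to
    every handler subscribed to that topic."""
    def __init__(self) -> None:
        self._subs: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subs[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self._subs[topic]:
            handler(event)  # deliver to every subscribed product

# A downstream product subscribes to CRM change events...
received: list[dict] = []
bus = EventBackbone()
bus.subscribe("crm.contact-updated", received.append)

# ...and sees the change as soon as the CRM product publishes it
bus.publish("crm.contact-updated", {"id": 42, "field": "email"})
print(received)
```

A production backbone adds durability, ordering guarantees, and consumer offsets, but the topology is the same: publishers and subscribers never talk to each other directly, only to topics.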
Towards Data Science gives the example of a data analytics tool within the mesh. The analytics program might subscribe to two separate data products to ensure it has the most current information to train a machine learning (ML) model. Any modifications that occur in those data products prompt a notification to go to the analytics tool, describing exactly what has changed. This prompts the analytics data product to update its own data repository, which changes the information it’s using to train the ML model. These real-time updates in data lead to better, faster learning for the ML and better outcomes for the business.
Of course, there’s no point in switching to a whole new data architecture unless there are measurable benefits. We’ve touched on how data mesh can help businesses gain insights faster through an increased interoperability of its systems and even break down data silos and prevent bottlenecks. But what other benefits are there to this paradigm shift in how we manage data?
Because every change or modification to any aspect of data is logged in a mesh, it increases the observability of systems. Observability is the ability to understand the health of a system from its outputs. In other words, you can tell what’s happening inside a system from the outside through easily accessible data. Data mesh architecture, by its very nature, provides the maximum amount of data about the systems it’s connected to, allowing data teams to achieve faster troubleshooting and reporting on any issues.
Another advantage of this type of data infrastructure is that it can provide higher levels of security. API security measures such as SSL/TLS and authentication requirements keep connections between data products safe, while the constant logging of all access and changes provides a digital paper trail that’s instantly accessible whenever a security concern arises. Data mesh architecture makes life easier for cybersecurity professionals, which in turn frees them up to address other business concerns such as phishing or even ransomware threats. This security timesaver can save businesses thousands of dollars and promote more efficient running of organizations.
Every company has its own policies around data governance: how data is stored, transferred, accessed, and even deleted. Data governance has become more complicated in recent years with the increased use of third-party SaaS and PaaS. Data mesh helps simplify data governance somewhat through “mesh-wide” principles that can be applied to all data products within the mesh. Companies can decide to set requirements and policies that are then managed by a data platform that connects to all the domains a business works across. This ensures everyone working for the company adheres to the rules, but also that data from any third-party product is ingested in line with company policy.
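One way to picture mesh-wide governance is as a set of policy checks that every record must pass before a data product may ingest it. The policies below (an owner must be named; personally identifiable data must be encrypted) are invented examples, not rules from any particular standard:

```python
from typing import Callable

# Hypothetical mesh-wide governance: each policy is a predicate
# applied to every record before ingestion, regardless of which
# domain or third-party product the record came from.
Policy = Callable[[dict], bool]

POLICIES: list[Policy] = [
    lambda rec: "owner" in rec,  # every record must name an owning domain
    # PII is only acceptable if the record is marked encrypted
    lambda rec: not rec.get("contains_pii", False) or rec.get("encrypted", False),
]

def compliant(record: dict) -> bool:
    """Return True only if the record satisfies every mesh-wide policy."""
    return all(policy(record) for policy in POLICIES)

print(compliant({"owner": "sales", "contains_pii": True, "encrypted": True}))  # True
print(compliant({"contains_pii": True}))  # False: no owner, unencrypted PII
```

Centralizing the policy list while decentralizing the data is the point: domains keep ownership, but the platform enforces one shared rulebook.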
IBM notes several business use cases and muses that an increased adoption of data mesh architecture can lead to improved customer satisfaction through personalized experiences. Data mesh can lead to better automation of products, such as virtual assistants, and increased opportunities for self-service, as well as bespoke business intelligence dashboards that contain all the data relevant to an enterprise and none that isn’t. There's also the potential for improving AI and ML projects via faster, slicker data capture.
Blockchain is an example of mesh-like architecture success in a basic form. Every block in the chain comes from a different source, and the data is highly secure, heavily encrypted, and always distributed. Modern data mesh architecture builds on this, eliminating the slow performance associated with blockchain through a data-sharing model created with day-to-day business in mind, as opposed to single transactions that can take 15 minutes or more, as they often do on blockchain. It also offers scalability beyond what’s currently achievable, as there’s potentially no limit to the number of data products that can exist on a mesh.
Of course, mesh architecture is in its early stages and might not be appropriate for every business at this stage. Smaller enterprises may find few benefits to adopting a data mesh architecture, particularly if much of their IT and DevOps is in-house and links to few external sources. Businesses still relying on monolithic legacy systems may find the cost too prohibitive to move away from these entirely. Other businesses may simply not have the change-management structure in place to engage key personnel to make the required digital transformation happen. For many enterprises, though, exploring the business benefits of data mesh can mean saying goodbye to time-consuming dataset organization and obstructive data silos, and welcoming a new era of real-time, accessible data flow throughout the entire organization.
DreamFactory is a low-code solution for creating and managing APIs, the foundational components of any mesh framework. Find out more and start your free 14-day trial.