
Understanding Principles of Data Mesh Architecture — part 1



Overview

Challenges with Traditional Data Lakes

Traditional data lakes serve as central repositories where data from many sources is stored in its raw, native format. Typically managed by a single team, such as Data Engineering or IT Infrastructure, these lakes can hold vast amounts of structured, semi-structured, and unstructured data.

However, in large organisations with many domains, this centralised management can become a bottleneck. The central team often lacks an understanding of each domain's unique needs and data requirements, so it struggles to handle domain-specific requests effectively. The result is delays, reduced data quality, and slow responses to the needs of business units.

How Data Mesh Addresses the Problems of Data Lakes

A data mesh offers a decentralised approach to data management, aligning data ownership with domain-specific teams within an organisation. This method treats data as a product, managed by the teams that generate and use it.

By decentralising data ownership, a data mesh allows for better scalability and governance, as each domain team is responsible for the quality and relevance of their data. This approach also fosters a culture of accountability and collaboration, ensuring that data is managed and utilised effectively across the organisation.

Guiding Principles of Data Mesh

Zhamak Dehghani outlines four guiding principles of data mesh.

  1. Domain-Driven Data Ownership (Soul)
  2. Data as a Product (Heart)
  3. Self-Serve Data Platform (Body)
  4. Federated Data Governance (Mind)

Data Mesh Guiding Principles — by Monte Carlo

Domain-Driven Data Ownership (Soul)

The foundational principle is that individual business domains should own their data.

This principle holds that the team responsible for a specific business domain (such as marketing or finance) should also be responsible for the data generated within that domain. It aligns accountability with expertise: the people who understand the context and nuances of the data are the ones who manage it. Because domain experts are directly involved, this promotes higher-quality, more relevant data.

Just as the soul is the essence of a human being, domain-driven data ownership is the essence of Data Mesh. It ensures that the data’s integrity and relevance are maintained by those who understand it best.
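To make the ownership idea concrete, here is a minimal Python sketch, assuming a hypothetical in-house registry in which each domain declares the datasets it owns. The `Domain` class, the team names, and the dataset names are illustrative and not part of any data mesh tool. The registry rejects a dataset claimed by two domains and routes each request to the owning domain team rather than to a central team.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """Hypothetical registry entry: a business domain and the datasets it owns."""
    name: str
    owning_team: str
    datasets: set = field(default_factory=set)


def build_ownership_index(domains):
    """Map every dataset to exactly one owning domain; reject shared ownership."""
    index = {}
    for domain in domains:
        for dataset in domain.datasets:
            if dataset in index:
                raise ValueError(
                    f"{dataset!r} is claimed by both "
                    f"{index[dataset].name!r} and {domain.name!r}"
                )
            index[dataset] = domain
    return index


def route_request(dataset, index):
    """Return the owning team, so requests go to the domain, not a central team."""
    if dataset not in index:
        raise KeyError(f"no domain owns {dataset!r}")
    return index[dataset].owning_team


domains = [
    Domain("marketing", "marketing-data-team", {"campaign_clicks", "email_opens"}),
    Domain("finance", "finance-data-team", {"invoices", "payments"}),
]
index = build_ownership_index(domains)
print(route_request("invoices", index))  # finance-data-team
```

Rejecting double claims is a design choice that keeps accountability unambiguous: every dataset has exactly one team to ask.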

Data as a Product (Heart)

A data product is a well-defined, self-contained unit of data that solves a business problem.

Treating data as a product means viewing it not just as raw material but as something designed to meet specific needs and deliver value. A data product can be simple (a table or report) or complex (an entire machine learning model). Data products put user needs first: they are designed for usability and discoverability, and they are prioritised and packaged according to how they will be delivered to and used by consumers.

The heart pumps life into the body, just as data products drive value and insights throughout the organisation. By treating data as a product, we ensure it is designed and delivered to meet specific needs, much like how the heart ensures blood reaches every part of the body.
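One way to picture a data product is as data bundled with the metadata that makes it usable and discoverable. The sketch below is a minimal Python illustration under that assumption; the `DataProduct` fields and the keyword-search `Catalog` are hypothetical and do not follow any standard data product specification.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor: the metadata that turns a dataset into a product."""
    name: str
    owner_domain: str
    description: str
    schema: dict  # column name -> type name, so consumers know how to use it
    tags: set = field(default_factory=set)


class Catalog:
    """Minimal in-memory catalog so consumers can discover products by keyword."""

    def __init__(self):
        self._products = {}

    def register(self, product):
        self._products[product.name] = product

    def search(self, keyword):
        # Case-insensitive match against the description or an exact tag.
        keyword = keyword.lower()
        return sorted(
            p.name
            for p in self._products.values()
            if keyword in p.description.lower()
            or keyword in {t.lower() for t in p.tags}
        )


catalog = Catalog()
catalog.register(DataProduct(
    name="monthly_revenue",
    owner_domain="finance",
    description="Revenue aggregated per month and region",
    schema={"month": "date", "region": "str", "revenue": "decimal"},
    tags={"revenue", "reporting"},
))
catalog.register(DataProduct(
    name="campaign_performance",
    owner_domain="marketing",
    description="Click-through and conversion rates per campaign",
    schema={"campaign_id": "str", "ctr": "float", "conversions": "int"},
    tags={"campaigns"},
))
print(catalog.search("revenue"))  # ['monthly_revenue']
```

The description and tags are what make the product discoverable, while the schema and owner tell a consumer how to use it and whom to ask about it.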

Self-Serve Data Platform (Body)

Teams use a self-serve platform to build their own data products rather than relying on central teams to build them.

This principle advocates decentralising the creation and management of data products by giving teams the tools and platforms to do this work themselves. The self-serve platform provides the necessary infrastructure components, such as storage, processing power, and tools for cleaning, testing, and deploying datasets and models, so teams can manage their workflows independently without waiting on a centralised IT department.

Just as the body enables movement and action, a self-serve data platform empowers teams to create and manage their data products independently. It provides the necessary infrastructure and tools, ensuring agility and efficiency in data handling.
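The platform idea can be sketched as a shared service that exposes standard cleaning, testing, and deployment steps which any domain team can call on its own. The `SelfServePlatform` class below is hypothetical, an in-memory dictionary stands in for real storage, and the quality rules (drop incomplete rows, require a non-empty dataset with a unique key) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    owner_domain: str
    rows: list  # each row is a dict mapping column name -> value


class SelfServePlatform:
    """Hypothetical shared platform offering standard clean/test/deploy steps.

    Domain teams call it directly instead of asking a central team to do the work.
    """

    def __init__(self):
        self.storage = {}  # in-memory stand-in for object storage or a warehouse

    def clean(self, dataset, required_columns):
        # Standard cleaning step: drop rows with a missing required value.
        kept = [
            row for row in dataset.rows
            if all(row.get(col) is not None for col in required_columns)
        ]
        return Dataset(dataset.name, dataset.owner_domain, kept)

    def test(self, dataset, key):
        # Standard quality gate: non-empty, and the key column is unique.
        # Assumes `key` is one of the required columns, so every cleaned row has it.
        problems = []
        if not dataset.rows:
            problems.append("dataset is empty")
        keys = [row[key] for row in dataset.rows]
        if len(keys) != len(set(keys)):
            problems.append(f"duplicate values in key column {key!r}")
        return problems

    def deploy(self, dataset, required_columns, key):
        # One self-service call: clean, test, then store under a domain-scoped path.
        cleaned = self.clean(dataset, required_columns)
        problems = self.test(cleaned, key)
        if problems:
            raise ValueError(f"{dataset.name}: {problems}")
        path = f"{cleaned.owner_domain}/{cleaned.name}"
        self.storage[path] = cleaned.rows
        return path


platform = SelfServePlatform()
raw = Dataset("invoices", "finance", [
    {"invoice_id": "A1", "amount": 120.0},
    {"invoice_id": "A2", "amount": None},  # dropped by the cleaning step
])
path = platform.deploy(raw, required_columns=["invoice_id", "amount"], key="invoice_id")
print(path, len(platform.storage[path]))  # finance/invoices 1
```

Because every domain calls the same `deploy` method, quality checks stay consistent across teams while no central team sits in the request path.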

Federated Data Governance (Mind)

Governance is decentralised with policies embedded in the mesh to ensure compliance across the organisation.

Federated governance involves distributing governance responsibilities across various domains while maintaining overall coherence through embedded policies within the mesh framework itself. This approach balances autonomy with compliance by ensuring each team adheres to overarching standards while allowing flexibility in managing their specific datasets according to local requirements.

The mind governs and coordinates the body’s actions, ensuring harmony and well-being. Similarly, federated data governance ensures that while each domain has autonomy, there is a coherent and compliant approach to data management across the organisation.
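Policies embedded in the mesh are often expressed as policies as code. The minimal Python sketch below assumes that approach: global policies apply to every data product, and each domain may add its own local policies. The policy functions, the `pii` label, and the 2555-day retention rule are hypothetical examples, not requirements taken from any regulation or tool.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DataProduct:
    name: str
    owner_domain: str
    columns: dict  # column name -> set of classification labels, e.g. {"pii"}
    masked_columns: set = field(default_factory=set)
    retention_days: int = 365


# A policy inspects a product and returns a list of violation messages.
Policy = Callable[[DataProduct], List[str]]


def has_owner(product):
    # Global rule: every product must belong to a domain.
    return [] if product.owner_domain else ["product has no owning domain"]


def pii_is_masked(product):
    # Global rule: any column labelled "pii" must be masked before publication.
    return [
        f"PII column {col!r} is not masked"
        for col, labels in product.columns.items()
        if "pii" in labels and col not in product.masked_columns
    ]


def long_retention(product):
    # Hypothetical local rule that one domain chooses for its own data.
    if product.retention_days >= 2555:
        return []
    return ["retention must be at least 2555 days"]


GLOBAL_POLICIES: List[Policy] = [has_owner, pii_is_masked]
LOCAL_POLICIES: Dict[str, List[Policy]] = {"finance": [long_retention]}


def check_compliance(product, global_policies=GLOBAL_POLICIES, local_policies=LOCAL_POLICIES):
    # Every product gets the global rules plus whatever its own domain adds.
    policies = list(global_policies) + local_policies.get(product.owner_domain, [])
    return [msg for policy in policies for msg in policy(product)]


invoices = DataProduct(
    "invoices", "finance",
    columns={"invoice_id": set(), "customer_email": {"pii"}},
    retention_days=365,
)
clicks = DataProduct(
    "campaign_clicks", "marketing",
    columns={"click_id": set(), "email": {"pii"}},
    masked_columns={"email"},
    retention_days=90,
)
print(check_compliance(invoices))
print(check_compliance(clicks))  # []
```

Keeping global rules in one shared list and local rules keyed by domain mirrors the balance described above: common standards everywhere, with flexibility where a domain needs it.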

These principles are explained in detail in the next article:

https://aradsouza.medium.com/understanding-principles-of-data-mesh-architecture-part-2-7817c86fcc40


This article was originally published at https://medium.com/@aradsouza/understanding-principles-of-data-mesh-architecture-part-1-1738a20e6b2c