Skip to main content
All articles
data-engineering
7 min read

dbt Model Contracts

dbt Model Contracts is a way to statically define the shape of your data and is defined in a YAML file, specifying the names, data types, and any additional con...

data-engineeringdbtanalytics

What is a Model Contract?

dbt Model Contracts is a way to statically define the shape of your data and is defined in a YAML file, specifying the names, data types, and any additional constraints for each column and it is enforced while the model is being built. The contract is the set of guarantees about the shape of the returned dataset. This ensures that downstream consumers of your models have a consistent and stable set of columns to use in their analyses.

Once the contract configuration is enforced, dbt cross-checks the returned dataset from your model against the attributes you have defined in your YAML file. If it encounters a discrepancy — in data types, column names, or number of columns — dbt will promptly raise an error before materializing the model as a table in the database.

Benefits of dbt Model Contracts

dbt model contracts provide a range of benefits that improve the efficiency, effectiveness, and quality of data engineering processes. These benefits include:

  • Proactive data governance: Implementing dbt model contracts encourages data teams to take a proactive approach to data governance. By defining contracts, teams can establish a clear understanding of data lineage, dependencies, and quality expectations. This helps ensure compliance with data regulations and promotes a culture of data stewardship within the organization.
  • Improved data reliability: dbt model contracts enforce strict expectations about the output structure and data types of data models, ensuring that the output data is reliable and adheres to predefined standards. This increased reliability leads to more accurate insights and better decision-making, as data consumers can trust the data generated by the models.
  • Easier debugging and error resolution: dbt model contracts simplify the debugging process by enforcing constraints during the build process. If a build fails due to a constraint violation, data engineers can quickly identify the issue and address it before it impacts downstream processes. This saves time and resources, reducing the potential impact of errors on the organization’s>Reduced costs and improved resource optimization: dbt model contracts can help reduce compute costs by replacing tests with equivalent constraints, which guarantee validation at build time. Constraints are typically more efficient than tests because they are enforced by the database and do not require additional compute resources. By optimizing resource usage, organizations can minimize costs while ensuring the quality and consistency of their data models.
  • Increased flexibility and adaptability: DBT model contracts provide a flexible mechanism for defining the shape of the returned dataset, allowing data teams to tailor the output to their specific needs. This adaptability enables organizations to respond to evolving business requirements and make changes to their data models without introducing inconsistencies or errors.
  • Streamlined collaboration and knowledge sharing: With dbt model contracts, teams can establish a clear understanding of each model’s structure, purpose, and constraints. This shared knowledge fosters better collaboration and knowledge sharing among team members, making it easier to onboard new members, address issues collectively, and optimize the overall data pipeline.
  • Enhanced change management and versioning: dbt model contracts provide a structured way to manage changes in the output of data models. By defining contracts, data teams can easily identify and manage breaking changes, enabling more effective version control and change management processes. This helps maintain the stability and integrity of data pipelines, reducing the risk of unexpected issues arising from changes in the data models.
  • Documentation and transparency: Model contracts serve as a form of documentation that outlines the expectations and structure of each model. This documentation can be invaluable for maintaining transparency within the organization and ensuring that stakeholders understand the purpose and limitations of each data model.

Model contracts are supported for SQL models, materialized as tables, views, and incremental (with on_schema_change: append_new_columns ).

dbt version should be >= 1.5 to use model contracts

Example :

Below is the model definition for customers and modified to add contract definition.

The raw file is sourced from

https://github.com/dbt-labs/jaffle_shop/blob/main/models/schema.yml

models:
 - name: customers
 description: This table has basic information about a customer, as well as some derived facts based on a customer's orders
 config:
 materialized: table
 contract:
 enforced: true

 columns:
 - name: customer_id
 description: This is a unique identifier for a customer
 data_type: int
 constraints:
 - type: not_null
 tests:
 - unique
 - not_null

 - name: first_name
 description: Customer's first name. PII.
 data_type: string

 - name: last_name
 description: Customer's last name. PII.
 data_type: string

 - name: first_order
 description: Date (UTC) of a customer's first order
 data_type: date

 - name: most_recent_order
 description: Date (UTC) of a customer's most recent order
 data_type: date

 - name: number_of_orders
 description: Count of the number of orders a customer has placed
 data_type: bigint

 - name: customer_lifetime_value
 description: Total value (AUD) of a customer's orders
 data_type: double

Let’s say your model is defined as:

https://github.com/dbt-labs/jaffle\_shop/blob/main/models/customers.sql models/dim_customers.sql

with customers as (
 select * from {{ ref('stg_customers') }}
),
orders as (
 select * from {{ ref('stg_orders') }}
),
payments as (
 select * from {{ ref('stg_payments') }}
),
customer_orders as (
 select
 customer_id,
 min(order_date) as first_order,
 max(order_date) as most_recent_order,
 count(order_id) as number_of_orders
 from orders
 group by customer_id
),
customer_payments as (
 select
 orders.customer_id,
 sum(amount) as total_amount
 from payments
 left join orders on
 payments.order_id = orders.order_id
 group by orders.customer_id
),
final as (
 select
 customers.customer_id,
 customers.first_name,
 customers.last_name,
 customer_orders.first_order,
 customer_orders.most_recent_order,
 customer_orders.number_of_orders,
 customer_payments.total_amount as customer_lifetime_value
 from customers
 left join customer_orders
 on customers.customer_id = customer_orders.customer_id
 left join customer_payments
 on customers.customer_id = customer_payments.customer_id
)
select * from final

In this case, as the column types in your SQL model don’t align with those specified in the YAML contract (line#12), DBT will raise an error. This gives you a chance to resolve the issue before it potentially affects downstream analyses.

When you dbt run -s customers your model, you will recieve below error as datatype is enforced as INT, but the end results has the definition of BIGINT which comes from referenced models.

00:35:12 Finished running 1 table model in 0 hours 0 minutes and 3.35 seconds (3.35s).
00:35:12 
00:35:12 Completed with 1 error and 0 warnings:
00:35:12 
00:35:12 Compilation Error in model customers (models/customers.sql)
00:35:12 This model has an enforced contract that failed.
00:35:12 Please ensure the name, data_type, and number of columns in your contract match the columns in your model's definition.
00:35:12 
00:35:12 | column_name | definition_type | contract_type | mismatch_reason |
00:35:12 | ----------- | --------------- | ------------- | ------------------ |
00:35:12 | customer_id | bigint | int | data type mismatch |
00:35:12 
00:35:12 
00:35:12 > in macro assert_columns_equivalent (macros/materializations/models/table/columns_spec_ddl.sql)
00:35:12 > called by macro default__get_assert_columns_equivalent (macros/materializations/models/table/columns_spec_ddl.sql)
00:35:12 > called by macro get_assert_columns_equivalent (macros/materializations/models/table/columns_spec_ddl.sql)
00:35:12 > called by macro databricks__create_table_as (macros/adapters.sql)
00:35:12 > called by macro create_table_as (macros/materializations/models/table/create_table_as.sql)
00:35:12 > called by macro statement (macros/etc/statement.sql)
00:35:12 > called by macro materialization_table_databricks (macros/materializations/table.sql)
00:35:12 > called by model customers (models/customers.sql)
00:35:12 
00:35:12 Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

How does this compare to dbt test?

Contracts enforces data shape. Test enforces data quality.

The main difference between dbt tests and dbt model contracts is that tests are used to validate the content of a dbt model after it has been built, while contracts are used to define the shape of the returned dataset before the model is built.

Tests are more flexible than contracts and can be used to check a wide variety of things, such as the presence of certain values in a column, the relationship between different columns, or the freshness of the data. Contracts, on the other hand, are limited to validating the column names and data types of a model.

Limitations with dbt Model Contracts

The below limitation exists as of 10/Oct/2023

No Partial Contracts : dbt contracts currently require all columns in a model to be defined. Partial contracts, which would allow only a subset of columns to be defined, are not yet supported but may be introduced in the future.

Model Types: Model contracts can only be used with table or incremental models, not view or ephemeral models.


This article was originally published at https://medium.com/@aradsouza/dbt-model-contracts-e0a6a6e58c0e