Overview
Data is the lifeblood of good decision-making, but it’s like food: if it’s rotten, it can make you sick. That’s where data quality comes in. It’s all about making sure our data is accurate, consistent, and reliable.
As data grows, it becomes increasingly difficult to keep up with new data models and sources, leading to issues like:
- Metrics calculated incorrectly because of a misunderstanding of the business or a simple calculation error
- Ambiguous column/table naming. We assume a field means something it doesn’t. Attributes are not easy to discover, causing column duplication
- Documentation on the models that is not up to date, leading to inconsistent understanding and confusion
Ensuring data quality is crucial because inaccuracies in the data ecosystem can have far-reaching implications. Incorrect data not only erodes trust in the organizational data but also undermines efforts to foster a…
But not all hope is lost. We can still prevent these problems before they arise, and that’s where dbt-checkpoint comes to our rescue.
dbt-checkpoint provides pre-commit hooks to ensure the quality of your dbt projects.
dbt is awesome, but when the number of models, sources, and macros in a project grows, it becomes challenging to maintain the same level of quality across developers. Users forget to update columns in property (yml) files or to add table and column descriptions. Without automation, the reviewer workload increases and unintentional errors may be missed. dbt-checkpoint allows organizations to add automated validations, improving the code review and release process.
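To make the idea of an automated validation concrete, here is a minimal conceptual sketch of the kind of check that is tedious to do by hand: finding models and columns with no description. This is not dbt-checkpoint's implementation; the function name `find_missing_descriptions` and the sample `properties` data are hypothetical, and the property file is written as a plain dict rather than parsed from yml.

```python
# Conceptual sketch only: this is NOT dbt-checkpoint's code.
# `find_missing_descriptions` and the sample data are hypothetical.

def find_missing_descriptions(properties):
    """Return the names of models and columns that lack a description."""
    missing = []
    for model in properties.get("models", []):
        if not model.get("description"):
            missing.append(model["name"])
        for column in model.get("columns", []):
            if not column.get("description"):
                missing.append(f"{model['name']}.{column['name']}")
    return missing


# Stand-in for a parsed property (yml) file.
properties = {
    "models": [
        {
            "name": "orders",
            "description": "One row per order.",
            "columns": [
                {"name": "order_id", "description": "Primary key."},
                {"name": "status"},  # description forgotten
            ],
        },
        {"name": "customers", "columns": []},  # model description forgotten
    ]
}

print(find_missing_descriptions(properties))  # ['orders.status', 'customers']
```

A hook built on this idea would fail the commit whenever the returned list is non-empty, so the gap is caught before a reviewer ever sees it.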
Deep dive into dbt-checkpoint
Install
For detailed installation and usage instructions, see the pre-commit.com site.
pip install pre-commit
Setup
- Create a file named .pre-commit-config.yaml in your project root folder.
- Add the list of hooks you want to run before every commit. E.g.:
repos:
- repo: https://github.com/dbt-checkpoint/dbt-checkpoint
  rev: v1.2.0
  hooks:
  - id: check-script-semicolon
  - id: check-script-has-no-table-name
  - id: dbt-test
  - id: dbt-docs-generate
  - id: check-model-has-all-columns
    name: Check columns - core
    files: ^models/core
- Optionally, run pre-commit install to set up the git hook scripts. With this, pre-commit will run automatically on git commit! You can also manually run pre-commit run after you stage all the files you want to check, or pre-commit run --all-files to run the hooks against all of the files (not only staged ones).
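The files: ^models/core entry in the configuration above limits the check-model-has-all-columns hook to a subset of paths. pre-commit treats this value as a Python regular expression and searches each candidate path for it. The sketch below shows which paths such a pattern would select; the file names are made up for illustration.

```python
import re

# Same pattern as the `files` key in the example configuration.
pattern = re.compile(r"^models/core")

# Hypothetical staged paths, for illustration only.
staged = [
    "models/core/orders.sql",
    "models/core/schema.yml",
    "models/staging/stg_orders.sql",
    "macros/models/core/helper.sql",
]

# Keep only the paths in which the pattern is found.
selected = [path for path in staged if pattern.search(path)]
print(selected)  # ['models/core/orders.sql', 'models/core/schema.yml']
```

Because the pattern is anchored only at the start, a path such as models/core_v2/orders.sql would also match. Writing ^models/core/ with a trailing slash is one way to restrict the hook to that exact folder.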
Housekeeping for pre-commit
pre-commit autoupdate
This command updates the rev of each configured hook repository to its latest tag. It is also useful if you do not know which version is appropriate for the hooks you wish to use. If in doubt, set rev: 0.0.1 and run pre-commit autoupdate.
This article was originally published at https://medium.com/@aradsouza/dbt-checkpoint-improve-data-quality-9e07fd4f0fa4