Create Data Quality Rules in Viya 4


There are many definitions of data quality. Generally, data is considered high quality if it is “fit for purpose” or “fit for its intended uses”, supporting your processes, including analytics, where high data quality is needed to achieve the best possible results.

We measure data quality based on different characteristics such as accuracy, completeness, and validity. These characteristics are called data quality dimensions.

In this blog I’m going to show how we can use SAS Intelligent Decisioning to build data quality monitoring rules.

Create data quality monitoring rules

In Intelligent Decisioning we build the monitoring rules in three steps:

Data quality business rules
Data quality field rules
Monitoring Tasks

Data quality business rules

The business rules describe the data quality checks for the different quality dimensions. They are implemented as rule sets in Intelligent Decisioning. Where possible, the rules are written to be reusable, so that one rule can be applied to different fields. For example, a rule named “Completeness_String” checks whether a character field is null or has zero length. This rule can then be used for the Completeness dimension of many character fields.
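As a rough illustration of a reusable rule, the Python sketch below mimics what “Completeness_String” checks. It is not Intelligent Decisioning code (there the rule is a rule set); the function name and the convention that True means the check passed are assumptions made for the example.

```python
from typing import Optional


def completeness_string(value: Optional[str]) -> bool:
    """Completeness check for a character field.

    Returns True when the value is present and non-empty (check passed),
    False when it is null (None) or has zero length (rule fired).
    """
    return value is not None and len(value) > 0


# The same rule is reused for several character fields of one record.
record = {"first_name": "Ada", "last_name": "", "city": None}
for field in ("first_name", "last_name", "city"):
    print(field, completeness_string(record[field]))
```

Running it prints True for first_name and False for last_name and city, so one rule definition covers all three columns.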

We may also have rules that are specific to certain fields, so we end up with both reusable and field-specific rules across the different dimensions and fields.

Because we use rule sets in Intelligent Decisioning to build the business rules, we can build most rules without writing code. We just “assemble” generic if-then-else statements, which are not language-specific at this point.

For more sophisticated rules, however, we can also use SAS DS2 code within the rule set. DS2 offers many functions out of the box, such as functions for regular expressions and for the Quality Knowledge Base (QKB), so we can perform all the necessary checks.
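The kind of pattern check that DS2’s regular-expression functions make possible can be sketched in Python with the standard re module. This is a stand-in, not DS2: the postal-code field, the five-digit pattern, and the function name are hypothetical, and a real rule would use whatever format your data requires.

```python
import re
from typing import Optional

# Hypothetical validity rule: a postal code is valid if it is exactly five ASCII digits.
POSTAL_CODE_PATTERN = re.compile(r"[0-9]{5}")


def validity_postal_code(value: Optional[str]) -> bool:
    """Return True if the value matches the pattern (check passed), else False (rule fired)."""
    if value is None:
        return False
    # fullmatch requires the whole string to match, so "123456" or " 12345" fail.
    return POSTAL_CODE_PATTERN.fullmatch(value) is not None


for candidate in ("12345", "1234", "12A45", None):
    print(candidate, validity_postal_code(candidate))
```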

All data quality business rules that we write follow the same pattern: they take two input parameters, which are passed through, and add two output parameters.

The input parameters are:

The Record ID of the record whose data quality we measure.
The Field Value that is being checked.

The additional output parameters are:

A Boolean parameter stating whether the rule was triggered.
A parameter carrying the rule name, which identifies a fired rule when we later visualize the monitoring results in a Data Quality Dashboard.
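The common rule pattern can be expressed as a function signature. In this minimal Python sketch, assumed for illustration, `RuleResult` is a hypothetical type: the two inputs come back unchanged and two outputs are added. The sketch assumes `status` is True when the check passed and False when the rule fired, matching how the result status is read in the field rules section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RuleResult:
    record_id: str               # input, passed through unchanged
    field_value: Optional[str]   # input, passed through unchanged
    status: bool                 # output: True = check passed, False = rule fired
    rule_name: str               # output: identifies the rule, e.g. on a dashboard


def completeness_string(record_id: str, field_value: Optional[str]) -> RuleResult:
    """Reusable completeness rule following the two-in, two-added-out pattern."""
    passed = field_value is not None and len(field_value) > 0
    return RuleResult(record_id, field_value, passed, "Completeness_String")


print(completeness_string("42", ""))
```

Carrying the rule name on every result means a dashboard can group failures by rule without knowing which rule set produced them.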

Once we have built a business rule, we can test it in Intelligent Decisioning by streaming some test data through it, to ensure the rule works as desired before we use it in production.
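Streaming test data through a rule amounts to comparing actual against expected results. The harness below is a hypothetical Python stand-in for that test run, not the Intelligent Decisioning test feature; the test cases and the `run_tests` helper are made up for illustration.

```python
from typing import Callable, List, Optional, Tuple


def completeness_string(value: Optional[str]) -> bool:
    # True = check passed, False = rule fired (assumed convention).
    return value is not None and len(value) > 0


# Hypothetical test data: (record_id, field_value, expected_status)
test_cases: List[Tuple[str, Optional[str], bool]] = [
    ("1", "Ada", True),
    ("2", "", False),
    ("3", None, False),
]


def run_tests(rule: Callable[[Optional[str]], bool], cases) -> list:
    """Run every test case through the rule and return the mismatches."""
    failures = []
    for record_id, value, expected in cases:
        actual = rule(value)
        if actual != expected:
            failures.append((record_id, value, expected, actual))
    return failures


print(run_tests(completeness_string, test_cases))  # []
```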

We can also version our rules in Intelligent Decisioning which helps us to govern and maintain the rules through their lifecycle.

Data quality field rules

Once we have written the data quality business rules, we combine all the rules needed for one field (table column) whose data quality we want to check.

To build the field rules, we use decision flows in Intelligent Decisioning. For each field to be checked, we build one decision flow. In the decision flow, we call the specific business rules (rule sets) that we want to apply to the field. After each business rule is called, we evaluate its result. If the rule did not fire (the result status was True), we call the business rule for the next quality dimension and evaluate its result in the same way. As soon as a rule for a dimension fires, we leave the decision flow, because we don’t want more than one dimension to fail at a time.
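The control flow of a field rule, calling one dimension after another and leaving at the first rule that fires, can be sketched in Python as follows. The rule list, the `Validity_PostalCode` rule, and the dictionary returned are hypothetical; in Intelligent Decisioning this logic is modeled as a decision flow rather than written as code.

```python
import re
from typing import Callable, List, Optional, Tuple

# A rule is a (rule_name, check) pair; check returns True when the value passes.
Rule = Tuple[str, Callable[[Optional[str]], bool]]


def completeness_string(value: Optional[str]) -> bool:
    return value is not None and len(value) > 0


def validity_postal_code(value: Optional[str]) -> bool:
    return value is not None and re.fullmatch(r"[0-9]{5}", value) is not None


# Hypothetical field rule for a postal code column: dimensions are checked in this order.
POSTAL_CODE_RULES: List[Rule] = [
    ("Completeness_String", completeness_string),
    ("Validity_PostalCode", validity_postal_code),
]


def field_rule(record_id: str, value: Optional[str], rules: List[Rule]) -> dict:
    """Call each business rule in turn and leave at the first one that fires."""
    for rule_name, check in rules:
        if not check(value):
            # Rule fired: stop so that at most one dimension fails per field.
            return {"record_id": record_id, "status": False, "rule_name": rule_name}
    # No rule fired: the field passed all checks.
    return {"record_id": record_id, "status": True, "rule_name": ""}


print(field_rule("1", "", POSTAL_CODE_RULES))
print(field_rule("2", "12A45", POSTAL_CODE_RULES))
```

The order of the list decides which dimension is reported: an empty value fails Completeness_String and is never checked for validity.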

Once we have built the field rule, we can test it by streaming some test data through it, just as we tested the business rule, to ensure it works as desired.

And like business rules, field rules can be versioned in Intelligent Decisioning to support the governance and maintenance process.

Monitoring Tasks

Once we have built the necessary field rules, we build the monitoring tasks. A monitoring task covers all fields of an entity whose data quality we want to check. In a decision flow, we combine all field rules for one entity by calling the appropriate field rule for each field.
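A monitoring task then fans out over the fields of an entity. This Python sketch assumes a hypothetical `customer` entity with `name` and `city` columns and a simplified field rule; each record yields one result per field, a shape that a dashboard could aggregate.

```python
from typing import Callable, Dict, List, Optional, Tuple

Rule = Tuple[str, Callable[[Optional[str]], bool]]


def completeness_string(value: Optional[str]) -> bool:
    return value is not None and len(value) > 0


def field_rule(record_id: str, field: str, value: Optional[str], rules: List[Rule]) -> dict:
    """Simplified field rule: stop at the first business rule that fires."""
    for rule_name, check in rules:
        if not check(value):
            return {"record_id": record_id, "field": field, "status": False, "rule_name": rule_name}
    return {"record_id": record_id, "field": field, "status": True, "rule_name": ""}


# Hypothetical monitoring task for a "customer" entity: one field rule per column.
CUSTOMER_TASK: Dict[str, List[Rule]] = {
    "name": [("Completeness_String", completeness_string)],
    "city": [("Completeness_String", completeness_string)],
}


def monitoring_task(record: dict, task: Dict[str, List[Rule]]) -> List[dict]:
    """Call the field rule of every monitored field and collect one result per field."""
    return [
        field_rule(record["id"], field, record.get(field), rules)
        for field, rules in task.items()
    ]


results = monitoring_task({"id": "1", "name": "Ada", "city": ""}, CUSTOMER_TASK)
for result in results:
    print(result)
```

A field missing from the record is passed as None, so it is reported by the completeness rule rather than raising an error.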

As with business rules and field rules, we can test and version the monitoring tasks in Intelligent Decisioning.

Using the right monitoring task in production

For the decision flows, we can use the approval workflow that is part of Intelligent Decisioning. It helps ensure that only tested and approved monitoring tasks are used on production data.

When a new monitoring task (decision flow) is created, Intelligent Decisioning automatically starts a workflow with several stages, which ensures that the responsible person is involved at the right time to develop, test, and approve the monitoring task.

Once a monitoring task has reached the status Deployed in the workflow, it is ready to be used for measuring data quality in production.
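Selecting the task to run in production can be thought of as a filter on workflow status. The Python sketch below is hypothetical: the `TaskVersion` type, the task name, and every status string except Deployed are placeholders, and choosing the highest deployed version is an assumed policy, not documented Intelligent Decisioning behavior.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskVersion:
    name: str
    version: int
    status: str  # workflow status; only "Deployed" comes from the article, others are placeholders


def production_task(versions: List[TaskVersion], name: str) -> Optional[TaskVersion]:
    """Return the highest deployed version of a monitoring task, or None if none is deployed."""
    deployed = [v for v in versions if v.name == name and v.status == "Deployed"]
    return max(deployed, key=lambda v: v.version) if deployed else None


versions = [
    TaskVersion("Customer_Monitoring", 1, "Deployed"),
    TaskVersion("Customer_Monitoring", 2, "In Test"),  # not yet approved
]
print(production_task(versions, "Customer_Monitoring"))
```

Version 2 is ignored until it reaches Deployed, which is the guarantee the workflow is meant to provide.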

Summary

Intelligent Decisioning is a great tool for monitoring data quality in SAS Viya. I have described one way to do this, but there are other approaches too, depending on your requirements.

In the next post I’m going to talk about how we can run the monitoring tasks using Studio Flow in SAS Studio.