BookmarkSubscribeRSS Feed
LineMoon
Lapis Lazuli | Level 10

Hello Experts,

Please, I would like to know, if you have created some sas programs and templates for the data quality Dimensions : COMPLETENESS,UNIQUENESS,TIMELINESS,VALIDITY,ACCURACY,CONSISTENCY

In order to enhance the data quality reports ?

Are there any SAS implementions for the data quality dimensions as specified in the standard :  BCBS 239 ?

 

13 REPLIES 13
PaigeMiller
Diamond | Level 26

Internet searches for

 

SAS BCBS 239

 

finds lots of hits

--
Paige Miller
LineMoon
Lapis Lazuli | Level 10

Thank very much for your answer.

As a part of the creation of the six data quality Dimensions (COMPLETENESS, UNIQUENESS, TIMELINESS, VALIDITY, ACCURACY, CONSISTENCY) via SAS,  there are three components that will be involved in the process of the generation of the Data Quality Report:

 

  • Input Design
  • SAS Program
  • Output (Data quality Report)

 

It seems clear that, there is a strong dependency between the Input Design and SAS program (to run with less coding and keeping more transparency). In order to create by SAS an automatic Data Quality report.

 Consequently, are there any general solution that have been proposed for both the Input Design and SAS program?

Reeza
Super User
For starters are you referring to a specific product, such as SAS Data Quality Studio or Data Integration Studio? Or are you referring to using Base SAS to do data quality management? Given your last question, I'm assuming the latter and my further comments are based on this assumption.

There is a dedicated SAS product designed to characterize/profile data quality measures. In general though, there needs to be some subject matter expertise applied, because you need to specify both the data and the rules and you need to develop a system to do so. I've had to this before by the way, building a data quality report and system to transfer data to different government entities to meet their rules. Without Data Integration studio sadly...didn't know enough at the time to have used that instead anyways!
LineMoon
Lapis Lazuli | Level 10

Thank you very much.

I am  referring to the use of the  Base SAS to do data quality report with the six data quality Dimensions: COMPLETENESS, UNIQUENESS, TIMELINESS, VALIDITY, ACCURACY, CONSISTENCY

Reeza
Super User
Ok, so yeah, you need to define your rules and write your programs. They will vary based on input. For example, for some of our data the timeliness is 24 hours and for others its 30 days. You can only know that by knowing the business rules.

So pick a data system and start there by first establishing the business rules around how data are collected. You then translate that to the dimensions above, ie completeness for 'name' type fields would be mean that First Name and LastName are always filled out but MiddleName can be blank.
Uniqueness means that each transaction ID should be unique for example - this may not be true in all systems, especially if you have a transaction system that allows for returns. In that case, the TransactionID and Type (purchase or return) form the unique key to be checked.

I don't recall the definitions of the other terms at the top of my head but hopefully this helps you get started.

This paper walks through some of it and you can also search on lexjansen.com for more examples:
https://support.sas.com/resources/papers/proceedings/proceedings/sugi29/093-29.pdf

Note that is not a trivial task, but can be done, it just takes some time.

There's also a SAS code book that can help you get there faster, the macro is in the library.

https://communities.sas.com/t5/SAS-Communities-Library/Library-Datasets-Summary-Macro-DATA-SPECS/ta-...

You can probably use that to help build your rules.
LineMoon
Lapis Lazuli | Level 10

Thank you very much for these useful and details information.

That’s very kind from you.

AS you said ““that is not a trivial task, but can be done, it just takes some time”

 That’s right; and, in order to find a general solution, the key could be in the “input Design” that will be running in SAS Program.

As you know, the input design implemented the business rules and easily  understood by the functional and Technical; and loaded with less complexity in SAS.

Therefore, in many cases, the inputs design, it is combination of three components:

  • Set of fields to apply the data quality
  • Data Quality Dimensions to apply: COMPLETENESS, UNIQUENESS, TIMELINESS, VALIDITY, ACCURACY, CONSISTENCY
  • Set of reference variables used in the compare.

The appearance of the three components in the “input design” in a smart way could solve the issue   

Reeza
Super User
Yes, but that also takes time and you need to know the rules that could even come into play. My DQ programs for clinical trials and health care were different from the data quality rules when I was n agriculture. But similar idea.

When I built similar systems I put the rules/logic in an Excel file and had the business drive the rules and I just implemented them.

Thinking about all of the different possible cases isn't trivial, which is why there's a dedicated SAS product that does this, SAS Data Quality Studio.

LineMoon
Lapis Lazuli | Level 10

As for input , That's right . Instead, the Output design could have nearly the same format ?

Reeza
Super User
Sure it could.
LineMoon
Lapis Lazuli | Level 10

In order to apply the 6 Data quality dimensions on N variables(for exemple : 1000 variables),

How much time do you estimate to do this job? 

Is there any method for estimating the time necessary to do this job ? 

 

Reeza
Super User
Depends on the variables. If 25 are diagnosis codes for example, you can look at them all together, and there's probably similar functionality in banking data. For example, all amount fields can be looked at together.

I would say you would first go through and group variables together and identify which need to checked and which ones don't and which go together. Then you have the subset you need to check. I'd guess you're looking at 20/30 minutes per variable here to make decisions and ensure things are correct and not accounting for extra follow up time. Some variables may take a minute to check and verify and some, like an account type that's changed over time and has old and new types may take a few hours so 20/30 minutes on average. Usually you assume 5 productive hours a day. You can do the math from there.
LineMoon
Lapis Lazuli | Level 10

Thank you very much.

It's a good idea 

LineMoon
Lapis Lazuli | Level 10

@PaigeMiller : I think, you mean this link https://blogs.sas.com/content/sascom/tag/bcbs-239/

I have read these 14 principles, I am interested by their  implementation in SAS.

Thereby, could the scope of the data quality for the Banking credit risk data, defined by the banking supervisory,  "became"  the same as “CDEs=Critical Data elements” ? in order to limit the scope of the data quality.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1461 views
  • 6 likes
  • 3 in conversation