- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello Experts,
Please, I would like to know, if you have created some sas programs and templates for the data quality Dimensions : COMPLETENESS,UNIQUENESS,TIMELINESS,VALIDITY,ACCURACY,CONSISTENCY
In order to enhance the data quality reports ?
Are there any SAS implementions for the data quality dimensions as specified in the standard : BCBS 239 ?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Internet searches for
SAS BCBS 239
finds lots of hits
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank very much for your answer.
As a part of the creation of the six data quality Dimensions (COMPLETENESS, UNIQUENESS, TIMELINESS, VALIDITY, ACCURACY, CONSISTENCY) via SAS, there are three components that will be involved in the process of the generation of the Data Quality Report:
- Input Design
- SAS Program
- Output (Data quality Report)
It seems clear that, there is a strong dependency between the Input Design and SAS program (to run with less coding and keeping more transparency). In order to create by SAS an automatic Data Quality report.
Consequently, are there any general solution that have been proposed for both the Input Design and SAS program?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
There is a dedicated SAS product designed to characterize/profile data quality measures. In general though, there needs to be some subject matter expertise applied, because you need to specify both the data and the rules and you need to develop a system to do so. I've had to this before by the way, building a data quality report and system to transfer data to different government entities to meet their rules. Without Data Integration studio sadly...didn't know enough at the time to have used that instead anyways!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you very much.
I am referring to the use of the Base SAS to do data quality report with the six data quality Dimensions: COMPLETENESS, UNIQUENESS, TIMELINESS, VALIDITY, ACCURACY, CONSISTENCY
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
So pick a data system and start there by first establishing the business rules around how data are collected. You then translate that to the dimensions above, ie completeness for 'name' type fields would be mean that First Name and LastName are always filled out but MiddleName can be blank.
Uniqueness means that each transaction ID should be unique for example - this may not be true in all systems, especially if you have a transaction system that allows for returns. In that case, the TransactionID and Type (purchase or return) form the unique key to be checked.
I don't recall the definitions of the other terms at the top of my head but hopefully this helps you get started.
This paper walks through some of it and you can also search on lexjansen.com for more examples:
https://support.sas.com/resources/papers/proceedings/proceedings/sugi29/093-29.pdf
Note that is not a trivial task, but can be done, it just takes some time.
There's also a SAS code book that can help you get there faster, the macro is in the library.
https://communities.sas.com/t5/SAS-Communities-Library/Library-Datasets-Summary-Macro-DATA-SPECS/ta-...
You can probably use that to help build your rules.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you very much for these useful and details information.
That’s very kind from you.
AS you said ““that is not a trivial task, but can be done, it just takes some time”
That’s right; and, in order to find a general solution, the key could be in the “input Design” that will be running in SAS Program.
As you know, the input design implemented the business rules and easily understood by the functional and Technical; and loaded with less complexity in SAS.
Therefore, in many cases, the inputs design, it is combination of three components:
- Set of fields to apply the data quality
- Data Quality Dimensions to apply: COMPLETENESS, UNIQUENESS, TIMELINESS, VALIDITY, ACCURACY, CONSISTENCY
- Set of reference variables used in the compare.
The appearance of the three components in the “input design” in a smart way could solve the issue
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
When I built similar systems I put the rules/logic in an Excel file and had the business drive the rules and I just implemented them.
Thinking about all of the different possible cases isn't trivial, which is why there's a dedicated SAS product that does this, SAS Data Quality Studio.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
As for input , That's right . Instead, the Output design could have nearly the same format ?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
In order to apply the 6 Data quality dimensions on N variables(for exemple : 1000 variables),
How much time do you estimate to do this job?
Is there any method for estimating the time necessary to do this job ?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I would say you would first go through and group variables together and identify which need to checked and which ones don't and which go together. Then you have the subset you need to check. I'd guess you're looking at 20/30 minutes per variable here to make decisions and ensure things are correct and not accounting for extra follow up time. Some variables may take a minute to check and verify and some, like an account type that's changed over time and has old and new types may take a few hours so 20/30 minutes on average. Usually you assume 5 productive hours a day. You can do the math from there.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you very much.
It's a good idea
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@PaigeMiller : I think, you mean this link https://blogs.sas.com/content/sascom/tag/bcbs-239/
I have read these 14 principles, I am interested by their implementation in SAS.
Thereby, could the scope of the data quality for the Banking credit risk data, defined by the banking supervisory, "became" the same as “CDEs=Critical Data elements” ? in order to limit the scope of the data quality.