07-22-2015 11:36 PM
Could you please suggest some methods or techniques (along with SAS code or macros) that we can use for basic data validation and quality checks?
Also, please let me know the process steps to follow. Thanks.
07-23-2015 01:32 PM
There are MANY potential validation scenarios, so this is an expansive question. Books are written on this topic.
I generally start at the point of reading the data. I can use custom informats that encode the expected ranges or values and assign OTHER = _ERROR_. When a variable is read with such an informat, there are two immediate effects: an invalid-data message appears in the log, and the variable's value is set to missing.
So range or value checking is relatively easy.
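As a minimal sketch of the informat approach described above: the informat names (agechk, $sexchk), the 0 to 120 age range, the M/F codes and the sample rows are all made up for illustration, so substitute your own rules.

```sas
/* Hypothetical informats: accept expected values, treat everything else as an error */
proc format;
   invalue agechk               /* numeric informat; 0-120 is an assumed valid range */
      0 - 120 = _same_
      other   = _error_;
   invalue $sexchk (upcase)     /* character informat; M and F are assumed valid codes */
      'M', 'F' = _same_
      other    = _error_;
run;

data patients;
   length sex $1;
   /* The colon modifier applies each custom informat with list input */
   input id age :agechk. sex :$sexchk.;
   datalines;
1 34 M
2 250 F
3 41 X
;
run;
```

With this sample data, the age on row 2 and the sex code on row 3 fall under OTHER, so those two values should come through as missing with an invalid-data message in the log for each row.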
The fun begins with forms of referential validation, where a value is only acceptable based on a condition. At that point all of the conditional logic tools come into play, such as IF/THEN/ELSE, SELECT or CASE structures. You have to decide what the appropriate action is.
You may also have to worry about rate of change: for example, how likely is it that outside air temperature changes by 50 degrees within an hour?
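A minimal sketch of a conditional check, assuming a hypothetical VISITS data set in which a pregnancy flag is only acceptable when sex is F. The variable names, codes and messages are invented for the example. Here the chosen action is to write the offending rows to a separate data set.

```sas
/* Hypothetical sample data */
data visits;
   input id sex $ pregnant $;
   datalines;
1 F Y
2 M N
3 M Y
4 U N
;
run;

data flagged;
   set visits;
   length issue $40;
   select (sex);
      /* Value of PREGNANT is only checked when SEX is M */
      when ('M') if pregnant = 'Y' then issue = 'Pregnancy recorded for male';
      when ('F') ;                           /* nothing to check for this code */
      otherwise issue = 'Unexpected sex code';
   end;
   /* Keep only the rows that failed a rule */
   if not missing(issue) then output;
run;
```

With the sample rows, ids 3 and 4 end up in FLAGGED. Other actions (setting the value to missing, writing to the log with PUT, stopping the job) would slot into the same structure.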
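One way to sketch a rate-of-change check is with the DIF function, which returns the current value minus the value from the previous call. The data set, variable names and the 50-degree threshold below are assumptions for illustration, and the sketch assumes one reading per station per hour with no gaps.

```sas
/* Hypothetical hourly readings */
data temps;
   input station $ hour temp_f;
   datalines;
A 1 61
A 2 63
A 3 118
A 4 64
B 1 40
B 2 42
;
run;

proc sort data=temps;
   by station hour;
run;

data temp_jumps;
   set temps;
   by station;
   /* DIF must run on every row so its internal queue stays in step */
   change = dif(temp_f);
   /* The first row of a station has no earlier reading to compare with */
   if first.station then change = .;
   if not missing(change) then
      if abs(change) >= 50 then output;
run;
```

For station A, hour 3 (change of 55) and hour 4 (change of -54) are written to TEMP_JUMPS. Whether the spike or the neighbouring readings are wrong is a separate decision.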
We also then may have to deal with "value is in absolute range but is some kind of outlier".
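For the "in range but still an outlier" case, one common option (my choice for this sketch, not the only one) is the interquartile-range fence: flag anything more than 1.5 times the IQR below Q1 or above Q3. The data set, variable names and the 1.5 multiplier are assumptions.

```sas
/* Hypothetical readings; every value is inside a plausible absolute range */
data readings;
   input id body_temp;
   datalines;
1 98.1
2 98.4
3 98.6
4 98.7
5 99.0
6 104.5
;
run;

/* Quartiles of the whole column, written to a one-row data set */
proc means data=readings noprint;
   var body_temp;
   output out=stats q1=q1 q3=q3;
run;

data outliers;
   /* Read the quartiles once; values from SET are retained across rows */
   if _n_ = 1 then set stats(keep=q1 q3);
   set readings;
   lower = q1 - 1.5 * (q3 - q1);
   upper = q3 + 1.5 * (q3 - q1);
   /* Missing values are excluded here; they need their own check */
   if not missing(body_temp) and (body_temp < lower or body_temp > upper) then output;
run;
```

With these six rows only the 104.5 reading falls outside the fences. For grouped data, a CLASS or BY statement in PROC MEANS plus a merge by group would follow the same pattern.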
And last are things like checking compliance with "business rules" such as If X happens then Y should happen within/no sooner than some time interval.
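A sketch of a timing rule of that kind, assuming a hypothetical rule that every order must ship no earlier than the order date and no later than 5 days after it. The data set, variable names and the 5-day limit are invented.

```sas
/* Hypothetical orders; a period stands for a missing ship date */
data orders;
   input order_id order_date :date9. ship_date :date9.;
   format order_date ship_date date9.;
   datalines;
101 01JUL2015 03JUL2015
102 01JUL2015 15JUL2015
103 05JUL2015 02JUL2015
104 06JUL2015 .
;
run;

data rule_violations;
   set orders;
   length issue $30;
   if missing(ship_date) then issue = 'Not shipped';
   else do;
      /* SAS dates are day counts, so subtraction gives elapsed days */
      days_to_ship = ship_date - order_date;
      if days_to_ship < 0 then issue = 'Shipped before order';
      else if days_to_ship > 5 then issue = 'Shipped late';
   end;
   if not missing(issue) then output;
run;
```

Orders 102, 103 and 104 break the assumed rule; order 101 passes. For datetime values the INTCK function with an interval such as 'hour' would replace the simple subtraction.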
If fraud is a possibility, then there are a number of things that can be done to look at entire distributions of values by location, dealer, user or customer.
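As a very basic starting point for that kind of comparison, summary statistics by group can show whether one dealer's distribution looks unlike the others. The data set and variable names are made up, and real fraud screening goes well beyond this sketch.

```sas
/* Hypothetical claim amounts by dealer */
data claims;
   input dealer $ claim_amt;
   datalines;
D1 120
D1 135
D1 110
D2 125
D2 140
D3 980
D3 1010
D3 995
;
run;

/* One row of statistics per dealer for side-by-side comparison */
proc means data=claims n mean median min max maxdec=1;
   class dealer;
   var claim_amt;
run;
```

In this toy data, dealer D3's amounts sit far above the other two, which is the sort of pattern that would prompt a closer look.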