03-27-2017 04:48 PM
I have a dataset that contains both character and numeric variables. I need to validate the values of both the character and numeric variables including nulls, missing values, the range of values and the outliers (outside the range of values) if there are any. If anybody can help me with a macro that I can customize to serve my purpose, I would greatly appreciate your kind assistance in this regard. Thank you very much.
03-27-2017 05:08 PM
What are the rules? I'm not understanding what you're trying to do beyond running a proc univariate and/or freq for your datasets.
Proc univariate data= have; Proc freq data=have; Run;
03-27-2017 05:12 PM
Show some example data and what you expect the result to be.
Some of this may be doable with custom formats. Here is an example with one character and one numeric variable with known expected values for the character and an expected range for the numeric.
proc format library=work; value expectnum ., 1-50='In range'; value $expectchar "Bob","Ted","Alice","Carol"='Expected'; run; data example; informat char $8. num best5.; input char num; datalines; Bob 1 Ted 77 John 43 Carol . Alice 189 ; run; proc freq data=example; tables char num / missing; format char $expectchar. num expectnum.; run;
The proc freq output will show the number of acceptable (in your defined ranges or lists) values and the individual values that are not acceptable.
If a numeric value should not be missing, then remove it from the acceptable range. Proc format has a number of ways to expess end points so you will need to determine what may work for you.
Outlier you will have to work on as what may be considered an outlier in one data set may not for another. Proc univariate may help as it will show the largest and smallest values along with distribution characteristics that may let you define outlier.