DATA Step, Macro, Functions and more

A SAS macro to validate values of character and numeric variables

Reply
Occasional Contributor
Posts: 10

A SAS macro to validate values of character and numeric variables

Hi there

 

I have a dataset that contains both character and numeric variables. I need to validate the values of both the character and numeric variables including nulls, missing values, the range of values and the outliers (outside the range of values) if there are any.  If anybody can help me with a macro that I can customize to serve my purpose, I would greatly appreciate your kind assistance in this regard. Thank you very much.

 

Chandra

Super User
Posts: 17,942

Re: A SAS macro to validate values of character and numeric variables

What are the rules? I'm not understanding what you're trying to do beyond running a proc univariate and/or freq for your datasets. 

 

Proc univariate data= have;
Proc freq data=have;
Run;
Super User
Posts: 10,548

Re: A SAS macro to validate values of character and numeric variables

Show some example data and what you expect the result to be.

 

Some of this may be doable with custom formats. Here is an example with one character and one numeric variable with known expected values for the character and an expected range for the numeric.

 

proc format library=work;
value expectnum
., 1-50='In range';
value $expectchar
"Bob","Ted","Alice","Carol"='Expected';
run;

data example;
   informat char $8. num best5.;
   input char num;
datalines;
Bob 1
Ted 77
John 43
Carol .
Alice 189
;
run;

proc freq data=example;
   tables char num / missing;
   format char $expectchar. num expectnum.;
run;

 

The proc freq output will show the number of acceptable (in your defined ranges or lists) values and the individual values that are not acceptable.

 

If a numeric value should not be missing, then remove it from the acceptable range. Proc format has a number of ways to expess end points so you will need to determine what may work for you.

 

Outlier you will have to work on as what may be considered an outlier in one data set may not for another. Proc univariate may help as it will show the largest and smallest values along with distribution characteristics that may let you define outlier.

Ask a Question
Discussion stats
  • 2 replies
  • 120 views
  • 0 likes
  • 3 in conversation