knveraraju91
Barite | Level 11

Dear,

I am validating a data set. For each variable, I want to check for missing values, see which values occur, and spot outliers.

I am currently running PROC FREQ separately for each variable. Is there a better way to do this? My lab data set has more than a million observations and 100 variables.

 

data one;
input a b $3 c$5;
datalines;
1 a b
2   c
3   
4 b d
5 c e
6 d f
7   g
8 e
;
proc freq;
tables a;
run;

proc freq;
tables b;
run;

proc freq;
tables c;
run;

3 REPLIES
art297
Opal | Level 21

Why not just use:

data one;
  input a b $3 c$5;
  datalines;
1 a b
2   c
3   
4 b d
5 c e
6 d f
7   g
8 e
;
proc freq;
  tables _all_;
run;

Art, CEO, AnalystFinder.com
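
With 100 variables and over a million observations, full frequency tables from _all_ can get very long. A minimal sketch of one way to shorten them, assuming the goal is only to count missing versus non-missing values: the format names missfmt and $missfmt are made up for illustration, and sashelp.heart is used only because it ships with SAS and contains missing values. Special numeric missing values (.A to .Z, ._) are not handled here.

```sas
/* Formats that collapse every value into "Missing" or "Not missing". */
proc format;
   value $missfmt ' ' = 'Missing' other = 'Not missing';
   value  missfmt  .  = 'Missing' other = 'Not missing';
run;

/* At most two rows per variable instead of one row per distinct value. */
proc freq data=sashelp.heart;
   format _character_ $missfmt. _numeric_ missfmt.;
   tables _all_ / missing nocum nopercent;
run;

/* Number of distinct levels per variable, without printing the full tables. */
proc freq data=sashelp.heart nlevels;
   tables _all_ / noprint;
run;
```

The NLEVELS summary can help decide which variables have few enough distinct values to be worth a full frequency table.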

 

ballardw
Super User

If some of your variables are measurements rather than categories, such as height, weight, or monetary values (price, sale amount, and similar), you might want to examine them in terms of means, deviations, and extremes, as PROC UNIVARIATE provides. With millions of observations, a frequency table for such a variable can run to hundreds of thousands of lines.
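
As a rough sketch of that idea, assuming sashelp.class as stand-in data and height and weight as the measurement variables (neither choice comes from the original question), PROC MEANS gives counts and ranges for all numeric variables at once, and PROC UNIVARIATE can be limited to its extreme-observations table:

```sas
/* N, number missing, and range for every numeric variable in one table. */
proc means data=sashelp.class n nmiss min max mean std;
   var _numeric_;
run;

/* Five lowest and five highest values per variable, with observation numbers. */
ods select ExtremeObs;
proc univariate data=sashelp.class;
   var height weight;
run;
```

The ODS SELECT statement applies only to the next procedure step, so later output is not affected.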

 

Alternatively, if you are not concerned with any single record's value but know the expected or desired range, you can create custom formats that label the "valid" range of values for specific variables. Example:

 

proc format library=work;
value expectedage 
   13 - 16='Valid'
   other = 'Invalid';
run;

proc freq data=sashelp.class;
   tables age;
   format age expectedage.;
run;

This at least makes the output tables easier to interpret, because each table collapses to a 'Valid' row and an 'Invalid' row.
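
If the individual offending records are also of interest, the same format could be reused in a WHERE clause. This is only a sketch of one possible follow-up step, not part of the approach described above; it repeats the format definition so that it runs on its own. Note that a missing age would also fall under 'Invalid' with this format.

```sas
proc format library=work;
   value expectedage
      13 - 16 = 'Valid'
      other   = 'Invalid';
run;

/* List only the records whose age falls outside the expected range. */
proc print data=sashelp.class;
   where put(age, expectedage.) = 'Invalid';
   var name age;
run;
```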

 

PaigeMiller
Diamond | Level 26

Note that in the above replies, no macros are needed. Don't resort to macros unless you are convinced that there is no non-macro solution available. 

--
Paige Miller
