Help using Base SAS procedures

Counting Missing Values

Reply
Occasional Contributor
Posts: 8

Counting Missing Values

Hi

Hope you can help.

I have a dataset that has about 800 variables and 100,000 rows.  I want to find the number of missing values for each variable (column).  I don't want a proc frequency table for each of the variables.  I would like a table output with one column for variable and another column for the number of missing values for each variable.

eg.

variable name          # missing values

var 1                               3000

var 2                               80

var 3                              10000

...

var 800                    5

Thanks

Bo

Super User
Posts: 17,770

Counting Missing Values

In two steps using a proc mean and then transposing the results. This assumes you have no character variables. If you do, then a diff solution is needed.

proc means data=sashelp.class noprint;

var age height weight;

output out=try1 (drop= _type_ _freq_) nmiss=;

run;

proc transpose data=try1 out=try2 prefix=nmiss;

run;

Valued Guide
Posts: 765

Re: Counting Missing Values

Hey Art ... pretty sure  PROC MEANS only accepts numeric variables in the VAR statement.  I get this with your example ...

183  proc means data=have noprint;

184    var _all_;

ERROR: Variable var2 in list does not match type prescribed for this list.

185    output out=temp (drop= _type_ _freq_) nmiss=;

186  run;

How about (assumes there is a variable that uniquely identifies each observation ... here it's NREC) ...

data x;

input name : $10. age zip : $5. salary;

nrec+1;

datalines;

mike 64 12203 9999

sara . 20001 .

. 24 20001 10000

jessica . 90124 999

;

run;

options missing='';

proc transpose data=x out=new (where=(missing(col1)) keep=_name_ col1);

by nrec;

var _all_;

run;

options missing='.';

proc freq data=new;

table _name_ / nocum nopercent;

run;

_NAME_    Frequency

age              2

name             1

salary           1


PROC Star
Posts: 7,357

Re: Counting Missing Values

Mike,

You are absolutely, totally correct and I am deleting my previous response.  You would think after all of these years I'd have learned to look at my log before posting Smiley Happy

Valued Guide
Posts: 765

Re: Counting Missing Values

Hi ... I guess it's "log before you leap".

Valued Guide
Posts: 765

Re: Counting Missing Values

Hi ... closer to the data set in the original posting ... took about 25 seconds CPU (elapsed about the same) on a not too fast PC.

* fake data ... 100,000 obs / 600 numeric + 200 character vars

data x;

array nn(600);

array cc(200) $2.;

do id=1 to 1e5;

do k=1 to 200;

   nn(k) = ifn(ranuni(999) le .05 , . , 10);

   cc(k) = ifc(ranuni(999) le .05 , ' ' , '10');

end;

do k=201 to 600;

   nn(k) = ifn(ranuni(999) le .05 , . , 10);

end;

output;

end;

keep id nn: cc: ;

run;

options missing='';

proc transpose data=x out=new (where=(missing(col1)) keep=_name_ col1);

by id;

var _all_;

run;

options missing='.';

proc freq data=new order=freq;

table _name_ / nocum nopercent;

run;

some results ...

_NAME_    Frequency

nn314         5266

cc198         5230

nn124         5213

nn511         5185

cc182         5178

nn392         5169

nn3           5159


Frequent Contributor
Posts: 104

Re: Counting Missing Values

Building off of MikeZdeb's solution, just wanted to see the performance trade-off between PROC TRANSPOSE and the good old DATA STEP.  This was run on an old Win XP machine with PC SAS 9.1.3 (stop laughing!) with 10,000 records and 800 variables...

21   options missing='';

22   proc transpose data=x out=new (where=(missing(col1)) keep=_name_ col1);

23       by id;

24       var _all_;

25   run;

NOTE: Numeric variables in the input data set will be converted to character in the output data set.

NOTE: There were 10000 observations read from the data set WORK.X.

NOTE: The data set WORK.NEW has 399281 observations and 2 variables.

NOTE: PROCEDURE TRANSPOSE used (Total process time):

      real time           2.42 seconds

      cpu time            2.37 seconds

26

27   data new1(keep=__var);

28       set x;

29       array __num(*) _numeric_;

30       array __chr(*) _character_;

31

32       length __var $ 32;

33

34       do i=1 to hbound(__num,1);

35           if missing(__num(i)) then do;

36               __var = vname(__num(i));

37               output;

38           end;

39       end;

40

41       do i=1 to hbound(__chr,1);

42           if missing(__chr(i)) then do;

43               __var = vname(__chr(i));

44               output;

45           end;

46       end;

47   run;

NOTE: There were 10000 observations read from the data set WORK.X.

NOTE: The data set WORK.NEW1 has 399281 observations and 1 variables.

NOTE: DATA statement used (Total process time):

      real time           0.38 seconds

      cpu time            0.39 seconds

DATA step runs considerably faster, but took more time to code.  Is it clearer or more confusing than PROC TRANSPOSE is a matter of personal taste.

Regular Contributor
Posts: 184

Re: Counting Missing Values

Here's a PROC FREQ approach that does only one pass through the data.

proc format ;

value  miss_n low-high = 'N' other = 'Y' ;

value $miss_c      ' ' = 'Y' other = 'N' ;

run ;

ods listing close ;

ods output OneWayFreqs=owf ;

proc freq data=x nlevels ;

format _numeric_    miss_n.

       _character_ $miss_c. ;

tables _all_ / missing ;

run ;

ods output close ;

ods listing ;

data want (keep = Table Frequency) ;

set owf ;

if cats(of F_Smiley Happy EQ 'Y' ;

table = substr( table , index(table,' ')+1 ) ;

run ;

Super User
Posts: 9,671

Re: Counting Missing Values

It is very interesting. Although in the documentation , it said that nmiss() function is counted for numeric value,But I

found nmiss() is validated for both numeric and character in SQL.

data temp;
 set sashelp.class;
 if ranuni(0) le .4 then call missing(weight);
 if ranuni(0) ge .6 then call missing(name);
run;
proc sql noprint;
 select 'nmiss('||strip(name)||') as '||strip(name) 
  into : list separated by ','
   from dictionary.columns
    where libname='WORK' and memname='TEMP';

 create table have as
  select &list
   from temp;
quit;

proc transpose data=have out=want prefix=missing_value;
run;

Ksharp

Occasional Contributor
Posts: 8

Counting Missing Values

Hi All,

Thanks a lot fo the replies, especially MikeZDeb - that was realy useful!

Trusted Advisor
Posts: 1,300

Counting Missing Values

proc format;

invalue blahfmt (fuzz=.09999999999)

  .1 = .A

  .2 = .B

  .3 = .C

  .4 = .D

  .5 = .E

  other = .;

run;

data _null_;

nmiss=0;

do i=1 to 1000;

  blah=input(ranuni(1234),blahfmt.);

  if blah=. then nmiss+1;

end;

putlog nmiss=;

run;

PROC Star
Posts: 7,357

Re: Counting Missing Values

I would have thought that Howard's offering would be the quickest, but it ended up (at least in my tests) to be the slowest.  The winner, I think, is Ksharp's proc sql offering and by a fairly large margin.

Ask a Question
Discussion stats
  • 11 replies
  • 520 views
  • 3 likes
  • 8 in conversation