Most efficient way to get missing and total count of CHAR vars?

Senior User
Posts: 1

If I had only numeric columns, I could do something like this with PROC MEANS:

proc means data = x n nmiss ;
    var _numeric_ ;
run ;

I am guessing that PROC FREQ might be the best option for character variables, but I thought I would check and see what others on this board come up with.

PROC Star
Posts: 7,363

Re: Most efficient way to get missing and total count of CHAR vars?

The following article provides your answer: http://blogs.sas.com/content/iml/2011/09/19/count-the-number-of-missing-values-for-each-variable.htm...

Art, CEO, AnalystFinder.com

Super User
Posts: 9,682

Re: Most efficient way to get missing and total count of CHAR vars?

The NMISS() function works for both numeric and character variables in PROC SQL.

data class;
set sashelp.class;
if _n_ in (1 4 7 9) then call missing (sex,age);
if _n_ in (10 12) then call missing (sex);
run;
proc sql;
select nmiss(sex) as n_missing_sex,
       nmiss(age) as n_missing_age
  from class;
quit;
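
Since the question also asks for the total count, here is a minimal sketch that extends the query above. It assumes the CLASS data set created in the previous step; the column aliases are made up for illustration. COUNT(*) returns the number of rows, and COUNT(column) counts only the non-missing values of that column, for character as well as numeric columns.

```sas
/* Sketch only: total, missing, and non-missing counts in one pass. */
/* Assumes the CLASS data set built above; aliases are illustrative. */
proc sql;
  select count(*)   as n_total,
         nmiss(sex) as n_missing_sex,
         count(sex) as n_nonmissing_sex,
         nmiss(age) as n_missing_age,
         count(age) as n_nonmissing_age
    from class;
quit;
```

For a wide table you would still need to generate one pair of expressions per variable, for example from the variable names in a PROC CONTENTS output data set.
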
PROC Star
Posts: 1,563

Re: Most efficient way to get missing and total count of CHAR vars?

PROC TABULATE seems to be the fastest in my test on 10 million rows (timings are in the comments below), and as a bonus it produces an output data set.

data HAVE;
  length VAR1-VAR20 $16 VAR21-VAR50 8;
  do i=1 to 1e7;
    VAR1=ifc(ranuni(0)>.5,' ','A');
    output; 
  end;
run;

proc format;
 value $missfmt ' '='Missing' other='Not Missing';
run;
 
proc freq data=HAVE;                        *  FREQ real time = cpu time = 9 seconds  ;
  format _CHAR_ $missfmt.; 
  tables _CHAR_ / missing ;
run;

proc means data=HAVE(keep=_CHAR_) missing;  * MEANS real time=7s CPU time=35s;
  format _CHAR_ $missfmt.; 
  class _CHAR_;
  ways 1;
  output out=MEANS;
run ;

proc iml;                                   *  IML real time = cpu time = 9 seconds  ;
  use HAVE;
  read all var _CHAR_ into x[colname=cNames]; 
  close HAVE;
  c = countn(x,"col");
  cmiss = countmiss(x,"col");
  rNames = {"    Missing", "Not Missing"};
  cnt = (cmiss // c) ;
  print cnt[r=rNames c=cNames label=""];
quit;

proc contents data=HAVE out=NAMES noprint;run;
data _null_;                                 * SQL real time = CPU time = 17s ;
  if _N_=1 then call execute('proc sql; select ');
  set NAMES(where =(TYPE=2)) end=LASTOBS;
  call execute(ifc(_N_ > 1, ',', ' '));
  call execute(cat('sum(missing(',NAME,')) as ',trim(NAME),'_MISS'));
  call execute(cat(',sum(^missing(',NAME,')) as ',NAME));
  if LASTOBS then call execute('from HAVE; quit;');  
run;

proc tabulate data=HAVE out=TAB missing;   * TABULATE  real time=6s CPU time=17s ;
  class _CHAR_;
  format _CHAR_ $missfmt.; 
  table _CHAR_, n;
run;