Statistical programming, matrix languages, and more

How do I count the missing values across all variables in a dataset and set those counts to dataset

Reply
Occasional Contributor
Posts: 18

How do I count the missing values across all variables in a dataset and set those counts to dataset

Hi All,

 

Can you please help me with the following query. My goal is to count the number of non-missing values and stack those counts to original dataset. Since the counts are all generated in numeric format, when I stack the counts to original dataset it throws an error when it encounters character variable.

 

data final;
set sashelp.electric;
run;

 

ods output cnt=cnt;
proc iml;
use final;
read all var _NUM_ into x[colname=nNames];
n = countn(x,"col");
nmiss = countmiss(x,"col");

read all var _CHAR_ into x[colname=cNames];
close final;
c = countn(x,"col");
cmiss = countmiss(x,"col");

/* combine results for num and char into a single table */
Names = cNames || nNames;
rNames = {" Missing", "Not Missing"};
cnt = (cmiss // c) || (nmiss // n);

print cnt[r=rNames c=Names label=""];
ods trace off;

 

data final;
set final cnt;
run;

 

Super User
Posts: 22,874

Re: How do I count the missing values across all variables in a dataset and set those counts to data

Is this question in a course or SAS tutorial online somewhere?

 

It has been asked and answered recently, so search on here. 

I think this is what you're looking for:

https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/Send-this-IML-output-to-dataset-instead-o...

 

 

And if you genuinely need a way to count missing, PROC FREQ works faster and is easier. 

https://gist.github.com/statgeek/2de1faf1644dc8160fe721056202f111

SAS Super FREQ
Posts: 4,124

Re: How do I count the missing values across all variables in a dataset and set those counts to data

Nice job counting the missing/nonmissing observations, but you cannot append a numeric value to a character variable.  Your choices are 

1. Keep the statistics separate from the data (which I recommend)

2. Convert the numbers to character values, which is probably not a good idea.

 

If you decide to choose (2), use the CHAR function to convert from numeric to character values, as follows:

 

proc iml;
....

cnt1 = char(cmiss // c);  /* Yuck! Convert to character */
cnt2 = (nmiss // n);

create cnt1 from cnt1[c=cNames r=rNames];
append from cnt1[r=rNames];
close;
create cnt2 from cnt2[c=nNames r=rNames];
append from cnt2[r=rNames];
close;

data cnt;
merge cnt1 cnt2;
by rNames;
run;

data final;
length rNames $11;
set final cnt;
run;
Trusted Advisor
Posts: 1,294

Re: How do I count the missing values across all variables in a dataset and set those counts to data

Once a variable type is specified as character, you can't treat it as numeric selectively for specific rows.  What you can do for those variables is convert the numeric counts of non-missing to characters, and right justify the results.  Will character representations of numeric values be of use to you?

 

 

Even though this is an IML forum, here's a DATA step solution:

 

data final;
set sashelp.electric;
run;


data want (drop=i);
  set final end=eof;
  array nvar {*} _numeric_;
  array cvar {*} _character_;

  array n_n {100} _temporary_;
  array n_c {100} _temporary_;

  do i=1 to dim(nvar);
      n_n{i}+n(nvar{i});
  end;
  do i=1 to dim(cvar);
      n_c{i}+(cvar{i}^=' ');
  end;
  output;
  if eof then do;
    do i=1 to dim(nvar); nvar{i}=n_n{i}; end;
    do i=1 to dim(cvar); cvar{i}=put(n_n{i},8.);cvar{i}=right(cvar{i});end;
    output;
  end;
run;

 

 

Occasional Contributor
Posts: 18

Re: How do I count the missing values across all variables in a dataset and set those counts to data

Hi All,

I am working on a clinical data where I have to check about 100 datasets in each study and the variables differ on each dataset. I di try using PROC FREQ , but due to the size of the dataset and number of variables associated in each dataset, it is taking long time and when searching for the work around solution I did get that piece of code(provided by Rick) and wanted to tweak it as per my need. But wasn't able to successfully convert the character variables into character and reached out to the forum.
Super User
Super User
Posts: 7,860

Re: How do I count the missing values across all variables in a dataset and set those counts to data

PROC FREQ shouldn't need much memory to count missings.

You should first define some formats to categorize the values in misisng or non-mising.

 

proc sql ;
 value miss low-high='Non-Missing' other='Missing';
 value $miss ' '='Missing' other='Non-Missing;
run;

You can then use PROC FREQ.  You can use the ODS OUTPUT option to collect the statistics.

proc freq data=have ;
  format _numeric_ miss. _character_ $miss. ;
  tables _all_;
run;

Or use PROC SUMMARY instead using the CLASS statement.

proc summary data=have missing chartype ;
  format _numeric_ miss. _character_ $miss. ;
  class _all_;
  ways 1 ;
  output out=want n=count;
run;

 

Super User
Posts: 10,623

Re: How do I count the missing values across all variables in a dataset and set those counts to data


data final;
set sashelp.electric;
run;

proc iml;
use final nobs nobs;
read all var _NUM_ into x[colname=nNames]; 
nmiss = countmiss(x,"col");
n=nobs-nmiss;

read all var _CHAR_ into y[colname=cNames]; 
cmiss = countmiss(y,"col");
c=nobs-cmiss;
close final;

Names = cNames || nNames;
rNames = {" Missing", "Not Missing"};
cnt = (cmiss // c) || (nmiss // n);

create want from cnt[r=rnames c=names];
append from cnt[r=rnames];
close;
quit;
Ask a Question
Discussion stats
  • 6 replies
  • 306 views
  • 0 likes
  • 6 in conversation