DATA Step, Macro, Functions and more

count missing value for character variable

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 76
Accepted Solution

count missing value for character variable

Hi

 

 

 

 


Accepted Solutions
Solution
‎07-14-2016 02:42 PM
Super Contributor
Posts: 298

Re: count missing value for character variable

Posted in reply to Xiaoningdemao

Do you want to take another way to get what you want? This way may help you to get more information of the observations.

 

Be brave to do some programming with arrays.

 

data want;
   set sashelp.class end = eof;
   N + 1;
   array num[*] _numeric_;
   array char[*] _character_;
   array n_count[100] _temporary_ (100*0); * Assuming data set has no more than 100 OBS;
   array c_count[100] _temporary_ (100*0);
   do i = 1 to dim(num);
      if missing(num[i]) then n_count[i] + 1;
   end;
   do i = 1 to dim(char);
      if missing(char[i]) then c_count[i] + 1;
   end;
   if eof then do;
      do i = 1 to dim(num);
         Variable = vname(num[i]);
         NMiss = n_count[i];
         output;
      end;
      do i = 1 to dim(char);
         Variable = vname(char[i]);
         NMiss = c_count[i];
         output;
      end;
   end;
keep Variable N NMiss;
run;

View solution in original post


All Replies
Super User
Posts: 11,343

Re: count missing value for character variable

Posted in reply to Xiaoningdemao

Character variables in Proc Tabulate may only be CLASS variables. Because of the default that any record with a missing value for a class variable is excluded you would need to add the missing option. The only statistics you may request for character variables in Proc Tabulate are N and the various N-related such a Pctn, ColPctN, RowPctn, PagePctn and RepPctn.

The following code adds a separate table to your output that has each value of the character var and its count.

 

proc tabulate data=sashelp.class f=comma6.0;
   title ;
   var _numeric_;
   table _numeric_,
      nmiss n/ box='Variable';
    class _character_ / missing;
    table _character_,n;
run;

 

If you want a simplier count you might want to go through your data and add a numeric flag for when the character variable isn't missing and count (or sum) that. Though there isn't going be a short easy way to reference the character variable counted.

Frequent Contributor
Posts: 76

Re: count missing value for character variable

thank you your help!!! But the result looks like a proc freq for each character variable, but I just want a count of non-missing, instead add them up manually do you have other solution? since the data set I deal with now have almost 1000 variables, so go through them one by one is not very realistic....
Solution
‎07-14-2016 02:42 PM
Super Contributor
Posts: 298

Re: count missing value for character variable

Posted in reply to Xiaoningdemao

Do you want to take another way to get what you want? This way may help you to get more information of the observations.

 

Be brave to do some programming with arrays.

 

data want;
   set sashelp.class end = eof;
   N + 1;
   array num[*] _numeric_;
   array char[*] _character_;
   array n_count[100] _temporary_ (100*0); * Assuming data set has no more than 100 OBS;
   array c_count[100] _temporary_ (100*0);
   do i = 1 to dim(num);
      if missing(num[i]) then n_count[i] + 1;
   end;
   do i = 1 to dim(char);
      if missing(char[i]) then c_count[i] + 1;
   end;
   if eof then do;
      do i = 1 to dim(num);
         Variable = vname(num[i]);
         NMiss = n_count[i];
         output;
      end;
      do i = 1 to dim(char);
         Variable = vname(char[i]);
         NMiss = c_count[i];
         output;
      end;
   end;
keep Variable N NMiss;
run;
Frequent Contributor
Posts: 89

Re: count missing value for character variable

[ Edited ]

You can use PROC SQL to get the count of missing values for character or numeric variables.  A coworker asked how to find all the variables in a data set that are missing for all observations.  The code example below can answer that question.  

 

options mprint;

/* Using SASHELP data sets while testing macro. 
   Macro should list variables that are 
   completely missing in data set, so need to
   create some data sets with such variables.  */

data baseball;
  set sashelp.baseball;
  suzanne_favorite_team='';
  cleveland_indians_rank=.;
run;

data class;
  set sashelp.class;
  teacher='';
  grade_level=.;
run;

%macro missing_summary(lib=, dsn=);

  /* use dummy macro so that syntax highlighting shows up within missing macro */

  %macro dummy();  
  %mend dummy;

  /* get data set variable names */

  proc sql;
    create table vars_to_summarize as
	  select name,
	         label
	    from sashelp.vcolumn
		  where libname="%upcase(&lib)" and
		        memname="%upcase(&dsn)";
  quit;

  /* count number of variables to process */

  proc sql noprint;
    select count(*) into
	  :var_counter
	     from vars_to_summarize;
  quit;

  /* do PROC SQL one variable at a time,
     append details for variable to summary
     report, then delete the variable's data
     set so that work library doesn't get 
     cluttered. */

  %do i=1 %to &var_counter;

    data _null_;
      obsnum=&i;
      set vars_to_summarize point=obsnum;
      if _error_ then abort;
      call symputx('field',name);
	  call symputx('description',label);
      stop; 
    run;

    proc sql;
      create table &field._report as
        select count(*) as n,
               count(&field) as not_missing,
	           nmiss(&field) as missing
          from &lib..&dsn;
    quit; 

    data &field._report;
      length variable $ 32 dataset $ 41 label $ 256;
	  set &field._report;
	  variable="&field";
      dataset="&lib..&dsn";
	  label="&description";
    run;
  
    proc append base=summary data=&field._report;
    run;

    proc datasets nolist;
      delete &field._report;
    quit;

  %end;

  proc datasets nolist;
    delete vars_to_summarize;
  quit;

%mend missing_summary;

%missing_summary(lib=work, dsn=class);
%missing_summary(lib=work, dsn=baseball);

proc print data=summary noobs;
  by dataset notsorted;
 * uncomment statement below if you only want to see the variables missing for ALL obs ;
 * where n=missing;   
  var dataset variable label n not_missing missing;
  title 'Missing and not-missing counts by variable';
run;

* clean up;

proc datasets nolist;
  delete summary;
quit;

 

References:

 

Top 10 Most Powerful Functions for PROC SQL

 

Dictionary Tables and Views: Essential Tools for Serious Applications

 

Frequent Contributor
Posts: 76

Re: count missing value for character variable

Posted in reply to SuzanneDorinski
Dear @SuzanneDorinski,

Thank you so much for working on this problem and wrote such a nice code with all comments on! But the data set i'm dealing with is very large, this code takes a lot time(about 10 minutes). Sorry I didn't accept this as solution......
Since i am a beginner in SAS, i did learn a lot from your code. Thanks again!!

Best,
Frequent Contributor
Posts: 76

Re: count missing value for character variable

This is perfect! I got exactly what I want!! Just to double check with you, I just need to change all the "100" in this code to the number of obs in my data set, right? @datasp
Super User
Posts: 11,343

Re: count missing value for character variable

Posted in reply to Xiaoningdemao

Xiaoningdemao wrote:
This is perfect! I got exactly what I want!! Just to double check with you, I just need to change all the "100" in this code to the number of obs in my data set, right? @datasp

Ideally the 100 would be replaced with the number of numeric variables for the numeric array and the number of character variables for the character array.

 

Here is an example code snippet to get the counts of char and num variables.:

proc sql;
   select type, count(*) as n
   from dictionary.columns 
      where libname='SASHELP' and memname='CLASS'
   group by type;
quit;
Frequent Contributor
Posts: 76

Re: count missing value for character variable

Hi @ballardw,
Thank you!!
But I meant the code you fist provided me produced a proc freq of each character variable. Result like this:
N
Gender
11,972
male 1220
female 30

But the value I wanted is the number of nonmissing values in this example 1220+30=1250.

I'm wondering is there a solution to get this value?

thanks again!!

Best,

Super User
Posts: 11,343

Re: count missing value for character variable

Posted in reply to Xiaoningdemao

For character variables add a variable to count or sum depending on how you are summarizing.

data want;

    set have;

    GenderNotMiss= (not missing(gender)); /* creates a 1 for not missing, 0 for missing values of gender*/

run;

 

Then in proc tabulate request SUM of GenderNotMiss;

or: if not missing(gender0 then GenderNotMiss=1;

and you could use the NMISS in tabulate along with the other numerics.

I would suggest using a good label with these indicator variables.

 

Super Contributor
Posts: 298

Re: count missing value for character variable

Posted in reply to Xiaoningdemao

'100' in my code represents total number of VARIABLES in the data set and NOT the number of observations as I wrote on the comments. The array will work even with 10 Million Variables. It just provides memory space to hold that many variables but in reality it will much less and the counting is done for as many NUMERIC and CHARACTER Variables.

 

If you know the numbers of numeric variables (say 15) and character variables (say 20) before running the data step, you can size the arrays as: 

   array n_count[15] _temporary_ (15*0); * Assuming data set has no more than 100 OBS;
   array c_count[20] _temporary_ (20*0);

Hope this helps you. 

Frequent Contributor
Posts: 76

Re: count missing value for character variable

OK, I see. Thank you!!! @datasp
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 3583 views
  • 3 likes
  • 4 in conversation