BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Xiaoningdemao
Quartz | Level 8

Hi

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
KachiM
Rhodochrosite | Level 12

Do you want to take another way to get what you want? This way may help you to get more information of the observations.

 

Be brave to do some programming with arrays.

 

data want;
   set sashelp.class end = eof;
   N + 1;
   array num[*] _numeric_;
   array char[*] _character_;
   array n_count[100] _temporary_ (100*0); * Assuming data set has no more than 100 OBS;
   array c_count[100] _temporary_ (100*0);
   do i = 1 to dim(num);
      if missing(num[i]) then n_count[i] + 1;
   end;
   do i = 1 to dim(char);
      if missing(char[i]) then c_count[i] + 1;
   end;
   if eof then do;
      do i = 1 to dim(num);
         Variable = vname(num[i]);
         NMiss = n_count[i];
         output;
      end;
      do i = 1 to dim(char);
         Variable = vname(char[i]);
         NMiss = c_count[i];
         output;
      end;
   end;
keep Variable N NMiss;
run;

View solution in original post

11 REPLIES 11
ballardw
Super User

Character variables in Proc Tabulate may only be CLASS variables. Because of the default that any record with a missing value for a class variable is excluded you would need to add the missing option. The only statistics you may request for character variables in Proc Tabulate are N and the various N-related such a Pctn, ColPctN, RowPctn, PagePctn and RepPctn.

The following code adds a separate table to your output that has each value of the character var and its count.

 

proc tabulate data=sashelp.class f=comma6.0;
   title ;
   var _numeric_;
   table _numeric_,
      nmiss n/ box='Variable';
    class _character_ / missing;
    table _character_,n;
run;

 

If you want a simplier count you might want to go through your data and add a numeric flag for when the character variable isn't missing and count (or sum) that. Though there isn't going be a short easy way to reference the character variable counted.

Xiaoningdemao
Quartz | Level 8
thank you your help!!! But the result looks like a proc freq for each character variable, but I just want a count of non-missing, instead add them up manually do you have other solution? since the data set I deal with now have almost 1000 variables, so go through them one by one is not very realistic....
KachiM
Rhodochrosite | Level 12

Do you want to take another way to get what you want? This way may help you to get more information of the observations.

 

Be brave to do some programming with arrays.

 

data want;
   set sashelp.class end = eof;
   N + 1;
   array num[*] _numeric_;
   array char[*] _character_;
   array n_count[100] _temporary_ (100*0); * Assuming data set has no more than 100 OBS;
   array c_count[100] _temporary_ (100*0);
   do i = 1 to dim(num);
      if missing(num[i]) then n_count[i] + 1;
   end;
   do i = 1 to dim(char);
      if missing(char[i]) then c_count[i] + 1;
   end;
   if eof then do;
      do i = 1 to dim(num);
         Variable = vname(num[i]);
         NMiss = n_count[i];
         output;
      end;
      do i = 1 to dim(char);
         Variable = vname(char[i]);
         NMiss = c_count[i];
         output;
      end;
   end;
keep Variable N NMiss;
run;
SuzanneDorinski
Lapis Lazuli | Level 10

You can use PROC SQL to get the count of missing values for character or numeric variables.  A coworker asked how to find all the variables in a data set that are missing for all observations.  The code example below can answer that question.  

 

options mprint;

/* Using SASHELP data sets while testing macro. 
   Macro should list variables that are 
   completely missing in data set, so need to
   create some data sets with such variables.  */

data baseball;
  set sashelp.baseball;
  suzanne_favorite_team='';
  cleveland_indians_rank=.;
run;

data class;
  set sashelp.class;
  teacher='';
  grade_level=.;
run;

%macro missing_summary(lib=, dsn=);

  /* use dummy macro so that syntax highlighting shows up within missing macro */

  %macro dummy();  
  %mend dummy;

  /* get data set variable names */

  proc sql;
    create table vars_to_summarize as
	  select name,
	         label
	    from sashelp.vcolumn
		  where libname="%upcase(&lib)" and
		        memname="%upcase(&dsn)";
  quit;

  /* count number of variables to process */

  proc sql noprint;
    select count(*) into
	  :var_counter
	     from vars_to_summarize;
  quit;

  /* do PROC SQL one variable at a time,
     append details for variable to summary
     report, then delete the variable's data
     set so that work library doesn't get 
     cluttered. */

  %do i=1 %to &var_counter;

    data _null_;
      obsnum=&i;
      set vars_to_summarize point=obsnum;
      if _error_ then abort;
      call symputx('field',name);
	  call symputx('description',label);
      stop; 
    run;

    proc sql;
      create table &field._report as
        select count(*) as n,
               count(&field) as not_missing,
	           nmiss(&field) as missing
          from &lib..&dsn;
    quit; 

    data &field._report;
      length variable $ 32 dataset $ 41 label $ 256;
	  set &field._report;
	  variable="&field";
      dataset="&lib..&dsn";
	  label="&description";
    run;
  
    proc append base=summary data=&field._report;
    run;

    proc datasets nolist;
      delete &field._report;
    quit;

  %end;

  proc datasets nolist;
    delete vars_to_summarize;
  quit;

%mend missing_summary;

%missing_summary(lib=work, dsn=class);
%missing_summary(lib=work, dsn=baseball);

proc print data=summary noobs;
  by dataset notsorted;
 * uncomment statement below if you only want to see the variables missing for ALL obs ;
 * where n=missing;   
  var dataset variable label n not_missing missing;
  title 'Missing and not-missing counts by variable';
run;

* clean up;

proc datasets nolist;
  delete summary;
quit;

 

References:

 

Top 10 Most Powerful Functions for PROC SQL

 

Dictionary Tables and Views: Essential Tools for Serious Applications

 

Xiaoningdemao
Quartz | Level 8
Dear @SuzanneDorinski,

Thank you so much for working on this problem and wrote such a nice code with all comments on! But the data set i'm dealing with is very large, this code takes a lot time(about 10 minutes). Sorry I didn't accept this as solution......
Since i am a beginner in SAS, i did learn a lot from your code. Thanks again!!

Best,
Xiaoningdemao
Quartz | Level 8
This is perfect! I got exactly what I want!! Just to double check with you, I just need to change all the "100" in this code to the number of obs in my data set, right? @KachiM
ballardw
Super User

@Xiaoningdemao wrote:
This is perfect! I got exactly what I want!! Just to double check with you, I just need to change all the "100" in this code to the number of obs in my data set, right? @KachiM

Ideally the 100 would be replaced with the number of numeric variables for the numeric array and the number of character variables for the character array.

 

Here is an example code snippet to get the counts of char and num variables.:

proc sql;
   select type, count(*) as n
   from dictionary.columns 
      where libname='SASHELP' and memname='CLASS'
   group by type;
quit;
Xiaoningdemao
Quartz | Level 8
Hi @ballardw,
Thank you!!
But I meant the code you fist provided me produced a proc freq of each character variable. Result like this:
N
Gender
11,972
male 1220
female 30

But the value I wanted is the number of nonmissing values in this example 1220+30=1250.

I'm wondering is there a solution to get this value?

thanks again!!

Best,

ballardw
Super User

For character variables add a variable to count or sum depending on how you are summarizing.

data want;

    set have;

    GenderNotMiss= (not missing(gender)); /* creates a 1 for not missing, 0 for missing values of gender*/

run;

 

Then in proc tabulate request SUM of GenderNotMiss;

or: if not missing(gender0 then GenderNotMiss=1;

and you could use the NMISS in tabulate along with the other numerics.

I would suggest using a good label with these indicator variables.

 

KachiM
Rhodochrosite | Level 12

'100' in my code represents total number of VARIABLES in the data set and NOT the number of observations as I wrote on the comments. The array will work even with 10 Million Variables. It just provides memory space to hold that many variables but in reality it will much less and the counting is done for as many NUMERIC and CHARACTER Variables.

 

If you know the numbers of numeric variables (say 15) and character variables (say 20) before running the data step, you can size the arrays as: 

   array n_count[15] _temporary_ (15*0); * Assuming data set has no more than 100 OBS;
   array c_count[20] _temporary_ (20*0);

Hope this helps you. 

Xiaoningdemao
Quartz | Level 8
OK, I see. Thank you!!! @KachiM

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 14546 views
  • 4 likes
  • 4 in conversation