DATA Step, Macro, Functions and more

How to create a table to hold the missing value count for multiple columns?

Reply
Regular Contributor
Posts: 189

How to create a table to hold the missing value count for multiple columns?

Hi,

 

I am trying to pass the list of variables to a macro and find if these variables have missing values and create a table to get the count of missing values for each variable into a column in a table . the column types include both character and numeric

 

 With the below code I am only able to get the count for second/last variable.

 

Can you please help me out with this. It will be great if you can suggest an efficient way.

 

Appreciate any help.

 

Thanks in advance,

Regards,

Sheeba

%macro test(dsn,string);                                                                                                                   

  %let word_cnt=%sysfunc(countw(%superq(string))); 

%put &word_cnt;

  %do i = 1 %to &word_cnt;                                                                                                             

    %let var&i=%qscan(%superq(string),&i,%str( ));                                                                                      

    %put &&var&i;                                                                                                                      

          

%end;

 

proc sql;

create table miss_val as

select

sum(case when &&var&i is null  then 1 else 0 end ) as miss_cnt_var_&i,

count(*) as total_rows

from &dsn;

 

%mend test;  

                   

%test(ing_raw_total,sequence_number id)

Super User
Posts: 23,998

Re: How to create a table to hold the missing value count for multiple columns?

If you search on here, there was a good PROC TABULATE answer yesterday and I show an inefficient solution here

 

https://gist.github.com/statgeek/c3a9ddcb002c469e9d61

 

And a more efficient one here:

https://gist.github.com/statgeek/2de1faf1644dc8160fe721056202f111

 

This last one is probably exactly what you want.

 


Sheeba wrote:

Hi,

 

I am trying to pass the list of variables to a macro and find if these variables have missing values and create a table to get the count of missing values for each variable into a column in a table . the column types include both character and numeric

 

 With the below code I am only able to get the count for second/last variable.

 

Can you please help me out with this. It will be great if you can suggest an efficient way.

 

Appreciate any help.

 

Thanks in advance,

Regards,

Sheeba

%macro test(dsn,string);                                                                                                                   

  %let word_cnt=%sysfunc(countw(%superq(string))); 

%put &word_cnt;

  %do i = 1 %to &word_cnt;                                                                                                             

    %let var&i=%qscan(%superq(string),&i,%str( ));                                                                                      

    %put &&var&i;                                                                                                                      

          

%end;

 

proc sql;

create table miss_val as

select

sum(case when &&var&i is null  then 1 else 0 end ) as miss_cnt_var_&i,

count(*) as total_rows

from &dsn;

 

%mend test;  

                   

%test(ing_raw_total,sequence_number id)


 

Regular Contributor
Posts: 189

Re: How to create a table to hold the missing value count for multiple columns?

Thanks a lot for the response Reeza.

 

I will go through the links.

 

Regards,

Sheeba

 

Super User
Posts: 6,931

Re: How to create a table to hold the missing value count for multiple columns?

As you have seen, your code generates a single CASE statement.  You need a macro that will generate many CASE statements (one for each variable).  If you had such a macro, here is what the macro call might look like:

 

proc sql;

create table miss_val as

select 

%test (ing_raw_total, sequence_number id)

count(*) as total_rows

from &dsn;

quit;

 

Here's a way to define a macro that can generate all those CASE statements.

 

%macro test (dsn, string);

  %local i var;

  %do i=1 %to %sysfunc(countw(&string));

      %let var = %scan(&string, &i);

      sum(case when &var is null then 1 else 0 end) as miss_cnt_var_&i,

  %end;

%mend test;

 

 

Obviously, this is untested code.  But it should be right or nearly right.  We might need to see how SAS interprets the word NULL.

Regular Contributor
Posts: 189

Re: How to create a table to hold the missing value count for multiple columns?

Posted in reply to Astounding

Hi Astounding,

 

Thanks a lot for the reply.

 

I will try it out and let you know the results.

 

Regards,

Sheeba

Super User
Posts: 13,926

Re: How to create a table to hold the missing value count for multiple columns?

You were only generating a case statement with the last value of the loop &i as your case is outside the loop.

 

You want to create a separate CASE statement for each variable. Perhaps:

 

%macro test(dsn,string);                                                                                                                    
   %let word_cnt=%sysfunc(countw(%superq(string)));  
   /*%put &word_cnt; */
   %do i = 1 %to &word_cnt;                                                                                                              
    %let var&i=%qscan(%superq(string),&i,%str( ));                                                                                      
   /* %put &&var&i;                                                                                                                       */
   %end;

   proc sql;
      create table miss_val as 
      select
      %do i=1 %to &word_cnt; 
         sum(case when &&var&i is null  then 1 else 0 end ) as miss_cnt_var_&i,
      %end;
      count(*) as total_rows
      from &dsn;
   quit; 

%mend test; 

Note that when working with macros it is a very good idea to always use Run; or Quit; as appropriate for each procedure. If you don't you may have a time come when one of your data step or proc calls does not execute because the next like of code is interpreted as being part of the current step.

 

 

You might want to consider learning about the macro debugging options MPRINT and SYMBOLGEN. Set those with an options statement prior to executing your macro and you can see more of what the macro code is creating. Reset the options with an options Nomprint No symbolgen;

Regular Contributor
Posts: 189

Re: How to create a table to hold the missing value count for multiple columns?

Hi Ballardw,

 

Thanks a lot for the reply.

 

I will try this code and update you. I will set the options and see how its getting resolved.

 

Thanks again,

Regards,

Sheeba

Regular Contributor
Posts: 189

Re: How to create a table to hold the missing value count for multiple columns?

Hi All,

 

I just came across this option as well.

 

proc means data=ing_raw_fmt_sp NMISS N; run;

 

Do you think this approach would be useful for a table with both character and numeric columns... does it have limitations ?

 

Thanks,

 

Regards,

Sheeba

Super User
Posts: 6,931

Re: How to create a table to hold the missing value count for multiple columns?

It's good, it's easy, but it only applies to numeric variables.  If you want to switch gears from PROC SQL, there are plenty of ways to count missing vs. non-missing.  The original post made it look like you wanted to debug the macro code but that's not the only way to get the end result.

Regular Contributor
Posts: 189

Re: How to create a table to hold the missing value count for multiple columns?

Posted in reply to Astounding

Thanks a lot for the reply.

 

My table has a mix of character and numeric data .

 

Regards,

Sheeba

Super User
Posts: 23,998

Re: How to create a table to hold the missing value count for multiple columns?


Sheeba wrote:

Hi All,

 

I just came across this option as well.

 

proc means data=ing_raw_fmt_sp NMISS N; run;

 

Do you think this approach would be useful for a table with both character and numeric columns... does it have limitations ?

 

Thanks,

 

Regards,

Sheeba


Proc means only handles numeric variables.  PROC FREQ with the formats in the links I posted above handles both character and numeric with no issues. 

Regular Contributor
Posts: 189

Re: How to create a table to hold the missing value count for multiple columns?

Thanks a lot for the reply Reeza.

 

yes, I am able to get the output using proc freq.

 

Thanks again,

 

Regards,

Sheeba

Ask a Question
Discussion stats
  • 11 replies
  • 186 views
  • 5 likes
  • 4 in conversation