BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
chennupriya
Quartz | Level 8

Hi ,

I wanted to check percentage of missing values in a dataset by variable Category . I have used below code from other forum it worked and now i wanted same by variable  Category (which has values A,b,C). 

proc format;

value $ missfmt ' '="Missing"

other="Not Missing"

;

value nmissfmt . ="Missing"

other="Not Missing"

;

run;

 

data have;

     set have;

format _numeric_ nmissfmt. _character_ $missfmt.;

run;

 

proc freq data=have;

run;

 

 

 

 

But here's my macro. It needs work since I chopped it together from another macro I use, so not all the code in here is actually required.

 

 

%macro sum_missing(libname, dsetin, dsetout);

 

 

*Delete old dataset;

proc datasets nodetails nolist;

    delete &dsetout;

quit;

 

*Upcase all macro variables to have consistency;

data _null_;

    call symput ("libname", put(upcase("&libname."), $8.));

    call symput ("dsetin", put(upcase("&dsetin."), $32.));

run;

 

*Formats for missing for character and numeric variables;

 

proc format;

value $ missfmt ' '="Missing"

other="Not Missing"

;

value nmissfmt . ="Missing"

other="Not Missing"

;

run;

 

 

 

 

*Copy dataset and apply missing formats to it;

data step1;

    set &libname..&dsetin;

    format _character_ $missfmt. _numeric_ nmissfmt.;

run;

 

options mprint symbolgen;

 

*Get variable list;

proc sql noprint;

    select name  into : var_list separated by " "

    from dictionary.columns

    where upcase(libname) ="&libname"

    and upcase(memname)="&dsetin";

 

    *Get count of total number of observations into macro variable;

    select count(*) into : obs_count

    from &libname..&dsetin.;

quit;

 

 

*Start looping through the variable list here;

%let i=1;

%do %while (%scan(&var_list, &i, " ") ^=%str());

%let var=%scan(&var_list, &i, " ");   

 

*Get format of missing;

data _null_;

set step1;

call symput("var_fmt", vformat(&var));

call symput("var_label", vlabel(&var));

run;

 

*Get count of missing;

proc freq data=step1 noprint;

    table &var/missing out=cross_tab1;

run;

 

 

data cross_tab2;

    set cross_tab1;

 

    length variable $50.;

 

    category=put(&var, &var_fmt.);

    variable="&var_label.";

   

    if _n_=1 and category='Not Missing' then do;;

 

        Number_Missing=&obs_count-count;

        Percent_Missing=Number_Missing/&obs_count.;

 

        percent=percent/100;

        output;

    end;

 

    else if _n_=1 and category='Missing' then do;

       

        Number_Missing=count;

        Percent_Missing=percent/100;

        output;

    end;

 

        format percent: percent10.1;

 

    keep variable Number_Missing Percent_Missing;

run;

 

proc append base=&dsetout data=cross_tab2 force;

run;

 

 

proc datasets nodetails nolist;

    delete cross_tab1 cross_tab2;

run; quit;

 

*Increment counter;

%let i=%eval(&i+1);

%end; *Categorical;

 

proc datasets nodetails nolist;

    delete step1;

run; quit;

 

 

%mend;

 

Can anyone please help ?

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Of course, modify the macro or code to handle a grouping variable. This does mean customizing the code.

 

This can get you started but I don't have time to finish it up or generalize it at this point.

 

data class;
set sashelp.class;
if age in (13, 14) then call missing(name, weight, height);
run;
proc format;

value $ missfmt ' '="Missing"

other="Not Missing"

;

value nmissfmt . ="Missing"

other="Not Missing"

;

run;

 
ods output crosstabfreqs=summary;
proc freq data=class;
table sex*(name weight height age) / missing;
format _numeric_ nmissfmt. name $missfmt.;
run;




data long;
	length variable $32. variable_value $50.;
	set summary;
	
	if missing(sex) then delete;
	
	Variable=scan(table, 2, '*');
	Variable_Value=strip(trim(vvaluex(variable)));
	presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
	keep sex variable  variable_value frequency percent presentation;
	label variable='Variable' variable_value='Variable Value';
run;

 

 

View solution in original post

23 REPLIES 23
PaigeMiller
Diamond | Level 26

What is wrong with the code you are using?

 

Why does this have to be a macro anyway? The non-macro code at the top of your message seems like it will work on any arbitrary data set.

--
Paige Miller
chennupriya
Quartz | Level 8
Hi ,
I wanted to check how do i apply the code to check percentage missing of all variables by Category A Category B and Category C

Thanks
Reeza
Super User
Your macro seems very similar to mine....including the name....https://gist.github.com/statgeek/c3a9ddcb002c469e9d61

I've seen modified my approach as well and now use the one illustrated here. You'd need to change it to handle a two way table, since the original is designed for one way tables but that's easy if you understand what's happening.

https://gist.github.com/statgeek/2de1faf1644dc8160fe721056202f111
chennupriya
Quartz | Level 8
Thank you so much for the reply . Below is the output .But can i get same layout by variable Sex ?

variable N_Missing N_Not Missing PCT_Missing PCT_Not Missing Missing Not Missing
Age 1 18 5.26 94.74 1 ( 5.3%) 18 ( 94.7%)
Height 4 15 21.05 78.95 4 ( 21.1%) 15 ( 78.9%)
Name 19 100.00 19 ( 100%)
Sex 4 15 21.05 78.95 4 ( 21.1%) 15 ( 78.9%)
Weight 4 15 21.05 78.95 4 ( 21.1%) 15 ( 78.9%)
Reeza
Super User

Of course, modify the macro or code to handle a grouping variable. This does mean customizing the code.

 

This can get you started but I don't have time to finish it up or generalize it at this point.

 

data class;
set sashelp.class;
if age in (13, 14) then call missing(name, weight, height);
run;
proc format;

value $ missfmt ' '="Missing"

other="Not Missing"

;

value nmissfmt . ="Missing"

other="Not Missing"

;

run;

 
ods output crosstabfreqs=summary;
proc freq data=class;
table sex*(name weight height age) / missing;
format _numeric_ nmissfmt. name $missfmt.;
run;




data long;
	length variable $32. variable_value $50.;
	set summary;
	
	if missing(sex) then delete;
	
	Variable=scan(table, 2, '*');
	Variable_Value=strip(trim(vvaluex(variable)));
	presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
	keep sex variable  variable_value frequency percent presentation;
	label variable='Variable' variable_value='Variable Value';
run;

 

 

chennupriya
Quartz | Level 8
Thank you so much for all your help
Watts
SAS Employee
proc freq data=have nlevels;
     by category;
     tables _all_ / missing;
run;
ballardw
Super User

@chennupriya wrote:
Hi ,
I wanted to check how do i apply the code to check percentage missing of all variables by Category A Category B and Category C

Thanks

What is a Category A (or Category B and Category C)? Is a variable named "Category" that sometimes has values of A?

In which case your requirement is very unclear as when Category=A it cannot be missing.

 

Or are we supposed to identify which variables are involved because they sometimes have values of A, B or C?

 

Or do you have a document somewhere that says variables AAA, PDQ and XYZ represent category A?

 

 

chennupriya
Quartz | Level 8
wanted percenatges of XX , YY by Category

XX YY Category
100 . A
200 20 A
300 30 B
400 . C
500 . C

ballardw
Super User

That could look something like:

 

Proc tabulate data=have;
class category;
class xx yy/ missing;
format xx yy nmissfmt. ; tables category,
(xx yy)*rowpctn
/
; run;

The output table would have a row for each value of Category and a column for "Missing" and "Not missing" for XX and YY and a percentage, The ROWPCTN is a percentage of the "row", the values of XX and YY separately. The format makes two columns. The default column heading will include the statistic ROWPCTN. Tabulate supports other percentages such as PCTN, percent of all the row/col combinations of variables crossing, COLPCTN which would be the Category percentage for each value of XX or YY.

 

Note: By default Proc Tabulate will remove any record with a missing value for any of the CLASS variables. So we have two class statements. If Category has missing values the records would be removed and if you want them you would add the / missing option to keep them in the table as well.

chennupriya
Quartz | Level 8
Thank you so much for the reply . I have tried this what if i have 100's of variables and not just three and can also get the count along with the percentage ?
ballardw
Super User

You can use variable lists on CLASS, Format and Tables statements.

 

proc tabulate data =have;

   class _numeric_ _character_ /missing;

   format _numeric_ nmissfmt _character_ $missfmt. ;

   tables _numeric_ _character_ ,

             category*(n colpctn)

             /

    ;

run;

 

This changes the row/column order from the first because if you have "100's" of variables then across is extremely likely to be very hard to use.

This will create a combination of category with itself.

There are a number of statistics that you can request. That is what the documentaiton is for.

 

Actually if this had been my data I likely would have created indicator variables with a value of 1 for not missing and then 0 for missing (or vice versa) and then use statistics Sum (total is number of 1's) and Mean(decimal percentage of 1's).

 

But very few of my data sets have 100's of variables that need checking.

chennupriya
Quartz | Level 8
Hi ,
I ran above code but i get below error

24 proc tabulate data =test;
25 class _numeric_ _character_ /missing;
26 format _numeric_ nmissfmt _character_ $missfmt. ;
ERROR: You are trying to use the character format $MISSFMT with the numeric variable XX in data set WORK.TEST.
ERROR: You are trying to use the character format $MISSFMT with the numeric variable YY in data set WORK.TEST.
WARNING: Variable NMISSFMT not found in data set WORK.TEST.
27 tables _numeric_ _character_ ,
28 category*(n rowpctn)
29 /
30 ;
31 run;
Tom
Super User Tom
Super User

Because you left the period out of the format specification instead of trying to apply two different formats to two different lists of variables you are trying to apply one format to all of the variable, including one that does not exist.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 23 replies
  • 2218 views
  • 11 likes
  • 7 in conversation