Does anyone have a good way of summarize the % missing for each variable in a dataset? Alternatively, I would also want to be able to split out all the variables with over, say 20%, missing data in my dataset.
I have some macros and examples here on some different approaches.
https://gist.github.com/statgeek
EDIT: Decided to clean it up a bit, this is the one that I think meets your needs.
https://gist.github.com/statgeek/2de1faf1644dc8160fe721056202f111
/*This program creates a report with the number and percent of missing data for each variable in the data set. The ony change should be to the macro variable, INPUT_DSN. Author: F. Khurshed Date: 2019-01-04*/ *create sample data to work with; data class; set sashelp.class; if age=14 then call missing(height, weight, sex); if name='Alfred' then call missing(sex, age, height); label age="Fancy Age Label"; run; *set input data set name; %let INPUT_DSN = class; %let OUTPUT_DSN = want; *create format for missing; proc format; value $ missfmt ' '="Missing" other="Not Missing"; value nmissfmt .="Missing" other="Not Missing"; run; *Proc freq to count missing/non missing; ods select none; *turns off the output so the results do not get too messy; ods table onewayfreqs=temp; proc freq data=&INPUT_DSN.; table _all_ / missing; format _numeric_ nmissfmt. _character_ $missfmt.; run; ods select all; *Format output; data long; length variable $32. variable_value $50.; set temp; Variable=scan(table, 2); Variable_Value=strip(trim(vvaluex(variable))); presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")"); keep variable variable_value frequency percent cum: presentation; label variable='Variable' variable_value='Variable Value'; run; proc sort data=long; by variable; run; *make it a wide data set for presentation, with values as N (Percent); proc transpose data=long out=wide_presentation (drop=_name_); by variable; id variable_value; var presentation; run; *transpose only N; proc transpose data=long out=wide_N prefix=N_; by variable; id variable_value; var frequency; run; *transpose only percents; proc transpose data=long out=wide_PCT prefix=PCT_; by variable; id variable_value; var percent; run; *final output file; data &Output_DSN.; merge wide_N wide_PCT wide_presentation; by variable; drop _name_; label N_Missing='# Missing' N_Not_Missing='# Not Missing' PCT_Missing='% Missing' N_Not_Missing='% Not Missing' Missing='Missing' Not_missing='Not Missing'; run; title "Missing Report of &INPUT_DSN."; proc print data=&output_dsn. noobs label; run;
proc freq data=yourdata;
tables yourvariable(s) / missing;
run;
is the first thing that comes to mind. This will give a table that includes the count and percent missing for each variable on the tables statement.
Depending on other details of what you may want and variable type other options arise.
Example data and desired result do help get more targeted responses.
It is not clear what you may want next.
I have some macros and examples here on some different approaches.
https://gist.github.com/statgeek
EDIT: Decided to clean it up a bit, this is the one that I think meets your needs.
https://gist.github.com/statgeek/2de1faf1644dc8160fe721056202f111
/*This program creates a report with the number and percent of missing data for each variable in the data set. The ony change should be to the macro variable, INPUT_DSN. Author: F. Khurshed Date: 2019-01-04*/ *create sample data to work with; data class; set sashelp.class; if age=14 then call missing(height, weight, sex); if name='Alfred' then call missing(sex, age, height); label age="Fancy Age Label"; run; *set input data set name; %let INPUT_DSN = class; %let OUTPUT_DSN = want; *create format for missing; proc format; value $ missfmt ' '="Missing" other="Not Missing"; value nmissfmt .="Missing" other="Not Missing"; run; *Proc freq to count missing/non missing; ods select none; *turns off the output so the results do not get too messy; ods table onewayfreqs=temp; proc freq data=&INPUT_DSN.; table _all_ / missing; format _numeric_ nmissfmt. _character_ $missfmt.; run; ods select all; *Format output; data long; length variable $32. variable_value $50.; set temp; Variable=scan(table, 2); Variable_Value=strip(trim(vvaluex(variable))); presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")"); keep variable variable_value frequency percent cum: presentation; label variable='Variable' variable_value='Variable Value'; run; proc sort data=long; by variable; run; *make it a wide data set for presentation, with values as N (Percent); proc transpose data=long out=wide_presentation (drop=_name_); by variable; id variable_value; var presentation; run; *transpose only N; proc transpose data=long out=wide_N prefix=N_; by variable; id variable_value; var frequency; run; *transpose only percents; proc transpose data=long out=wide_PCT prefix=PCT_; by variable; id variable_value; var percent; run; *final output file; data &Output_DSN.; merge wide_N wide_PCT wide_presentation; by variable; drop _name_; label N_Missing='# Missing' N_Not_Missing='# Not Missing' PCT_Missing='% Missing' N_Not_Missing='% Not Missing' Missing='Missing' Not_missing='Not Missing'; run; title "Missing Report of &INPUT_DSN."; proc print data=&output_dsn. noobs label; run;
Reeza,
It would be succinct by using SQL.
data have;
set sashelp.heart;
run;
proc transpose data=have(obs=0) out=temp;
var _all_;
run;
data _null_;
set temp end=last;
if _n_=1 then call execute('proc sql;create table want as select ');
call execute(cat('nmiss(',_name_,')/count(*) as ',_name_,'
format=percent8.2'));
if last then call execute('from have;quit;');
else call execute(',');
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.