BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Melk
Lapis Lazuli | Level 10

Does anyone have a good way of summarize the % missing for each variable in a dataset? Alternatively, I would also want to be able to split out all the variables with over, say 20%, missing data in my dataset.

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

I have some macros and examples here on some different approaches. 

 

https://gist.github.com/statgeek

 

EDIT: Decided to clean it up a bit, this is the one that I think meets your needs.

 

https://gist.github.com/statgeek/2de1faf1644dc8160fe721056202f111

 

/*This program creates a report with the number and percent of
missing data for each variable in the data set.
The ony change should be to the macro variable, INPUT_DSN.

Author: F. Khurshed
Date: 2019-01-04*/
*create sample data to work with;

data class;
    set sashelp.class;

    if age=14 then
        call missing(height, weight, sex);

    if name='Alfred' then
        call missing(sex, age, height);
    label age="Fancy Age Label";
run;

*set input data set name;
%let INPUT_DSN = class;
%let OUTPUT_DSN = want;
*create format for missing;

proc format;
    value $ missfmt ' '="Missing" other="Not Missing";
    value nmissfmt .="Missing" other="Not Missing";
run;

*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=temp;

proc freq data=&INPUT_DSN.;
    table _all_ / missing;
    format _numeric_ nmissfmt. _character_ $missfmt.;
run;

ods select all;
*Format output;

data long;
    length variable $32. variable_value $50.;
    set temp;
    Variable=scan(table, 2);
    Variable_Value=strip(trim(vvaluex(variable)));
    presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
    keep variable variable_value frequency percent cum: presentation;
    label variable='Variable' variable_value='Variable Value';
run;

proc sort data=long;
    by variable;
run;

*make it a wide data set for presentation, with values as N (Percent);

proc transpose data=long out=wide_presentation (drop=_name_);
    by variable;
    id variable_value;
    var presentation;
run;

*transpose only N;

proc transpose data=long out=wide_N prefix=N_;
    by variable;
    id variable_value;
    var frequency;
run;

*transpose only percents;

proc transpose data=long out=wide_PCT prefix=PCT_;
    by variable;
    id variable_value;
    var percent;
run;

*final output file;

data &Output_DSN.;
    merge wide_N wide_PCT wide_presentation;
    by variable;
    drop _name_;
    label N_Missing='# Missing' N_Not_Missing='# Not Missing' 
        PCT_Missing='% Missing' N_Not_Missing='% Not Missing' Missing='Missing' 
        Not_missing='Not Missing';
run;

title "Missing Report of &INPUT_DSN.";

proc print data=&output_dsn. noobs label;
run;

 

 

View solution in original post

6 REPLIES 6
ballardw
Super User

proc freq data=yourdata;

   tables yourvariable(s) / missing;

run;

is the first thing that comes to mind. This will give a table that includes the count and percent missing for each variable on the tables statement.

 

Depending on other details of what you may want and variable type other options arise.

Example data and desired result do help get more targeted responses.

It is not clear what you may want next.

 

Melk
Lapis Lazuli | Level 10
Is there a way to use the ods output statement to create a table with the variable name and % missing - so then I can use a data step to select only those variables that have over a certain percentage of missing data to present to the clinician? I have many variables so it is very inefficient to do it for each variable as a separate table.
Reeza
Super User

I have some macros and examples here on some different approaches. 

 

https://gist.github.com/statgeek

 

EDIT: Decided to clean it up a bit, this is the one that I think meets your needs.

 

https://gist.github.com/statgeek/2de1faf1644dc8160fe721056202f111

 

/*This program creates a report with the number and percent of
missing data for each variable in the data set.
The ony change should be to the macro variable, INPUT_DSN.

Author: F. Khurshed
Date: 2019-01-04*/
*create sample data to work with;

data class;
    set sashelp.class;

    if age=14 then
        call missing(height, weight, sex);

    if name='Alfred' then
        call missing(sex, age, height);
    label age="Fancy Age Label";
run;

*set input data set name;
%let INPUT_DSN = class;
%let OUTPUT_DSN = want;
*create format for missing;

proc format;
    value $ missfmt ' '="Missing" other="Not Missing";
    value nmissfmt .="Missing" other="Not Missing";
run;

*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=temp;

proc freq data=&INPUT_DSN.;
    table _all_ / missing;
    format _numeric_ nmissfmt. _character_ $missfmt.;
run;

ods select all;
*Format output;

data long;
    length variable $32. variable_value $50.;
    set temp;
    Variable=scan(table, 2);
    Variable_Value=strip(trim(vvaluex(variable)));
    presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
    keep variable variable_value frequency percent cum: presentation;
    label variable='Variable' variable_value='Variable Value';
run;

proc sort data=long;
    by variable;
run;

*make it a wide data set for presentation, with values as N (Percent);

proc transpose data=long out=wide_presentation (drop=_name_);
    by variable;
    id variable_value;
    var presentation;
run;

*transpose only N;

proc transpose data=long out=wide_N prefix=N_;
    by variable;
    id variable_value;
    var frequency;
run;

*transpose only percents;

proc transpose data=long out=wide_PCT prefix=PCT_;
    by variable;
    id variable_value;
    var percent;
run;

*final output file;

data &Output_DSN.;
    merge wide_N wide_PCT wide_presentation;
    by variable;
    drop _name_;
    label N_Missing='# Missing' N_Not_Missing='# Not Missing' 
        PCT_Missing='% Missing' N_Not_Missing='% Not Missing' Missing='Missing' 
        Not_missing='Not Missing';
run;

title "Missing Report of &INPUT_DSN.";

proc print data=&output_dsn. noobs label;
run;

 

 

Ksharp
Super User

Reeza, 

It would be succinct by using SQL.

data have;
set sashelp.heart;
run;
proc transpose data=have(obs=0) out=temp;
var _all_;
run;
data _null_;
set temp end=last;
if _n_=1 then call execute('proc sql;create table want as select ');
call execute(cat('nmiss(',_name_,')/count(*) as ',_name_,'
format=percent8.2'));
if last then call execute('from have;quit;');
else call execute(',');
run;

 

Melk
Lapis Lazuli | Level 10
Ksharp this is very succinct and I will test it out - thank you!
Melk
Lapis Lazuli | Level 10
Thank you!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 2697 views
  • 9 likes
  • 4 in conversation