This seems like it should be quite easy to do, but I haven't found a way yet. I have a large file (millions of records) with names, dates of birth, addresses, etc, and I want to know the missing/non-missing percentages for each of these variables. I don't want to run proc freq, as this will give me a huge list of every name in the dataset.
My work-around so far has been to recode these types of variables into dichotomous variables (ie, 0=missing and 1=non-missing), and then run freqs. But that's time consuming and tedious. Isn't there an option in proc freq or another procedure to do this automatically?
Hello @Wolverine,
@Wolverine wrote:
I don't want to run proc freq, as this will give me a huge list of every name in the dataset.
This doesn't happen if you apply a suitable format to the variables in question such as
proc format;
value miss
._-.z = 'missing'
other = 'non-missing';
value $miss
' ' = 'missing'
other = 'non-missing';
run;
proc freq data=sashelp.heart;
format _numeric_ miss. _char_ $miss.;
tables _all_ / missing;
run;
Hello @Wolverine,
@Wolverine wrote:
I don't want to run proc freq, as this will give me a huge list of every name in the dataset.
This doesn't happen if you apply a suitable format to the variables in question such as
proc format;
value miss
._-.z = 'missing'
other = 'non-missing';
value $miss
' ' = 'missing'
other = 'non-missing';
run;
proc freq data=sashelp.heart;
format _numeric_ miss. _char_ $miss.;
tables _all_ / missing;
run;
Here's a variant on the program from @FreelanceReinh that cleans up the display into a few nice options.
/*This program creates a report with the number and percent of
missing data for each variable in the data set.
The ony change should be to the macro variable, INPUT_DSN.
Author: F. Khurshed
Date: 2019-01-04
*/
*create sample data to work with;
data class;
set sashelp.class;
if age=14 then
call missing(height, weight, sex);
if name='Alfred' then
call missing(sex, age, height);
label age="Fancy Age Label";
run;
*set input data set name;
%let INPUT_DSN = class;
%let OUTPUT_DSN = want;
*create format for missing;
proc format;
value $ missfmt ' '="Missing" other="Not Missing";
value nmissfmt .="Missing" other="Not Missing";
run;
*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=temp;
proc freq data=&INPUT_DSN.;
table _all_ / missing;
format _numeric_ nmissfmt. _character_ $missfmt.;
run;
ods select all;
*Format output;
data long;
length variable $32. variable_value $50.;
set temp;
Variable=scan(table, 2);
Variable_Value=strip(trim(vvaluex(variable)));
presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
keep variable variable_value frequency percent cum: presentation;
label variable='Variable' variable_value='Variable Value';
run;
proc sort data=long;
by variable;
run;
*make it a wide data set for presentation, with values as N (Percent);
proc transpose data=long out=wide_presentation (drop=_name_);
by variable;
id variable_value;
var presentation;
run;
*transpose only N;
proc transpose data=long out=wide_N prefix=N_;
by variable;
id variable_value;
var frequency;
run;
*transpose only percents;
proc transpose data=long out=wide_PCT prefix=PCT_;
by variable;
id variable_value;
var percent;
run;
*final output file;
data &Output_DSN.;
merge wide_N wide_PCT wide_presentation;
by variable;
drop _name_;
label N_Missing='# Missing' N_Not_Missing='# Not Missing'
PCT_Missing='% Missing' N_Not_Missing='% Not Missing' Missing='Missing'
Not_missing='Not Missing';
run;
title "Missing Report of &INPUT_DSN.";
proc print data=&output_dsn. noobs label;
run;
Change the macro variables in the code and re-run it on your own data set to get your own results.
It should be pretty fast.
*change these to what you need in code above;
%let INPUT_DSN = class;
%let OUTPUT_DSN = want;
@Wolverine wrote:
This seems like it should be quite easy to do, but I haven't found a way yet. I have a large file (millions of records) with names, dates of birth, addresses, etc, and I want to know the missing/non-missing percentages for each of these variables. I don't want to run proc freq, as this will give me a huge list of every name in the dataset.
My work-around so far has been to recode these types of variables into dichotomous variables (ie, 0=missing and 1=non-missing), and then run freqs. But that's time consuming and tedious. Isn't there an option in proc freq or another procedure to do this automatically?
%let lib=sashelp; %let dsn=heart; data _null_; set sashelp.vcolumn(where=(libname=upcase("&lib") and memname=upcase("&dsn"))) end=last; if _n_=1 then call execute('proc sql;create table want as select '); call execute(cat("nmiss(",name,")/(select count(*) from &lib..&dsn ) as ",name," format=percent8.2")); if last then call execute("from &lib..&dsn ;quit;" ); else call execute(','); run;
I accepted FreelanceReinhard's solution because it's simple and easy to apply, and it serves my purposes quite well for this particular data. But I will make a note of Reeza's macro, as I anticipate it could be useful in the future for larger/more complex datasets. Thanks!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.