Just feel curious. Thanks. If you have, can you post it here?
You could just use a simple datastep. e.g.:
/*create a test file with some missing data*/
data have;
set sashelp.class;
if _n_ in (5,10,12) then do;
call missing(height,weight);
end;
call missing(age);
run;
data want (keep=percent_missing);
set have end=last;
array nums _numeric_;
array chars _character_;
do over nums;
if missing(nums) then m+1;
else n+1;
end;
do over chars;
if missing(chars) then m+1;
else n+1;
end;
if last then do;
percent_missing=100*m/sum(m,n);
output;
end;
run;
proc print data=want;
run;
Ruth,
I am sure there must be better approaches, I am here just to start the discussion. following code will only work one variable at a time, but by tweaking it, it could also work on multiple variables.
DATA have;
infile cards;
input a @@;
cards;
1 . 2 . 3 4 5 6 7 . 8 . 9
;
%macro mpcnt (dat=, var=);
data _null_;
set &dat end=last nobs=nobs;
if missing(&var) then _n+1;
if last then do;
_p= _n/nobs;
put "percent=" _p percent7.2 ;
end;
run;
%mend;
%mpcnt (dat=have,var=a)
Or for SQL solution,
%macro sql_pcnt (dat=, var=);
proc sql;
select sum(missing(&var))/count(*) as Missing_percent format percent7.2
from &dat;
quit;
%mend;
%sql_pcnt (dat=have,var=a)
Hope you can have some taste of the logic and work out some solution for your real life data.
Regards,
Haikuo
Ruth,
I imagine there's not much out there in the way of macros, because SAS handles the problem easily enough without macros.
If you can work with the number (instead of percentage) of missing values, it's as easy as this:
proc means data=mydata maxdec=0 nmiss;
run;
If you really must have percentages, then create an output data set:
proc means data=mydata noprint;
var _numeric_;
output out=missings (drop=_type_) nmiss=;
run;
In your output data set MISSINGS, each variable name will contain the number of observations that the original variable was missing. You will also have the automatic variable _FREQ_, which is the total number of observations. If you bring this into a DATA step, it will be easy enough to calculate the percentages.
Good luck.
You can create a format for missing based on numeric and character variables and run proc freq.
proc format;
value $ missfmt ' '="Missing"
other="Not Missing"
;
value nmissfmt . ="Missing"
other="Not Missing"
;
run;
data have;
set have;
format _numeric_ nmissfmt. _character_ $missfmt.;
run;
proc freq data=have;
run;
But here's my macro. It needs work since I chopped it together from another macro I use, so not all the code in here is actually required.
%macro sum_missing(libname, dsetin, dsetout);
*Delete old dataset;
proc datasets nodetails nolist;
delete &dsetout;
quit;
*Upcase all macro variables to have consistency;
data _null_;
call symput ("libname", put(upcase("&libname."), $8.));
call symput ("dsetin", put(upcase("&dsetin."), $32.));
run;
*Formats for missing for character and numeric variables;
proc format;
value $ missfmt ' '="Missing"
other="Not Missing"
;
value nmissfmt . ="Missing"
other="Not Missing"
;
run;
*Copy dataset and apply missing formats to it;
data step1;
set &libname..&dsetin;
format _character_ $missfmt. _numeric_ nmissfmt.;
run;
options mprint symbolgen;
*Get variable list;
proc sql noprint;
select name into : var_list separated by " "
from dictionary.columns
where upcase(libname) ="&libname"
and upcase(memname)="&dsetin";
*Get count of total number of observations into macro variable;
select count(*) into : obs_count
from &libname..&dsetin.;
quit;
*Start looping through the variable list here;
%let i=1;
%do %while (%scan(&var_list, &i, " ") ^=%str());
%let var=%scan(&var_list, &i, " ");
*Get format of missing;
data _null_;
set step1;
call symput("var_fmt", vformat(&var));
call symput("var_label", vlabel(&var));
run;
*Get count of missing;
proc freq data=step1 noprint;
table &var/missing out=cross_tab1;
run;
data cross_tab2;
set cross_tab1;
length variable $50.;
category=put(&var, &var_fmt.);
variable="&var_label.";
if _n_=1 and category='Not Missing' then do;;
Number_Missing=&obs_count-count;
Percent_Missing=Number_Missing/&obs_count.;
percent=percent/100;
output;
end;
else if _n_=1 and category='Missing' then do;
Number_Missing=count;
Percent_Missing=percent/100;
output;
end;
format percent: percent10.1;
keep variable Number_Missing Percent_Missing;
run;
proc append base=&dsetout data=cross_tab2 force;
run;
proc datasets nodetails nolist;
delete cross_tab1 cross_tab2;
run; quit;
*Increment counter;
%let i=%eval(&i+1);
%end; *Categorical;
proc datasets nodetails nolist;
delete step1;
run; quit;
%mend;
A few good answers now. Thanks.
At the same time, can I ask if a program can automatically create a new temporary dataset which reports the percentage of missing values for all variables? Probably some use of the dictionary is needed.
You could just use a simple datastep. e.g.:
/*create a test file with some missing data*/
data have;
set sashelp.class;
if _n_ in (5,10,12) then do;
call missing(height,weight);
end;
call missing(age);
run;
data want (keep=percent_missing);
set have end=last;
array nums _numeric_;
array chars _character_;
do over nums;
if missing(nums) then m+1;
else n+1;
end;
do over chars;
if missing(chars) then m+1;
else n+1;
end;
if last then do;
percent_missing=100*m/sum(m,n);
output;
end;
run;
proc print data=want;
run;
Art, Wow! I did not know there is such a thing called 'do over'. I was thinking using proc content, and get the counts of numeric and character variable into macro variables. Much appreciated!
Haikuo
Just be alerted that some users have said that we shouldn't use that form of a loop because it is no longer documented.
I happen to like it and, unlike functions, etc. that have never been documented, this one has been, but isn't in the current documentation.
Or SQL.
/*create a test file with some missing data*/ data have; set sashelp.class; if _n_ in (5,10,12) then do; call missing(height,weight); end; call missing(age); run; proc sql ; select name into : list separated by ',' from dictionary.columns where libname='WORK' and memname='HAVE'; select nvar into : nvar from dictionary.tables where libname='WORK' and memname='HAVE'; select count(*) into : total from have; select sum(cmiss(&list))/(&nvar * &total) as percent from have; quit;
Ksharp
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.