I have a large survey data set with hundreds of variables, both numeric and character. Each variable can take two missing values, M or N. I want to make a list that shows the percent of M values for each variable. I can define formats to collapse all the variables into M and not-M, but I'm stuck at producing the list. With this many variables, hand-coding is out of the question. I thought to make an output dataset with PROC FREQ, but that allows only one frequency distribution per output dataset. The regular output from PROC FREQ contains all the headers, which I don't want to have to edit out. I've also tried PROC TABULATE, but I can't work out how to get it to do what I want. I could set up a macro to loop through each variable, run PROC FREQ, and concatenate the output datasets. However, I feel like there must be a smack-in-the-head-of-course method that I'm missing. What is it?
I want a list like this:
Variable Name Percent M.
--Dav
" both numeric and character. Each variable can take two missing values, M or N."
Show us the sample data and the output you are looking for .
data have;
input a $ b;
cards;
M 12
c .M
x .N
n .M
;
run;
data _null_;
set sashelp.vcolumn(keep=libname memname name type
where=(upcase(libname)='WORK' and upcase(memname)='HAVE')) end=last;
if _n_=1 then call execute('proc sql;create table temp as select ');
if type='char' then call execute(cat('sum(',name,'="M")/count(*) as ',name));
else call execute(cat('sum(',name,'=.M)/count(*) as ',name));
if last then call execute('from have;quit;');
else call execute(',');
run;
proc transpose data=temp out=want;
run;
Some example data would be helpful. You may end up going back to re-reading your raw data.
SAS has more than one "missing" actually a single variable can have up to 27 different missing values.
Here's a brief example of recoding values to the missing for numeric values.
data example; input q1 q2 q3; /* assume questions q1 through 3 are coded the same and a recorded respose of 77 means "Don't know" and 99 is "Refused"*/ array q q1-q3; do i=1 to dim(q); if q[i]= 77 then q[i]= .D; if q[i]= 99 then q[i]= .R; end; datalines; 1 2 77 2 3 1 99 4 3 1 . 2 ; run; proc freq data=example; tables q1 q2 q3 ; run; proc format library=work; value myq .D= "Don't Know" .R= "Refused" . = "No Answer" ; run; proc freq data=example; tables q1 q2 q3/missing ; format q: myq.; run; proc tabulate data=example; class q1 q2 q3/missing; format q1 q2 q3 myq.; tables q1 q2 q3, n pctn; run;
If you need to do the exact same thing to all of the numeric variables you can use: Array arrayname _numeric_; which places all of the numeric variables into the array.
Same with character variables:
Array charvars _character_;
For anything else you should provide some concrete input data and what the result from that data is expected.
(a) are your variables character (possibly with a value of "M") or numeric (possibly with a value of .M)?
(b) when computing percentages, do you want all non-M values (including missing values) included in the denominator?
The programming isn't that difficult, but I need the answers first.
" both numeric and character. Each variable can take two missing values, M or N."
Show us the sample data and the output you are looking for .
data have;
input a $ b;
cards;
M 12
c .M
x .N
n .M
;
run;
data _null_;
set sashelp.vcolumn(keep=libname memname name type
where=(upcase(libname)='WORK' and upcase(memname)='HAVE')) end=last;
if _n_=1 then call execute('proc sql;create table temp as select ');
if type='char' then call execute(cat('sum(',name,'="M")/count(*) as ',name));
else call execute(cat('sum(',name,'=.M)/count(*) as ',name));
if last then call execute('from have;quit;');
else call execute(',');
run;
proc transpose data=temp out=want;
run;
Thanks to everyone who replied. I am aware that SAS variables can take multiple missing values, and that there is a difference in the way they are treated for numeric an character variables. In my case, I have both character and numeric variables. For character variables, missing values are indcated by 'M' or 'N'; for numeric, .M or .N. I want to list the percentage of all cases that have the 'M' or .M value, as applicable, for every variable, of which there are a great many.
For a totally made-up example, see the pictures below:
So did you try my code ?
Yes, I did. It does work. Thank you for your help. My pride is assuaged by the fact that it was not as simple and obvious as I had feared.
--Dav
if you want simpler, IML code might do it .
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.