I have a large survey data set with hundreds of variables, both numeric and character. Each variable can take two missing values, M or N. I want to make a list that shows the percent of M values for each variable. I can define formats to collapse all the variables into M and not-M, but I'm stuck at producing the list. With this many variables, hand-coding is out of the question. I thought to make an output dataset with PROC FREQ, but that allows only one frequency distribution per output dataset. The regular output from PROC FREQ contains all the headers, which I don't want to have to edit out. I've also tried PROC TABULATE, but I can't work out how to get it to do what I want. I could set up a macro to loop through each variable, run PROC FREQ, and concatenate the output datasets. However, I feel like there must be a smack-in-the-head-of-course method that I'm missing. What is it?
I want a list like this:
Variable Name Percent M.
--Dav
" both numeric and character. Each variable can take two missing values, M or N."
Show us the sample data and the output you are looking for .
data have;
input a $ b;
cards;
M 12
c .M
x .N
n .M
;
run;
data _null_;
set sashelp.vcolumn(keep=libname memname name type
where=(upcase(libname)='WORK' and upcase(memname)='HAVE')) end=last;
if _n_=1 then call execute('proc sql;create table temp as select ');
if type='char' then call execute(cat('sum(',name,'="M")/count(*) as ',name));
else call execute(cat('sum(',name,'=.M)/count(*) as ',name));
if last then call execute('from have;quit;');
else call execute(',');
run;
proc transpose data=temp out=want;
run;
Some example data would be helpful. You may end up going back to re-reading your raw data.
SAS has more than one "missing" actually a single variable can have up to 27 different missing values.
Here's a brief example of recoding values to the missing for numeric values.
data example;
input q1 q2 q3;
/* assume questions q1 through 3 are coded the same
and a recorded respose of 77 means "Don't know"
and 99 is "Refused"*/
array q q1-q3;
do i=1 to dim(q);
if q[i]= 77 then q[i]= .D;
if q[i]= 99 then q[i]= .R;
end;
datalines;
1 2 77
2 3 1
99 4 3
1 . 2
;
run;
proc freq data=example;
tables q1 q2 q3 ;
run;
proc format library=work;
value myq
.D= "Don't Know"
.R= "Refused"
. = "No Answer"
;
run;
proc freq data=example;
tables q1 q2 q3/missing ;
format q: myq.;
run;
proc tabulate data=example;
class q1 q2 q3/missing;
format q1 q2 q3 myq.;
tables q1 q2 q3, n pctn;
run;
If you need to do the exact same thing to all of the numeric variables you can use: Array arrayname _numeric_; which places all of the numeric variables into the array.
Same with character variables:
Array charvars _character_;
For anything else you should provide some concrete input data and what the result from that data is expected.
(a) are your variables character (possibly with a value of "M") or numeric (possibly with a value of .M)?
(b) when computing percentages, do you want all non-M values (including missing values) included in the denominator?
The programming isn't that difficult, but I need the answers first.
" both numeric and character. Each variable can take two missing values, M or N."
Show us the sample data and the output you are looking for .
data have;
input a $ b;
cards;
M 12
c .M
x .N
n .M
;
run;
data _null_;
set sashelp.vcolumn(keep=libname memname name type
where=(upcase(libname)='WORK' and upcase(memname)='HAVE')) end=last;
if _n_=1 then call execute('proc sql;create table temp as select ');
if type='char' then call execute(cat('sum(',name,'="M")/count(*) as ',name));
else call execute(cat('sum(',name,'=.M)/count(*) as ',name));
if last then call execute('from have;quit;');
else call execute(',');
run;
proc transpose data=temp out=want;
run;
Thanks to everyone who replied. I am aware that SAS variables can take multiple missing values, and that there is a difference in the way they are treated for numeric an character variables. In my case, I have both character and numeric variables. For character variables, missing values are indcated by 'M' or 'N'; for numeric, .M or .N. I want to list the percentage of all cases that have the 'M' or .M value, as applicable, for every variable, of which there are a great many.
For a totally made-up example, see the pictures below:
So did you try my code ?
Yes, I did. It does work. Thank you for your help. My pride is assuaged by the fact that it was not as simple and obvious as I had feared.
--Dav
if you want simpler, IML code might do it .
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.