I'm trying to create a table that summarizes variables (e.g. by their mean and standard deviation) across levels of multiple class variables. Here's an example of what I want, in which the variables x1 and x2 are summarized across levels of two class variables (which you can guess are sex and age):
Note that age is nested within sex, and the "Female" and "Male" headers span the levels of age. I'm trying to write a macro that automatically generates the COLUMN statement in PROC REPORT to create this style of header, for any number of class variables. E.g., I would be able to nest another class variable, height (short vs tall) within levels of age.
I currently have code that creates a table similar to what I want, but the headers don't span. Like this:
My code is below. Currently, it (1) creates some fake data to play with, (2) summarizes the data with PROC MEANS, (3) transposes the data so it's in the desired form, and (4) prints the data with PROC REPORT. If you run the code as is, it should reproduce the above Table 2. You can set the nClasses macro variable to a different value to create a table with 1, 2, 3, or 4 class variables.
* simulate data. The c vars are the class variables, and the x vars are
the data to be summarized within levels of the classes.;
data test;
call streaminit(12345);
do id = 1 to 200;
if rand('Bernoulli', 0.6) then c1 = 'Female'; else c1 = 'Male';
if rand('Bernoulli', 0.3) then c2 = 'Young'; else c2 = 'Old';
if rand('Bernoulli', 0.5) then c3 = 'Short'; else c3 = 'Tall';
if rand('Bernoulli', 0.8) then c4 = 'Alive'; else c4 = 'Dead';
x1 = rand('Normal', 60, 15);
x2 = rand('Normal', 3, 1);
output;
end;
run;
%let nClasses = 2; * how many class variables should we use? Set to 1, 2, 3, or 4;
* summarize the x variables by class;
proc means data=test mean std stackodsoutput;
class c1-c&nClasses;
var x1 x2;
ods output summary=means;
run;
data means;
set means;
class = catx(' ', of c1-c&nClasses); * concatenate all the class variables, space-delimited;
stat = catx(' ', put(mean,6.1), cats('(', put(stdDev,6.1), ')')); * combine mean and sd into a char var;
run;
* create two macro vars:
(1) nClassLevels = the total number of class levels. By level, we mean a combination of
all the class vars of interest. So if nClasses=2, then there are 4 levels, since c1
and c2 both have two levels. In general for our simulated data, where all the class
vars are binary, nClassLevels equals 2^nClasses.
(2) classLevels = the names of all the levels. E.g. if nClasses=2, the first class level
is Female Old and the last is Male Young. The levels are delimited by a | (and within
a level, the sublevels are delimited by a space.;
proc sql noprint;
select count(class), class
into :nClassLevels, :classLevels separated by '|'
from means
where variable = 'x1';
quit;
%put nClassLevels = &nClassLevels;
%put classLevels = &classLevels;
* transpose data from long to wide;
proc sort data=means;
by variable;
run;
proc transpose data=means out=table prefix=stat;
by variable;
var stat;
run;
* print the table. This version does not cause headers to span across nested subheaders.;
%macro print;
proc report data=table nowd;
column ("Variable" variable)
%do i = 1 %to &nClassLevels;
%let classLevelsI = %scan(&classLevels,&i,|);
%do j = 1 %to &nClasses;
("%scan(&classLevelsI,&j,,s)"
%end;
stat&i
%do j = 1 %to &nClasses;
)
%end;
%end;
;
define variable / "";
%do i = 1 %to &nClassLevels;
define stat&i / "" center;
%end;
run; quit;
%mend print;
%print;
My question is, how do I alter the %print macro so that the header labels span nested labels, as in Table 1? This should work no matter the value of nClasses.
For reference, if you run the above code (with nClasses = 2), then the code below will produce Table 1. (I wrote this code by hand, but I want the %print macro to do it for me.)
proc report data=table nowd;
column ("Variable" variable)
("Female"
("Old" stat1)
("Young" stat2)
)
("Male"
("Old" stat3)
("Young" stat4)
);
define variable / "";
define stat1 / "" center;
define stat2 / "" center;
define stat3 / "" center;
define stat4 / "" center;
run; quit;
I'm using SAS 9.4.
Thanks for any suggestions!
Try using the N statistic.
proc report data=means nowd;
column ('Variable' variable) c1, c2, stat n;
define variable / group '';
define c1 / across '';
define c2 / across '';
define stat / '';
define n / noprint;
run; quit;
Sailing Over the ACROSS Hurdle in PROC REPORT by Cynthia Zender explains this method in more detail, starting on page 4.
What happens if you change your data structure and use those variables as ACROSS variables?
Just wondering if that would automatically format the way you want, without having to delve into Macros.
Note that I'm going to move this post to the ODS reporting forum, since it's more related to reporting than macro's per se.
Great suggestion! Using the means dataset that I create in my code above, and using c1 and c2 as across variables, I just wrote the following code:
proc report data=means nowd;
column ('Variable' variable) c1, c2, stat;
define variable / '';
define c1 / across '';
define c2 / across '';
define stat / '';
run; quit;
This gets me tantalizingly close to what I want, but for some reason the values of stat get spread out over 8 rows instead of 2:
Is there some option to collapse the values of stat? I tried making variable a group variable, but got an error message saying "There is no statistic associated with stat". Any ideas?
For reference, here's what the means dataset looks like (showing just the relevant variables):
Try using the N statistic.
proc report data=means nowd;
column ('Variable' variable) c1, c2, stat n;
define variable / group '';
define c1 / across '';
define c2 / across '';
define stat / '';
define n / noprint;
run; quit;
Sailing Over the ACROSS Hurdle in PROC REPORT by Cynthia Zender explains this method in more detail, starting on page 4.
Is there a way to create the output report by working directly on the raw data ?
Hi:
You'd have to read the raw data into a SAS dataset in order to use PROC REPORT. Or, you'd have to read the RAW data in a DATA step program and then use PUT statements to create the output table. Either way, SAS is not dealing with the raw data directly.
As an example, here's a DATA step program that reads raw data that has been constructed as described earlier in this post and then the same PROC REPORT code can be used on the file. I just used a DATALINES section to feed "instream" data to the program, but your INFILE statement could have been:
infile 'c:\temp\rawdata.txt' dlm=',' dsd;
...instead of having a DATALINES for the data.
Cynthia
data means;
length c1 $10 c2 $5 variable $32 stat $30;
infile datalines dlm=',' dsd;
input c1 $ c2 $ Variable $ stat $ ;
return;
datalines;
Female,Old,x1,"60.4 (13.4)"
Female,Young,x1,"64.1 (13.0)"
Male,Old,x1,"59.1 (14.2)"
Male,Young,x1,"52.1 (16.1)"
Female,Old,x2,"3.0 (1.0)"
Female,Young,x2,"2.9 (0.8)"
Male,Old,x2,"3.0 (1.1)"
Male,Young,x2,"2.6 (0.8)"
;
run;
proc report data=means nowd;
column ('Variable' variable) c1, c2, stat n;
define variable / group '';
define c1 / across '';
define c2 / across '';
define stat / '';
define n / noprint;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.