So my question is very similar to this one https://communities.sas.com/t5/Base-SAS-Programming/Proc-Means-loop-Macro/m-p/177460/highlight/true#...
However, I want to try to add in any interaction effects. I'm trying to add flexibility into my code. Originally when I wrote this, my code would only allow me 2 Factors with 2 sublevels each. I will try to keep it simple because once I get the main idea I can probably apply it to what else I want to do in my code. Essentially I want it so my code can read in any number of "Factors" and "levels" of the said Factors: Example: Factor1(has categorical responses, such as Yes, No), Factor2(same thing, cat responses), I don't want to limit the responses to only 2 though. Let's focus on the proc means part first, I may ask a seperate question for what I want to do next. So essentially I have this....
%MACRO meansANOVA (data, totalvars, Factor1, Factor2, S1F1, S1F2, S2F1, S2F2)
%DO i = 1 %TO &totalvars;
proc means data = &data clm mean std stderr;
class &Factor1 &Factor2;
var m&i;
ods output summary= &data.Means&i;
run;
proc means data = &data clm mean std stderr;
class &Factor1;
var m&i;
ods output summary= &data.&Factor1.Means&i;
run;
proc means data = &data clm mean std stderr;
class &Factor2;
var m&i;
ods output summary= &data.&Factor2.Means&i;
run;
What I want to do is to somehow do this automatically. Some way to create a means data(I need them seperately) of each individual factor + their possible interaction effects. So if I had 3 factors for example it would run proc means on factor 1, factor 2, and factor 3, and then run it on factor1*factor2, factor1*factor3, factor2*factor3 and factor1*factor2*factor3 and generate seperate outputs. Does anyone have an idea of how I can do this? I was thinking there would be use of the scan function.
Originally when I ran this analysis I did all the merging/seperating/etc..(data manipulation) through JMP, I was trying to write a Macro that may automate everything I did.
TLDR: My major goal is to just "automate" tons of data manipulation through SAS probably through a macro so that all I would need to put in is a few things such as dataset name, total # of variables we're interested in, and possible factors/levels of said factors each time.
Like I said, currently I have my code written out so that it would take in 2 Factors with 2 sublevels. so Factor 1, Yes/No, and Factor 2, Yes/No for example. My code IS working but it's very messy and has tons of lines of code ~500+ lines of code. I want to give it more flexibility so that my input data does not need to be 2 Factors with 2 sublevels and perhaps could do 2 Factors with 3 sublevels or 3 Factors with 3 sublevels, etc....
Edit: "Inputdata.xlsx" is a small sample of what my input data looks like, I deleted all the variables that I wasn't interested in and also changed/edited some of the names. It has 2 factors to make it "Easier." "Example.xlsx" shows what my final "means" table should look like for a different project that had 3 factors instead of 2. "Example1.xlsx" shows the means table before I start separating it and merging with other tables for my Inputdata.
_type_ is actually rather easy. 0 will be the overall value, then all of the 1 factor summaries will have values 1,2,3 ... The two variable combinations are next followed by the 3 variable (etc). The specific order of combinations follows a pattern that is pretty easy.
I have one recurring project where I have 6 class variables and report on 20 odd specific combinations of those.
Just to get the ball rolling ...
What if you have 5 factors? Do you then want all combinations of 3 and all combinations of 4?
Almost certainly you will have just one parameter for the factors:
factor_list = factor1 factor2, ...
Macro language can work with that pretty easily, such as:
%do i=1 %to %sysfunc(countw(&factor_list));
Essentially yes... but with the data I work with, the majority of the time it will only be 2~3 factors. I will attach an excel file of what my "final" results should look like, I modified/deleted a lot of it for "data privacy" but left enough to show an example.
To reiterate, basically what I did originally was Split/Transpose/Merge/Remerge data sets using JMP(also created by SAS corporation), I also generated P values and means/stderr/stdmeans, etc.. through JMP rather than SAS. I had to use SAS in order to run a comparison table (lsmeans in proc mixed). Regardless of that, I was wondering how to code all the data manipulation that I did into SAS so that for future projects I could just run through a macro and it would do everything for me rather than having me reclick/redo everything in JMP.
You may not need this macro at all unless you really like creating additional data sets that I suspect you then put together later.
Something like this skeleton:
proc summary data=youdataset; class yourclassvariablelist; var yourvariablelist; output out=outputset lclm= uclm= mean= std= stderr= /autoname; run;
Does all the summaries at one time creating appropriately named variables. The Values of the output _type_ variable will let you determine which combination of the class variables is involved for any use.
Also I believe you are going to get errors because CLM is not a valid statistic for Means, you want UCLM for the upper bound and LCLM for the lower.
An example with a SAS supplied data set:
proc summary data=sashelp.class; class age sex; var height weight; output out=summary lclm= uclm= mean= std= stderr= /autoname; run;
I strongly agree with @ballardw. There's no need for a macro - except maybe to separate the datasets if you really need them separate.
Yeah I saw his solution and posted accordingly, the problem is I want my "final" table to be formatted like in my "example.xlsx" file, which I attached above in my original post just now. Is there even a "Good" way to pick out stuff without using IF statements?
Your solution does solve one of my issues, it fixes the need to loop and automatically creates each combination. The only problem is I need the output to be of a certain format and it may be harder to manipulate it with these results. I'm not sure how to do it in an "automated" fashion, currently I have just written a bunch of IF statements, such as...
data &S1F1.&S2F1.means; set &data.means; if &Factor1 = "&S1F1" and &Factor2 = "&S2F1"; run;
The problem with this is, once again it doesn't take into account how many factors I have. What if I had 3 factors instead of 2? This code wouldn't work for me.
"additional data sets that I suspect you then put together later"
Well, yes I need to extract these means, STD, N, and Confidence Limits because I need to combine(merge) them with 2 other tables. I need to keep the 3 tables separate, basically one table for Factor1, one for Factor2, and one for Factor1+Factor2. (The numbers for Factor1, Factor2, and Factor1*Factor2 effects).
"Also I believe you are going to get errors because CLM is not a valid statistic for Means, you want UCLM for the upper bound and LCLM for the lower."
??? CLM in my proc means did not throw any error, it created UCLM and LCLM correctly.
Attached:
This is my current output with my proc means after some manipulation if i used
class Factor1 Factor2
. I would further manipulate this so that each Mean/stderr/Limits would have what they reference to at the top. Like for Factor1, (it disregards factor2) the files would have... Means for Factor1 with each sublevel of S1F1 and S1F2. For Factor2 it would have means for Factor2 with each sublevel of S2F1 and S2F2. For both Factor1 Factor2(takes into account both), it'd have 4 different means, means for S1F1, S1F2, S2F1, S2F2. Refer to "Example.xlsx" to see what my final table should be like.
Leon,
A couple of notes, just to get you back on what is likely the right path.
First, your CLASS statement would be expanded to keep your factors and subfactors. You might be generating statistics for combinations that you don't care about, but you would have all the combinations possible.
Second, take a look at the automatic variable _TYPE_ that appears in your output data set. Each _TYPE_ value corresponds to a particular combination of CLASS variables. You may need to study it a bit to figure out how to use _TYPE_ to subset the output, but that's the key to finding each level of summarization.
Thanks for some ideas guys, I am going to see if I can do something using proc summary and _TYPE_ that @Astounding and @ballardw mentioned. I was struggling to think about how I could reference the _TYPE_ correctly in order to subset the output, but I have an idea now that I'm going to try out.
_type_ is actually rather easy. 0 will be the overall value, then all of the 1 factor summaries will have values 1,2,3 ... The two variable combinations are next followed by the 3 variable (etc). The specific order of combinations follows a pattern that is pretty easy.
I have one recurring project where I have 6 class variables and report on 20 odd specific combinations of those.
I managed to get a main part done so I'm going to close this thread, I may ask another question later if I can't figure it out, but from now on it should be similar to what I did originally just a little few edits. It's MUCH more clean than my previous code when I had dealt with 3 tables separately.
So yeah this basically gives me what I want, but now I have to further merge these with some other tables.
%MACRO meansANOVA (data, totalvars, Varlist);
%DO i = 1 %TO &totalvars;
proc summary data=test;
class &Varlist;
var m&i;
output out=summary&i lclm= uclm= mean= std= stderr= /autoname;
run;
data summary&i;
set summary&i;
if _TYPE_ ne 0;
run;
data copy&i;
set summary&i;
run;
proc sort data = copy&i nodupkey;
by _Type_;
run;
data summary&i._1;
set copy&i nobs=total;
count+1;
totalnum = total;
run;
data _null_;
set summary&i._1;
if count ne 1 then delete;
call symput('factors', totalnum);
run;
%do j = 1 %to &factors;
data Factor&j.m&i;
set summary&i;
if _Type_ = &j;
run;
data Factor&j.m&i;
set Factor&j.m&i;
rename m&i._LCLM = LowerCL m&i._UCLM = UpperCL m&i._Mean = Mean m&i._StdDev = StdDev m&i._StdErr = StdErr _FREQ_ = N;
Label = "m&i";
run;
proc sort data = Factor&j.m&i;
by Label;
run;
data MergedFactor&j;
length Label $100;
merge Factor&j.m1-Factor&j.m&i;
by Label;
run;
%end;
%end;
%mend meansanova;
Thanks for the ideas guys.
You can use the Ways statement to help limit the output as well.
Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. So I don't read xlxs files from sources unapproved by my IT policy.
Once you identify the combinations of the class variables and the associated _type_ the first step would involve a WHERE data set option or statement to subset the data: where _type_ in (1,2,3) for example.
Proc means will do things similarly to Proc Summary but I believe the structure of the output data set varies somewhat and you may want to investigate the options Classdata= , Completetypes, Exclusive and Missing and there impact on _type_.
I'm not sure I understand exactly what you want to develop, but I think that next code
can demostray a tool to use and build your macro:
macro combine(list1, list2);
%let f = %sysfunc(countw(&list1));
%let v = %sysfunc(countw(&list2));
%do i=1 %to &f;
%do j=1 %to &v;
%let comb = %scan(&list1,&i) - %scan(&list2,&j);
%put COMB = &comb;
%end;%end;
%mend combine;
%combine(fac1 fac2 fac3, vara varb varc vard);
run this code and see the results in log.
I have the feeling that - %sysfunc(countw(&list1)); - is the tool you need.
Out from all posts upto now, you are looking for a mechnaizm to run PROC MEANS several times on same input,
each time with different combination of factors - which you said the maximum is 3 factors.
The skilton of proc means, as you used it, is:
proc means data = &data clm mean std stderr;
class <factors combination>;
var m&i; /* each time run on a different variable */
ods output summary= &data.Means&i;
run;
It is easy to define the factors combination manually:
- in case of 3 factors: f1, f2, f3, f1 f2, f1 f3, f1 f2 f3
- in case of 2 factors: f1, f2, f1 f2
So your main work is :
1) have a loop of macro variable i to select each time another variable. That you have already done in your macro program.
2) calculate the class to use: either a single factor or a combination of two or of all the 3 factors.
That means you need a loop of selecting factor / factors combination within the loop of selecting a variable.
In each iteration define the output dataset name as combination of: &data + &class (the factors used) + &i (the variable used) .
then code proc means once only as:
%do;
proc means data = &data clm mean std stderr;
class &class;
var m&i;
ods output summary= &data.&class1._Means&i;
run;
%end;
&class - is the factor or factors seperated by space.
&class1 - is the factor or factors seperated by underscore '_'.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.