Hi!
I'm new to SAS, but I need to run a lot of two-way ANOVAs and store the statistics information in a format that I can easily access later. I have two categorical independent variables, Gender and Treatment, and for this sample I have ~470 different continuous dependent variables, where each variable is a gene ID (for example A0A011V3B7) called Y_1-Y_470 here for ease of typing. I can successfully output each individual result with:
proc anova data=data_red;
  class Gender Treatment;
  model Y_1 = Gender Treatment Gender*Treatment; /*repeat with Y_2, Y_3, etc.*/
run;
But I don't relish doing this 470 times, and was hoping for help in automating the process to make SAS do it for me. I found a couple posts similar to what I want to do and have tried to combine/modify them to suit my needs.
Here is what I've attempted:
proc sql noprint;
  select distinct NAME into : namelist separated by ' '
  from anova_vars; /*retrieve variable names into a macro variable*/
quit;
%macro runAnvoa(DSName, nameList);
  %local i next_name;
  %let i=1;
  %do %while (%scan(&namelist, &i) ne);
    %let next_name = %scan(&namelist, &i);
    proc anova data=data_red outstat=PE(rename=(&namelist=Value)); /*trying to name each model with the gene name*/
      class Gender Treatment;
      model next_name = Gender Treatment Gender*Treatment;
    quit;
    proc append base=outStats data=PE;
    run;
  %end;
%runanvoa(data_red, &namelist);
/*if above works look at output statistics:*/
proc print outStats;
run;
This gives me an error about quoted strings, but I'm not using quotes at all. I've included the log file from my run. Any help to get this working (or a more efficient way to do it) would be greatly appreciated!
Thanks!
Accepted Solutions
Don't use macros. Use the BY statement.
Convert your 470 Y variables into a single Y variable, with an indicator variable running from 1 to 470. You can do this with an ARRAY statement in a DATA step. Then run PROC GLM with the BY statement on the indicator variable.
Don't use PROC ANOVA, use PROC GLM.
Paige Miller
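On the PROC ANOVA versus PROC GLM point: PROC ANOVA is designed for balanced data, whereas PROC GLM also handles unbalanced designs. A minimal sketch of the single-variable model from the question rewritten for PROC GLM, assuming the data set `data_red` and the response `Y_1` as named in the question:

```sas
/* Same two-way model as in the question, fitted with PROC GLM
   (assumes data_red, Gender, Treatment and Y_1 exist as described). */
proc glm data=data_red;
  class Gender Treatment;
  model Y_1 = Gender Treatment Gender*Treatment;
run;
quit;
```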
I'm confused on how to convert my Y variables into a single Y variable without transposing the data. If I transpose the table I will lose the association of my Gender and Treatment variables, yes? Could you explain how to use the array statement to do the conversion you mention?
Attached is a small snippet of my table.
Thanks so much for your help!
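It is not a "transpose" in the sense that you don't use PROC TRANSPOSE. You use an ARRAY statement. Each output row is written from the same input observation, so Gender and Treatment stay attached to every Y value.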
data new;
  set old;
  array y y1-y470; /* substitute your own variable list here */
  do index=1 to dim(y);
    new_y=y(index); /* one output row per original Y variable */
    output;
  end;
  drop y1-y470;
run;
Paige Miller
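The long data set produced by that DATA step has the rows for each `index` interleaved, so it needs to be sorted before a BY statement can be used. A sketch of the remaining steps, assuming the data set is called `new` as above and the class variables are `Gender` and `Treatment` as in the question. The output data set name `anova_results` is made up for illustration.

```sas
/* BY-group processing requires the data to be sorted by the BY variable. */
proc sort data=new;
  by index;
run;

/* Suppress printed output for all 470 models; ODS OUTPUT still captures tables. */
ods exclude all;

proc glm data=new;
  by index;
  class Gender Treatment;
  model new_y = Gender Treatment Gender*Treatment;
  /* ModelANOVA holds the tests for each effect, one set per BY group. */
  ods output ModelANOVA=anova_results;
run;
quit;

ods exclude none;

/* Look at the first few rows of the stacked results. */
proc print data=anova_results(obs=12);
run;
```

To keep the original variable names in the results, one option is to add a line such as `varname = vname(y(index));` inside the DO loop and then sort and run BY `varname` instead of `index`.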
Glad to hear that you were able to solve your problem but, for future reference, the method you used took twice as long to run as it had to, and you ended up with output that didn't include either the variable names or labels.
When you run either GLM or ANOVA, you aren't limited to one dependent variable. You could have included all of them in the MODEL statement, using the same variable list that you used with @PaigeMiller's suggested ARRAY statement.
Art, CEO, AnalystFinder.com
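A minimal sketch of that single-step alternative, assuming the variables really are named `Y_1` through `Y_470` as in the question (with gene IDs as variable names, a different variable list would be needed). The output data set name `anova_wide` is made up for illustration.

```sas
/* Suppress printed output; ODS OUTPUT still captures the tables. */
ods exclude all;

proc glm data=data_red;
  class Gender Treatment;
  /* All responses in one MODEL statement; each gets its own univariate ANOVA. */
  model Y_1-Y_470 = Gender Treatment Gender*Treatment;
  /* The Dependent column identifies which response each row belongs to. */
  ods output ModelANOVA=anova_wide;
run;
quit;

ods exclude none;
```

Because the `Dependent` column carries the response name, no reshaping or sorting step is needed with this approach.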
You can include all of your dependent variables in one MODEL statement. If you elect to go that way, and want the analyses to be conducted as a multivariate analysis (MANOVA), you can always include the MANOVA option.
Art, CEO, AnalystFinder.com
@art297 wrote:
You can include all of your dependent variables in one MODEL statement. If you elect to go that way, and want the analyses to be conducted as a multivariate analysis (MANOVA), you can always include the MANOVA option.
Art, CEO, AnalystFinder.com
The problem with MANOVA here is that any missing value among the Y variables causes the observation not to be used. On the other hand, if no MANOVA statement is used, then this is a very good approach.
Paige Miller
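One way to see whether this matters for a given data set is to count missing values per response before deciding. A sketch, assuming the `data_red` data set and the `Y_1`-`Y_470` naming from the question:

```sas
/* N and NMISS for every response variable. If any NMISS is greater than 0,
   requesting MANOVA would exclude those observations from all of the
   analyses, not just from the response that is missing. */
proc means data=data_red n nmiss;
  var Y_1-Y_470;
run;
```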