saatkinson
Calcite | Level 5

Hi!

I'm new to SAS, but I need to run a lot of two-way ANOVAs and store the statistics in a format that I can easily access later. I have two categorical independent variables, Gender and Treatment, and for this sample I have ~470 different continuous dependent variables, each of which is a gene ID (for example, A0A011V3B7); I call them Y_1-Y_470 here for ease of typing. I can successfully output each individual result with:

proc anova data=data_red;
    class Gender Treatment;
    model Y_1 = Gender Treatment Gender*Treatment; /*repeat with Y_2, Y_3, etc.*/
run;

But I don't relish doing this 470 times, and I was hoping for help automating the process so that SAS does it for me. I found a couple of posts similar to what I want to do and tried to combine and modify them to suit my needs.

Here is what I've attempted:

proc sql noprint;
	select distinct NAME into : namelist separated by ' ' from anova_vars; /*retrieve variable names into a macro variable*/
quit;

%macro runAnvoa(DSName, nameList);
%local i next_name;
%let i=1;
%do %while (%scan(&namelist, &i) ne);
	%let next_name = %scan(&namelist, &i);
	proc anova data=data_red outstat=PE(rename=(&namelist=Value)); /*trying to name each model with the gene name*/
		class Gender Treatment;
		model next_name = Gender Treatment Gender*Treatment;
		quit;
	proc append base=outStats data=PE;
	run;
%end;


%runanvoa(data_red, &namelist);

/*if above works look at output statistics:*/ proc print outStats; run;

 

 

This gives me an error about quoted strings, but I'm not using quotes at all. I've included the log file from my run. Any help getting this working (or a more efficient way to do it) would be greatly appreciated!

Thanks!
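For reference, here is a minimal sketch of how the macro attempt above could be repaired, assuming DATA_RED holds Gender, Treatment and the numeric responses and that &namelist contains the response names separated by blanks. Compared with the original, it references the loop variable as &next_name (the original MODEL statement used the literal name next_name), increments i (the original loop never advances), closes the definition with %mend, uses the DSName parameter instead of the hard-coded data set, drops the RENAME= (OUTSTAT= already records the response name in _NAME_), and adds DATA= to PROC PRINT. It also swaps PROC ANOVA for PROC GLM, following the advice in the replies; the macro name runAnova is a hypothetical rename. This only addresses the visible code problems and does not claim to explain the specific quoted-string error in the log.

```sas
/* Sketch only: assumes DATA_RED holds Gender, Treatment and the numeric */
/* responses, and &namelist lists the response names separated by blanks */
%macro runAnova(DSName, nameList);
    %local i next_name;

    /* remove results left over from a previous run */
    proc datasets library=work nolist;
        delete outStats;
    run;
    quit;

    %let i = 1;
    %do %while (%length(%scan(&nameList, &i, %str( ))) > 0);
        %let next_name = %scan(&nameList, &i, %str( ));

        /* &next_name resolves to the current response; OUTSTAT= stores */
        /* its name in _NAME_, so no RENAME= is needed                  */
        proc glm data=&DSName outstat=PE noprint;
            class Gender Treatment;
            model &next_name = Gender Treatment Gender*Treatment;
        run;
        quit;

        /* FORCE lets APPEND continue if variable lengths differ between runs */
        proc append base=outStats data=PE force;
        run;

        %let i = %eval(&i + 1);   /* advance to the next name */
    %end;
%mend runAnova;

%runAnova(data_red, &namelist);

proc print data=outStats;
run;
```

Even when it works, this runs 470 separate PROC GLM steps, which is why the replies suggest a BY-group or single-MODEL approach instead.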

Accepted Solution
PaigeMiller
Diamond | Level 26

Don't use macros. Use the BY statement.

Convert your 470 Y variables into a single Y variable, with an index variable running from 1 to 470 that identifies which original variable each value came from. Then run PROC GLM with that index variable in the BY statement. You can do the conversion with an ARRAY statement in a DATA step.

Don't use PROC ANOVA; use PROC GLM.

--
Paige Miller


7 Replies
saatkinson
Calcite | Level 5

I'm confused about how to convert my Y variables into a single Y variable without transposing the data. If I transpose the table, won't I lose the association with my Gender and Treatment variables? Could you explain how to use the ARRAY statement to do the conversion you mention?

Attached is a small snippet of my table.

Thanks so much for your help!

PaigeMiller
Diamond | Level 26

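It is not a "transpose" in the sense of using PROC TRANSPOSE; instead, you use an ARRAY statement in a DATA step.

data new;
    set data_red;
    array y Y_1-Y_470;          /* the 470 responses as one array */
    do index=1 to dim(y);       /* index = position of the response in the list */
        new_y=y(index);         /* copy that response into a single column */
        output;                 /* one row per observation and response */
    end;
    drop Y_1-Y_470;             /* Gender and Treatment stay on every row */
run;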
--
Paige Miller
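To finish the BY-group approach, the long data set NEW has to be sorted by INDEX before PROC GLM can process it by group, because the DATA step above writes its rows in observation order rather than index order. A minimal sketch, assuming the NEW, INDEX and NEW_Y names from that code; the output data set name glm_stats is hypothetical. OUTSTAT= with NOPRINT collects the effect tests for every BY group into one data set without printing 470 sets of tables.

```sas
/* BY-group processing requires the data sorted by the BY variable */
proc sort data=new;
    by index;
run;

/* One two-way ANOVA per value of INDEX; results go to GLM_STATS.   */
/* GLM_STATS keeps INDEX plus _SOURCE_, _TYPE_, DF, SS, F and PROB. */
proc glm data=new outstat=glm_stats noprint;
    by index;
    class Gender Treatment;
    model new_y = Gender Treatment Gender*Treatment;
run;
quit;
```

In GLM_STATS, _NAME_ is new_y on every row, so INDEX is what identifies the original response.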
saatkinson
Calcite | Level 5
Got it to work perfectly! Thanks for your help!
art297
Opal | Level 21

Glad to hear that you were able to solve your problem, but for future reference: the method you used took twice as long to run as it had to, and you ended up with output that didn't include either the variable names or labels.

When you run either GLM or ANOVA, you aren't limited to one dependent variable. You could have included all of them using the same variable list that you used with @PaigeMiller's suggested array statement.

Art, CEO, AnalystFinder.com
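A minimal sketch of the single-MODEL approach, assuming the responses are the numbered variables Y_1-Y_470 in DATA_RED; the data set names all_stats and interaction_tests are hypothetical. If the columns keep their gene-ID names and sit next to each other in the data set, a double-dash name range list (for example first_gene--last_gene, hypothetical names) could replace Y_1-Y_470. The second step shows one way to pull a single test, here the Type III interaction, out of the OUTSTAT= data set.

```sas
/* One PROC GLM step fits a separate two-way ANOVA for each response */
proc glm data=data_red outstat=all_stats noprint;
    class Gender Treatment;
    model Y_1-Y_470 = Gender Treatment Gender*Treatment;
run;
quit;

/* _NAME_ holds the response name; keep the Type III interaction test */
data interaction_tests;
    set all_stats;
    where _TYPE_ = 'SS3' and upcase(_SOURCE_) = 'GENDER*TREATMENT';
    keep _NAME_ DF SS F PROB;
run;
```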

 

art297
Opal | Level 21

You can include all of your dependent variables in one MODEL statement. If you elect to go that way and want the analyses conducted as a multivariate analysis (MANOVA), you can always include the MANOVA option.

Art, CEO, AnalystFinder.com

PaigeMiller
Diamond | Level 26

@art297 wrote:

You can include all of your dependent variables in one model statement. If you elect to go that way, and want the analyses to be conducted as a multivariate manova, you can always include the manova option.

 

Art, CEO, AnalystFinder.com


The problem with MANOVA here is that any missing value among the Y variables causes the entire observation to be excluded from the analysis. If no MANOVA statement is used, however, this is a very good approach.

--
Paige Miller
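To gauge how much data a MANOVA would lose, a quick check can count the observations with at least one missing response. A minimal sketch, assuming the responses are Y_1-Y_470 in DATA_RED; the names miss_check, n_missing and any_missing are hypothetical.

```sas
/* Flag observations with at least one missing response; per the  */
/* point above, these rows would be excluded from a MANOVA         */
data miss_check;
    set data_red;
    n_missing   = nmiss(of Y_1-Y_470);   /* missing responses on this row */
    any_missing = (n_missing > 0);       /* 1 = row has a missing response */
run;

proc freq data=miss_check;
    tables any_missing;
run;
```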
