madpumpkinpie
Calcite | Level 5

Hi there,

I have googled similar questions but could not find answers that I can understand. So here I am asking for your help!

What I want to know is how to get results separately in a group.

 

Example (just copied and pasted from Excel)

 

ID  Weight Treatment kcal
1    NW    A    400
2    NW    A    500
3    OW    A    560
4    NW    A    800
5    OW    A    490
6    NW    A    500
7    OW    A    400
8    OW    A    700
9    NW    A    900
1    NW    B    580
2    NW    B    600
3    OW    B    800
4    NW    B    500
5    OW    B    600
6    NW    B    800
7    OW    B    700
8    OW    B    500
9    NW    B    780
1    NW    C    570
2    NW    C    670
3    OW    C    570
4    NW    C    400
5    OW    C    600
6    NW    C    800
7    OW    C    800
8    OW    C    500
9    NW    C    800

 

In this example, I am going to run one-way repeated measures ANOVA to see if there is a treatment effect (A, B, C) on subsequent calorie intake (kcal). However, I want to see the result separately for each body weight status (NW=normal weight & OW=overweight). Please ignore the small sample size because this is just an example. 

 

In SPSS, we can "split file" and then get results for both NW and OW separately in any analysis conducted after that.

 

I am an absolute beginner with SAS and have never edited code; I just import Excel files for each analysis and select some commands. Therefore, I'd appreciate it if you could explain in a comprehensive way if it involves code!

 

TIA

 

12 REPLIES
PaigeMiller
Diamond | Level 26

First, there's no need to split anything here ... no need to split the dataset ... no need to split the analysis; in fact, splitting the analysis would be the wrong thing to do from a statistical point of view.

 

proc glm;
    class weight treatment;
    model kcal=weight|treatment;
run;
quit;

Optionally you may want an LSMEANS statement.
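
As a hedged sketch of what that might look like: the step below assumes the data sit in a dataset named have (a placeholder; substitute the name of your own imported dataset). It adds an LSMEANS statement that prints the least-squares mean kcal for every weight-by-treatment cell, with confidence limits requested by the CL option.

```sas
proc glm data=have;                 /* "have" is an assumed dataset name */
    class weight treatment;
    model kcal=weight|treatment;    /* weight, treatment, and their interaction */
    lsmeans weight*treatment / cl;  /* cell means with confidence limits */
run;
quit;
```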

--
Paige Miller
madpumpkinpie
Calcite | Level 5

Thank you for replying!

I understand what you mean, but then how can I compare the effect of treatment on food intake between NW and OW participants? I ask because I found a significant treatment effect in the NW group (n=41) but not in the OW group (n=12) when I ran a one-way repeated measures ANOVA for each body weight group.

 

And what is this code for? Where should I put it?

proc glm;
    class weight treatment;
    model kcal=weight|treatment;
run;
quit; 

 Thank you,

novinosrin
Tourmaline | Level 20

@PaigeMiller  a class act. 

 

@madpumpkinpie just in case you need to split the data for other purposes in the future:

data have;
input ID $  Weight $ Treatment $ kcal;
datalines;
1    NW    A    400
2    NW    A    500
3    OW    A    560
4    NW    A    800
5    OW    A    490
6    NW    A    500
7    OW    A    400
8    OW    A    700
9    NW    A    900
1    NW    B    580
2    NW    B    600
3    OW    B    800
4    NW    B    500
5    OW    B    600
6    NW    B    800
7    OW    B    700
8    OW    B    500
9    NW    B    780
1    NW    C    570
2    NW    C    670
3    OW    C    570
4    NW    C    400
5    OW    C    600
6    NW    C    800
7    OW    C    800
8    OW    C    500
9    NW    C    800
;
proc sort data=have;
by weight;
run;
data _null_;
if _n_=1 then do;
if 0 then set have;
 dcl hash h(dataset:'have(obs=0)',multidata:'y');
 h.definekey('Weight');
 h.definedata(all:'y');
 h.definedone();
 end;
 set have;
 by weight;
 if first.weight then h.clear();
 h.add();
 if last.weight then h.output(dataset: weight);
 run;
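
For readers new to SAS, the same split can also be sketched with a plain DATA step. This version assumes Weight takes only the two values NW and OW, so the output dataset names are hard-coded rather than derived from the data as in the hash approach.

```sas
data NW OW;                               /* two output datasets, named by hand */
    set have;                             /* read the full dataset row by row */
    if Weight = 'NW' then output NW;      /* normal-weight rows */
    else if Weight = 'OW' then output OW; /* overweight rows */
run;
```

No sort is needed here, but a new weight category would require editing the code.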
madpumpkinpie
Calcite | Level 5

Thank you for replying!

 

Does this code include both the command for splitting file and running ANOVA?

novinosrin
Tourmaline | Level 20

No ANOVA, I'm afraid. Funny enough, I have just enrolled in statistics courses as a full-time student 🙂  The code only splits the data. Sorry about that.

PaigeMiller
Diamond | Level 26

@madpumpkinpie wrote:

Thank you for replying!

 

Does this code include both the command for splitting file and running ANOVA?


No, as I said, there is no need to do any splitting when you do an ANOVA.

--
Paige Miller
madpumpkinpie
Calcite | Level 5

Hi there, 

I'm sorry, but I don't understand why there is no need to split... :(  Could you explain?

Then if I want to see the separate result of the effect of treatment on calorie intake for different weight status, may I conduct one-way ANOVA twice (one for NW group and the other one for OW group)?

 

Thanks,

PaigeMiller
Diamond | Level 26

You conduct one ANOVA, where the effect of treatment and the effect of weight are both in the model. Splitting the data and conducting an ANOVA for OW and another ANOVA for NW is the wrong thing to do, statistically.


From this one analysis, you can determine the effect of treatment for the OW case, and a different effect of treatment for the NW case.
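
One way to read those per-group treatment effects off the single model is the SLICE= option of the LSMEANS statement. This is only a sketch: it assumes a dataset named have, and it uses the same two-factor model as posted earlier in the thread, which does not include ID.

```sas
proc glm data=have;                           /* "have" is an assumed dataset name */
    class weight treatment;
    model kcal=weight|treatment;
    lsmeans weight*treatment / slice=weight;  /* test treatment within each weight level */
run;
quit;
```

The sliced output should contain one F-test of treatment for NW and one for OW, both computed with the error term from the full model.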

--
Paige Miller
art297
Opal | Level 21

@madpumpkinpie: Just wanted to point out a couple of things. First, I agree with @PaigeMiller, if you're doing the analysis to test hypotheses, the correct method is to use a single analysis.

 

However, so that you gain a better understanding of how SAS works, you could still get exactly what you asked for without having to "split" the data.

 

e.g., @PaigeMiller recommended:

proc glm data=have;
    class weight treatment;
    model kcal=weight|treatment;
run;

To do the same thing, separately for each level of treatment, one could use:

proc sort data=have;
  by treatment;
run;

proc glm data=have;
    class weight;
    by treatment;
    model kcal=weight;
run;
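
The same BY-group pattern applies to the split the original question asked about (by weight rather than by treatment), which is the closest analogue to SPSS "split file". A sketch, assuming the dataset is named have; the PROC SORT step comes first because BY processing expects the data to be sorted by the BY variable.

```sas
proc sort data=have;
  by weight;
run;

proc glm data=have;
    class treatment;
    by weight;              /* one separate analysis per weight level */
    model kcal=treatment;
run;
quit;
```

This prints one set of results for NW and one for OW. As noted earlier in the thread, a single model is the preferred route for hypothesis testing.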

@novinosrin: While splitting the data isn't needed to solve the problem, this question was cross-posted on SAS-L, where @SAShole (i.e., Paul Dorfman) pointed out that no sort is needed when using the hash object to split such a file. I ran a test comparing your method with the one that Paul suggested, and I have to agree with him completely. He suggested:

data _null_ ; 
  dcl hash x() ; 
  x.defineKey ('weight') ; 
  x.defineData ('weight','h') ; 
  x.defineDone () ; 
  dcl hash h ; 
  do until (z) ; 
    set have end = z ; 
    if x.find() ne 0 then do ; 
      h = _new_ hash (dataset:'have(obs=0)', multidata:'y') ;
      h.defineKey ('weight') ; 
      h.defineData (all:'y') ; 
      h.defineDone () ; 
      x.add() ; 
    end ; 
    h.add() ; 
  end ; 
  dcl hiter i('x') ; 
  do while (i.next()=0) ; 
    h.output (dataset: weight) ; 
  end ; 
  stop ; 
run ; 

Your solution would only be (very) slightly faster IF the data were already sorted.

 

Art, CEO, AnalystFinder.com

 

 

 

 

novinosrin
Tourmaline | Level 20

Thank you @art297 for including me in such amazing discussions. It's really a privilege, although it makes me very nervous to participate when you champs and your contemporaries do.

Anyway, I have tried another SAS 9.4 method that avoids the sort:

 

data have;
input ID $  Weight $ Treatment $ kcal;
datalines;
1    NW    A    400
2    NW    A    500
3    OW    A    560
4    NW    A    800
5    OW    A    490
6    NW    A    500
7    OW    A    400
8    OW    A    700
9    NW    A    900
1    NW    B    580
2    NW    B    600
3    OW    B    800
4    NW    B    500
5    OW    B    600
6    NW    B    800
7    OW    B    700
8    OW    B    500
9    NW    B    780
1    NW    C    570
2    NW    C    670
3    OW    C    570
4    NW    C    400
5    OW    C    600
6    NW    C    800
7    OW    C    800
8    OW    C    500
9    NW    C    800
;


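data _null_;
  if _n_=1 then do;
    if 0 then set have;  /* define host variables without reading any rows */
    /* h: every row of HAVE, keyed by Weight (duplicate keys kept) */
    dcl hash h(dataset:'have',multidata:'y');
    h.definekey('Weight');
    h.definedata(all:'y');
    h.definedone();
    /* h1: one entry per distinct Weight value */
    dcl hash h1(dataset:'have',duplicate:'r');
    h1.definekey('Weight');
    h1.definedata('weight');
    h1.definedone();
    dcl hiter i('h1');
    /* h2: empty work table, refilled for each Weight value */
    dcl hash h2(dataset:'have(obs=0)',multidata:'y');
    h2.definekey('Weight');
    h2.definedata(all:'y');
    h2.definedone();
  end;
  rc = i.first();
  do while (rc = 0);
    h2.clear();
    do while (h.do_over(key:weight) eq 0);
      h2.add();
    end;
    h2.output(dataset:weight);  /* dataset is named after the Weight value */
    rc = i.next();
  end;
  stop;  /* no SET statement executes, so end the step explicitly */
run;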

 

PaigeMiller
Diamond | Level 26

@art297 wrote:

@madpumpkinpie: Just wanted to point out a couple of things. First, I agree with @PaigeMiller, if you're doing the analysis to test hypotheses, the correct method is to use a single analysis.

 

However, so that you gain a better understanding of how SAS works, you could still get exactly what you asked for without having to "split" the data.

 

e.g., @PaigeMiller recommended:

proc glm data=have;
    class weight treatment;
    model kcal=weight|treatment;
run;

To do the same thing, separately for each level of treatment, one could use:

proc sort data=have;
  by treatment;
run;

proc glm data=have;
    class weight;
    by treatment;
    model kcal=weight;
run;


Well, I feel that I should point out that this is NOT the same thing. The F-tests will be different: they are correct if you don't split the analysis using BY groups, and incorrect if you do.
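
One reason the F-tests differ is the error term, shown here as a rough sketch using the 27-row example data and the models as posted (ID is not in either model). The test of weight within a single treatment level t uses a different error mean square and different error degrees of freedom in the two approaches:

```latex
% BY-group analysis: error estimated from the 9 rows of treatment t only
F_{\text{BY}} = \frac{MS_{\text{weight at } t}}{MSE_{t}},
\qquad df_{\text{error}} = 9 - 2 = 7

% Single model: error pooled over all 27 rows and all 6 weight-by-treatment cells
F_{\text{single}} = \frac{MS_{\text{weight at } t}}{MSE_{\text{pooled}}},
\qquad df_{\text{error}} = 27 - 6 = 21
```

The single model therefore has more error degrees of freedom and one common variance estimate, where the BY-group analysis has a separate and noisier estimate for each subset.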

--
Paige Miller
art297
Opal | Level 21

@PaigeMiller: Poor choice of words on my part. I didn't mean to imply that the BY-group analyses were either correct or supplied the same result. I'm well familiar with the effects (particularly on alpha) of doing multiple tests.

 

My post was simply to address the question that @madpumpkinpie originally asked.

 

Conversely, if one isn't doing hypothesis testing, but rather only data snooping (however frowned upon that may be), the approach does exist.

 

Art, CEO, AnalystFinder.com

  
