LucyB
Obsidian | Level 7

I have data where the outcome is a score and there is only one group, in which all participants receive the intervention. I have pre and post scores and a number of covariates. On its own this calls for a paired t-test, but what test should I use if I am looking to control for and stratify by covariates?

art297
Opal | Level 21

Sounds like GLM or proc mixed. Take a look at: http://www2.sas.com/proceedings/sugi25/25/aa/25p020.pdf

 

Art, CEO, AnalystFinder.com

 

LucyB
Obsidian | Level 7

Thanks for the useful document.

 

I think then this would be a GLM, since I don't have any random effects. Would my model look like this?

proc glm data=data;
     class covariate;
     model post = pre covariate;
run;


Reeza
Super User

You could, but it may be worthwhile to model the difference or change rather than pre/post?

It's definitely worth reading up on what's common in your field. If you're testing one covariate at a time it could be an ANOVA as well, which GLM will do anyway 🙂

proc glm data=data;
     class covariate;
     model diff = covariate;
run;

LucyB
Obsidian | Level 7

Should we adjust for pre scores in the model?

LucyB
Obsidian | Level 7

Sorry, I know I have asked this on multiple occasions in other threads, but I feel like I need to account for the variability in the pre scores somehow.

LucyB
Obsidian | Level 7

proc glm data=data;
     class covariate;
     model diff = covariate pre*covariate;
run;

Would the interaction term here tell us the difference between the covariate levels after adjusting for pre? If this is even correct.

Ksharp
Super User

I think your original code is right. It is called analysis of covariance.

But you need the SOLUTION option to get the parameter estimates.

 

proc glm data=data;
     class covariate;
     model post = pre covariate /solution;
run;
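
With a CLASS variable in the model, the SOLUTION estimates for it are differences from a reference level. If adjusted group means are also of interest, one option is an LSMEANS statement. A minimal sketch, reusing the placeholder names from this thread (data, pre, post, covariate):

```sas
proc glm data=data;
     class covariate;
     model post = pre covariate / solution;
     /* adjusted means of post evaluated at the mean of pre,
        with pairwise differences and confidence limits */
     lsmeans covariate / pdiff cl;
run;
quit;
```

The LSMEANS output is only meaningful as "adjusted for pre" if the slopes are roughly equal across the covariate levels.
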
LucyB
Obsidian | Level 7

Thank you. What about the model change = pre cov pre*cov? The interaction seems to address the question: as the pre score increases, the change (post - pre) decreases, and this is more apparent in one of the covariate groups. Would this be appropriate?

sld
Rhodochrosite | Level 12

model change = pre cov pre*cov;

would not be appropriate.

 

You could augment the code provided by @Ksharp as

model post = pre cov pre*cov;

The interaction allows the regression of post on pre to have different slopes for each value of cov.

 

As @Ksharp notes, these models fall under analysis of covariance. You'll want to get up to speed with ANCOVA before you try to make sense of your results; ANCOVA is trickier than it appears at first glance, IMO. See this example in the GLM documentation:

https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_glm_sect049...
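
To read the separate slopes directly, the same unequal-slopes model can be reparameterized so that each level of cov gets its own intercept and its own slope on pre. This is only a sketch using the placeholder names from this thread (data, post, pre, cov); it fits the same lines as model post = pre cov pre*cov, just written differently:

```sas
proc glm data=data;
     class cov;
     /* NOINT with cov and pre*cov: one intercept and one slope per level of cov */
     model post = cov pre*cov / noint solution;
run;
quit;
```

Because of NOINT, the overall F test and R-square are based on uncorrected sums of squares, so this form is mainly useful for reading off the per-group estimates rather than for judging overall fit.
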

LucyB
Obsidian | Level 7

Thanks for the link!

 

My post scores are not normal, which is partly why I wanted to model the change outcome instead. Is there a nonparametric version of glm?

 

Also, the group variable in the link is the drug, and the patients were probably randomized into the drug categories. My group variable is a characteristic of the study sample (whether or not the subject had prior exposure to the task performed). Will this make a difference? Should I consider a repeated model?

sld
Rhodochrosite | Level 12

(1) Are you looking at the distribution of the post values all together prior to fitting a model, or are you looking at the distribution of the residuals after fitting the model?

 

The normality assumption applies to the response variable y conditional on the predictor variables x. For ANOVA, this means that y is normally distributed within each treatment level; clearly, if you have few replicates within each treatment level, your ability to assess the normality assumption is limited or even nonexistent. For regression, this means that y is normally distributed within each level of x; again, if you don't have lots of replicates within each level of x, you will not be able to assess this assumption prior to fitting a model. What are we to do? We look at the distribution of the residuals.
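
One way to look at the residuals, sketched here with the placeholder names used earlier in the thread (data, post, pre, covariate) and with hypothetical names for the output data set and variables (resids, resid, pred):

```sas
proc glm data=data;
     class covariate;
     model post = pre covariate;
     /* save residuals and predicted values from the fitted model */
     output out=resids r=resid p=pred;
run;
quit;

/* normality tests and a normal Q-Q plot of the residuals */
proc univariate data=resids normal;
     var resid;
     qqplot resid / normal(mu=est sigma=est);
run;

/* residuals versus predicted values, to check for unequal spread */
proc sgplot data=resids;
     scatter x=pred y=resid;
     refline 0 / axis=y;
run;
```

With a small sample the formal normality tests have little power, so the plots are usually the more informative part.
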

 

(2) Are post and pre in your study measuring the same variable? 

 

If pre is "whether or not the subject had prior exposure to the task performed", then it sounds like a categorical variable (yes/no), and if so, it is not a continuous covariate as needed for ANCOVA and it should be listed in the CLASS statement.

 

Is post measured on a continuous scale? Is covariate measured on a categorical scale (as I would expect given that it is in the CLASS statement)?

 

I'm beginning to think that you are on the wrong track entirely and that the appropriate model might be something like a two-way factorial ANOVA-like model. But you haven't provided enough information to tell. If you don't provide enough detail about your study design and your variables, you risk getting a correct answer to the wrong question. 

LucyB
Obsidian | Level 7

Thanks for the thorough response!

 

My outcome is continuous (a score); scores are measured pre intervention and post intervention on the same group of patients (n=44). I have no control group. Since a lot of patients did significantly better post intervention, the post distribution is skewed to the left. I used a paired t-test to test for significant improvement (the improvement, post - pre, was normally distributed). I have a binary (yes/no) variable in my dataset recording previous exposure of the patients to the intervention, so I am further interested in seeing whether the improvement differs between the levels of prior exposure. I am really confused about which of the models above to use for this and appreciate your help greatly.

sld
Rhodochrosite | Level 12

That's helpful, thanks. ANCOVA might not be a bad approach after all.

 

When I'm embarking on an ANCOVA, the first thing I do is plot the data.

Let post be the post-intervention score, pre be the pre-intervention score, and exposure be the binary "whether prior exposure" variable.

 

Plot post versus pre, distinguishing by exposure, and add a reference line that depicts pre=post:

 

proc sgplot data=have;
  scatter x=pre y=post / group=exposure;
  lineparm x=0 y=0 slope=1;
run;
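
If the slopes are hard to judge by eye, a variant of the plot can overlay a fitted regression line for each exposure group. This is a sketch under the same naming assumptions (have, pre, post, exposure); the dashed reference line is a styling choice, not part of the original code:

```sas
proc sgplot data=have;
  /* REG draws the markers plus a least-squares line for each exposure group */
  reg x=pre y=post / group=exposure;
  /* reference line where post = pre */
  lineparm x=0 y=0 slope=1 / lineattrs=(pattern=dash);
run;
```
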

Things to look for:

  

(1) Are the two exposure scatters sitting on top of each other (implying no effect of exposure), or are they shifted in some way (up or down, left or right)?

 

(2) Are the relationships between post and pre for each exposure group linear? (In its basic form, ANCOVA assumes linearity, as well as normality and homogeneity of variance.)

 

(3) Are the relationships between post and pre for each exposure group parallel to the reference line? If so, then the difference between post and pre does not depend upon the value of pre; the difference is constant. In this case, you could use the model that @Reeza suggested:

 

proc glm data=have;
  class exposure;
  model diff = exposure;
  run;

where diff = (post - pre). Essentially, this is your paired t-test with exposure added.
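
The change score has to exist in the data set before it can be modeled. A minimal sketch, assuming the data set is called have as above and using a hypothetical output name have_diff:

```sas
data have_diff;
  set have;
  diff = post - pre;   /* change score; missing if either score is missing */
run;

proc glm data=have_diff;
  class exposure;
  model diff = exposure / solution;
  means exposure;      /* unadjusted mean change in each exposure group */
run;
quit;
```

With a two-level exposure variable, the F test for exposure in this model is equivalent to a pooled two-sample t-test comparing the mean change between the groups.
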

 

(4) If one or both relationships are not parallel to the reference line, then the difference between post and pre depends upon the value of pre, and an analysis of the response diff = (post - pre) would be a non-optimal choice. Perhaps, for example, the difference increases as pre increases. ANCOVA is useful in this scenario.

  

proc glm data=have;
  class exposure;
  model post = pre exposure;
  run;

(5) If the two relationships are not parallel to each other--if the slopes of the two linear regressions are not equal--then add interaction to the model.

  

proc glm data=have;
  class exposure;
  model post = pre exposure pre*exposure;
  run;
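
In this model the F test for pre*exposure is the test of equal slopes. If the slopes do differ, the exposure effect depends on pre, so it can help to compare the groups at a few chosen pre scores. A sketch under the same naming assumptions; the values 10 and 30 are placeholders, to be replaced with scores that make sense on the actual scale:

```sas
proc glm data=have;
  class exposure;
  model post = pre exposure pre*exposure / solution;
  /* compare exposure groups at specific pre scores (placeholder values) */
  lsmeans exposure / at pre=10 pdiff cl;
  lsmeans exposure / at pre=30 pdiff cl;
run;
quit;
```
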

 

Chapter 7 in this text deals with ANCOVA and would probably be useful:

 https://www.sas.com/store/books/categories/usage-and-reference/sas-for-linear-models-fourth-edition/...

 Note that the covariate (here, pre) is centered in the mathematical model: the mean of X is subtracted from each value of X.
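
PROC GLM fits the covariate as supplied, so if a centered covariate is wanted in the analysis itself, it would be created beforehand. A sketch, assuming the data set have from above and hypothetical names have_c and pre_c:

```sas
proc sql;
  /* pre_c = pre minus the overall mean of pre;
     PROC SQL remerges the mean onto every row */
  create table have_c as
  select *, pre - mean(pre) as pre_c
  from have;
quit;

proc glm data=have_c;
  class exposure;
  model post = pre_c exposure / solution;
run;
quit;
```

In the equal-slopes model, centering leaves the slope and the exposure effect unchanged; it changes the intercept, which becomes the expected post score at the average pre score for the reference exposure level.
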

 

Another resource here: 

https://onlinecourses.science.psu.edu/stat502/node/183

LucyB
Obsidian | Level 7
Very very helpful, thank you so much
