Hi everyone,
I'm looking for help deciding which statistical procedure I should use to analyse the following data:
Independent variables:
- GROUP: control & intervention
- TIME: baseline & end
The dependent variable (DV) is continuous, with a Shapiro-Wilk p-value < 0.0001.
Because of this test result, I think I should use a nonparametric alternative to a two-way ANOVA, and I am considering the Scheirer-Ray-Hare test.
However
1) Is this test possible in SAS? And if so, can someone help me with the code and interpretation?
2) Or am I interpreting the Shapiro-Wilk test incorrectly, and can I continue using the two-way ANOVA?
3) I also want to add a covariate to my analysis. However, I don't know if this is possible with the suggested tests (ANOVA, Scheirer-Ray-Hare, or another test I'm not thinking of).
Thanks in advance for your help.
ANOVA does not require a normally distributed response variable. It requires normally distributed errors, which you can check by fitting the model and seeing whether the residuals are normally distributed.
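As a rough illustration of that check, here is a minimal sketch that fits a two-way model, saves the residuals, and then tests and plots their distribution. The data set and variable names (ild.all, mean_steps, intervention, visit) are taken from code posted later in this thread; the output data set name work.glm_resid is made up for the example. Note that PROC GLM treats all observations as independent, so with baseline and end measurements on the same subjects this only demonstrates the residual check and is not a recommended final model.

```sas
/* Sketch only: variable names are assumed from code posted later in the thread */
proc glm data=ild.all;
  class intervention visit;
  model mean_steps = intervention visit intervention*visit;
  /* save residuals and predicted values (work.glm_resid is a made-up name) */
  output out=work.glm_resid r=resid p=pred;
run;
quit;

/* examine the distribution of the residuals, not of the raw response */
proc univariate data=work.glm_resid normal;
  var resid;
  histogram resid / normal;
  qqplot resid / normal(mu=est sigma=est);
run;
```

The NORMAL option prints the Shapiro-Wilk test for the residuals, but with a large sample the histogram and QQ plot are usually more informative than the p-value.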
@Sofie3 wrote:
Thank you PaigeMiller for the correction of my misconception.
I would like to fit the model; however, with two independent categorical variables it seems difficult to do?
Should not be difficult. All SAS modeling programs allow two independent categorical variables. You have already provided code that works.
@PaigeMiller I used this code:
ods graphics on;
proc reg data = ild.all; *p-value significant = not linear;
model mean_steps = intervention;
plot mean_steps*intervention;
plot r.*p.;
run;
Seeing the following output:
To me, it is unclear whether the errors are normally distributed or not. R² = 0.0278; does that mean they are not?
And if so, do I then have to look for another analytical test?
Not the residuals from some regression. You need the residuals from your PROC GLIMMIX with two categorical variables. You plot the distribution of the residuals, not the residuals against one of the x-variables.
As @PaigeMiller posted (and citing Rick's excellent blog post), the normality condition is on the residuals, not the dependent variable. You can run MIXED and check the plot of your residuals to see if they appear normal "enough" to you.
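One possible way to get those residual plots from PROC MIXED is sketched below. It assumes the same data set and variables used elsewhere in this thread (ild.all, mean_steps, intervention, visit, study_id) and an unstructured covariance for the repeated visits; the output data set name work.mixed_pred is hypothetical.

```sas
ods graphics on;

/* Sketch only: names are assumed from the code posted in this thread */
proc mixed data=ild.all plots=residualpanel;
  class intervention visit study_id;
  /* outp= saves predicted values and residuals (variable Resid) */
  model mean_steps = intervention visit intervention*visit
        / outp=work.mixed_pred;
  /* repeated measurements on the same subject, unstructured covariance */
  repeated visit / subject=study_id type=un;
run;
```

The residual panel includes a histogram and a QQ plot of the residuals, which are the plots to judge normality from.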
Thank you @StatsMan!
Seeing this output, I would say that the residuals are 'normal', right? (I looked at the upper right graph.)
There is a bit of a right tail, as shown in the histogram and at the upper right part of the QQ plot, but definitely not enough to invalidate the assumption of normality of the residuals. One thing I do see is that it looks like one of your model factors has three levels, rather than the two you mentioned in the original post.
So here is a way to analyze this using time as a repeated factor at the residual level, using PROC GLIMMIX:
proc glimmix data=ild.all;
class intervention visit study_id;
model mean_steps = intervention visit intervention*visit;
random visit / residual type=un subject=study_id;
run;
Adding a covariate is then easy to do by adding the variable to the MODEL statement. If the covariate is categorical, you will also need to add it to the CLASS statement.
SteveDenham
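As an illustration of adding a covariate to the model above, here is a minimal sketch. The covariates age (continuous) and sex (categorical) are hypothetical names that do not appear in this thread; substitute the actual covariate from your data.

```sas
/* Sketch only: age and sex are hypothetical covariates, not from the thread */
proc glimmix data=ild.all plots=residualpanel;
  /* a categorical covariate (sex) must also be listed on the CLASS statement */
  class intervention visit study_id sex;
  /* a continuous covariate (age) goes on the MODEL statement only */
  model mean_steps = intervention visit intervention*visit age sex;
  random visit / residual type=un subject=study_id;
run;
```

The PLOTS=RESIDUALPANEL option is included so that the residuals of the model with the covariate can be checked in the same way as before.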