Hi SAS friends! Hoping for some advice (and new ideas) here...
I am trying to use ANOVA to evaluate the relationship between an independent categorical variable with multiple levels and a dependent continuous variable.
I used PROC GLM to conduct my test and also requested some nonparametric test options and tests for unequal variance (Levene's, Welch's ANOVA). The distribution of my dependent variable is heavily skewed.
Here's my original code:
ods graphics on;
proc glm data = mydata plots(maxpoints=none)=diagnostics;
class independent;
model dependent = independent;
means independent/hovtest welch;
run;
ods graphics off;
Then I realized that, because the survey design includes weighting and stratification variables, I needed to take those into account. PROC GLM let me add the weighting variable but doesn't appear to have options for nonparametric tests. I switched to PROC SURVEYREG, which allows both weighting and stratification variables, but it still offers no test options beyond the initial ANOVA.
Here's my amended code:
proc surveyreg data = mydata;
weight weightvar;
strata stratavar;
model dependent = independent / anova;
run;
Should I be using a different PROC? A totally different test? Is there an option that I'm missing in SURVEYREG? Help!
The data can be skewed; this isn't a problem for GLM or SURVEYREG. The actual condition required is that the residuals (the differences between actual and predicted values) are normally distributed. You can examine the residuals and see whether they follow a normal distribution.
Assuming the residuals are normally distributed, I would think that SURVEYREG would handle the weighting properly.
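A minimal sketch of one way to look at the residuals, assuming the data set and variable names from the question. The names resid_out, pred, and resid are made up for illustration, and the CLASS statement is added on the assumption that independent is categorical. Note that the PROC UNIVARIATE step is an unweighted look at the residuals, so treat it as a rough diagnostic rather than a design-based test.

```sas
/* Fit the design-based model and save predicted values and residuals. */
/* resid_out, pred, and resid are hypothetical names.                   */
proc surveyreg data=mydata;
  class independent;
  strata stratavar;
  weight weightvar;
  model dependent = independent;
  output out=resid_out predicted=pred residual=resid;
run;

/* Unweighted look at the shape of the residual distribution. */
proc univariate data=resid_out normal;
  var resid;
  histogram resid / normal;
  qqplot resid / normal(mu=est sigma=est);
run;
```

If the histogram and Q-Q plot show a long tail, that points to the skewness issue discussed here.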
Thanks for the reply!! 🙂
I checked and unfortunately the residuals are also heavily skewed. I'm thinking that maybe I'm just not looking at this correctly and need to adjust which test I'm using/my research question?
Can you show us a screen capture of the residual plot?
If they are skewed, perhaps a transformation of the dependent variable would help (depending on how severe the skew is) to make the residuals approximately normal.
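As a rough sketch of what such a transformation could look like, here is a log transform, which is only one possible choice and assumes the dependent variable is strictly positive. The names mydata_log, log_dependent, resid_log, and resid are hypothetical.

```sas
/* Create a log-transformed copy of the dependent variable.         */
/* LOG is undefined for zero or negative values, so those rows are  */
/* left missing here instead of being silently altered.             */
data mydata_log;
  set mydata;
  if dependent > 0 then log_dependent = log(dependent);
run;

/* Refit on the transformed scale and save the residuals for checking. */
proc surveyreg data=mydata_log;
  class independent;
  strata stratavar;
  weight weightvar;
  model log_dependent = independent;
  output out=resid_log residual=resid;
run;
```

Any group differences from this model are on the log scale, so they would need to be interpreted (or back-transformed) accordingly.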
Here are all of the diagnostic plots (let me know if this is what you meant!). Thank you so much for your help!
Obviously, the residuals are not normally distributed, and it's not obvious to me that you can transform the data to make them normal. So I would consider nonparametric methods next, although I'm not sure how the survey weights would apply.
I agree with what @PaigeMiller has said. The residuals of your model are clearly non-normal, so neither ANOVA nor linear regression is applicable.
My suggestion is to use nonparametric tests or quantile regression to compare quantiles across groups, rather than the means that ANOVA compares.
For instance, "Median and quantile tests under complex survey design using SAS and R" (ScienceDirect) contains a SAS program for testing the equality of medians and other quantiles among groups. "Quantile Regression Analysis of Survey Data Under Informative Sampling" (Journal of Survey Statistic...) and other research papers discuss quantile regression of complex survey data.
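This is not the program from those papers, just a hedged starting point using standard procedures and the variable names from the question. The first step estimates design-based medians and quartiles by group (domain quantiles may depend on your SAS/STAT release). The second fits a weighted median regression; as far as I know PROC QUANTREG has no STRATA statement, so its standard errors would not reflect the stratification.

```sas
/* Design-based medians and quartiles of the outcome within each group. */
proc surveymeans data=mydata median q1 q3;
  strata stratavar;
  weight weightvar;
  domain independent;
  var dependent;
run;

/* Weighted median regression across groups.                       */
/* Caution: the weights are used, but the stratification is not    */
/* reflected in the standard errors.                               */
proc quantreg data=mydata;
  class independent;
  weight weightvar;
  model dependent = independent / quantile=0.5;
run;
```

Changing quantile=0.5 to another value (for example 0.25 or 0.75) compares a different quantile across the groups.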
A note on the ANOVA option in the SURVEYREG procedure: according to page 104 of Complex Survey Data Analysis with SAS (Taylor H. Lewis, Taylor & Francis), this option should be avoided, because the F statistic in the ANOVA table reported by PROC SURVEYREG does not carry the same interpretation as it does for simple random sampling data and should be ignored. In other words, the F statistic reported in the ANOVA table is not the test statistic for the null hypothesis that all parameters except the intercept are jointly equal to zero. Instead, the author advises analysts to refer to the "Model" line of the "Tests of Model Effects" table in the PROC SURVEYREG output for the correct statistic.
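A sketch of the amended code without the ANOVA option, using the names from the question. Two things here are assumptions on my part: the CLASS statement (the amended code has none, and a categorical predictor likely needs one), and the ODS table name Effects for "Tests of Model Effects", which you can confirm with ODS TRACE ON.

```sas
/* Show only the "Tests of Model Effects" table.                      */
/* The ODS name Effects is an assumption; verify with ODS TRACE ON.   */
ods select Effects;

proc surveyreg data=mydata;
  class independent;               /* treat the predictor as categorical */
  strata stratavar;
  weight weightvar;
  model dependent = independent;   /* ANOVA option deliberately omitted  */
run;
```

Dropping the ODS SELECT line restores the full default output, which also includes the parameter estimates.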