Dear all.
How do I find the best transformation for my data so that the residuals are normally distributed?
I am fitting the following model to my data:
PROC MIXED DATA=AU;
CLASS LOCAL ESPE REP ESTA;
MODEL Y6=LOCAL|ESPE|ESTA /HTYPE=3 RESIDUAL INFLUENCE;
RANDOM REP(LOCAL*ESPE);
REPEATED ESTA/TYPE=TOEP(2) SUB=REP(LOCAL*ESPE) GROUP=LOCAL*ESPE;
ODS OUTPUT "Influence Diagnostics"=DIAGNOSTIC;
RUN;
However, the residuals are not normally distributed, as the result below shows:
Is there a procedure for deciding how to transform the data? I don't think a Box-Cox transformation is a good idea, because my variables are categorical factors (LOCAL, ESPE, ESTA), not continuous values.
I appreciate all the help. Thanks.
If your response is a proportion, you need to account for that. If you Google "sas regression proportion" you'll find many references and papers. Two from SAS include
http://support.sas.com/kb/57/480.html
http://support.sas.com/kb/56/992.html
Since you might have an inflated number of zeros, you can also search for papers about zero-inflated models.
You could check the ASSESS statement of PROC GENMOD,
but I don't know whether it can handle categorical variables, since you say your variables are categorical.
I don't think there is any transformation that will result in normal residuals. You can see from the Q-Q plot that about half of the data have zero residuals. This can happen if, for example, the Y6 variable is rounded so that it has a small number of discrete values.
It might be that Y6 has a large number of repeated values. You can check the distribution of the Y6 variable by running
proc freq data=AU ORDER=FREQ;
tables Y6 / maxlevels=10; /* only print top 10 */
run;
For details see https://blogs.sas.com/content/iml/2018/06/04/top-10-table-bar-chart.html
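As a complement to the frequency table, a quick count of the exact zeros can show how much of the data sits at the boundary. This is only a sketch; it assumes nothing beyond the data set AU and the variable Y6 from the original post, and the column names (n_obs, n_zero, prop_zero, min_y6, max_y6) are made up for illustration.

```sas
proc sql;
  /* count(Y6) ignores missing values; (Y6 = 0) evaluates to 1 or 0 per row */
  select count(Y6)                           as n_obs,
         sum(Y6 = 0)                         as n_zero,
         calculated n_zero / calculated n_obs as prop_zero format=percent8.1,
         min(Y6)                             as min_y6,
         max(Y6)                             as max_y6
  from AU;
quit;
```

If prop_zero is large, or if max_y6 equals 1, a model for proportions that excludes the boundaries will not fit all the observations directly.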
My variable response represents proportion of plant parts. Y6 is the proportion of flowers, which in many observations is zero.
Should I then work with the GLIMMIX procedure instead of transforming the data?
My goal is to test the differences between the factors (LOCAL, ESPE, ESTA) and the effects of their interactions. What command should I use?
Thank you!
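One possible way to approach this in PROC GLIMMIX is a two-part analysis: first model whether any flowers are present, then model the proportion among the nonzero observations with a beta distribution. The following is a rough sketch under assumptions, not a recommended final model. The data set AU2 and the indicator HASFLOWER are hypothetical names introduced here, the random intercept stands in for the RANDOM statement of the original PROC MIXED call, and the TOEP(2) repeated-measures structure is left out for simplicity.

```sas
/* Hypothetical indicator: 1 if any flowers were recorded, 0 otherwise */
data AU2;
  set AU;
  if not missing(Y6) then HASFLOWER = (Y6 > 0);
run;

/* Part 1: presence/absence of flowers (binary response, logit link) */
proc glimmix data=AU2;
  class LOCAL ESPE REP ESTA;
  model HASFLOWER(event='1') = LOCAL|ESPE|ESTA / dist=binary link=logit;
  random intercept / subject=REP(LOCAL*ESPE);
run;

/* Part 2: proportion of flowers where it is strictly between 0 and 1 */
/* (the beta distribution does not allow exact 0 or 1)                */
proc glimmix data=AU2;
  where 0 < Y6 < 1;
  class LOCAL ESPE REP ESTA;
  model Y6 = LOCAL|ESPE|ESTA / dist=beta link=logit;
  random intercept / subject=REP(LOCAL*ESPE);
run;
```

Each call prints a "Type III Tests of Fixed Effects" table covering the main effects and interactions. With many zeros the full factorial may not converge in either part, in which case a simpler model may be needed.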