vitormacedo
Calcite | Level 5

Dear all.

 

How can I find the best transformation of my data so that the residuals are normally distributed?

I am fitting the following model to my data:

PROC MIXED DATA=AU;
CLASS LOCAL ESPE REP ESTA;
MODEL Y6=LOCAL|ESPE|ESTA /HTYPE=3 RESIDUAL INFLUENCE;
RANDOM REP(LOCAL*ESPE);
REPEATED ESTA/TYPE=TOEP(2) SUB=REP(LOCAL*ESPE) GROUP=LOCAL*ESPECIE;
ODS OUTPUT "Influence Diagnostics"=DIAGNOSTIC; RUN;

However, the residuals are not normally distributed, as the results below show:

[Attached images: DIAGNOSTIC.jpg, normal.jpg]

Is there a procedure that tells me how to transform the data? I don't think a Box-Cox transformation is a good idea, because the variables in my model (LOCAL, ESPE, ESTA) are categorical factors, not continuous values.

I appreciate all the help. Thanks.

Ksharp
Super User

You could look at the ASSESS statement of PROC GENMOD, but I don't know whether it can handle categorical variables, since you say your variable is categorical.

Rick_SAS
SAS Super FREQ

I don't think there is any transformation that will result in normal residuals. You can see from the Q-Q plot that about half of the data have zero residuals. This can happen if, for example, the Y6 variable is rounded so that it has a small number of discrete values. 
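To see this pattern directly, one option is to save the residuals and examine them with PROC UNIVARIATE. The following is only a minimal sketch: it assumes the AU data set and variables from the original post, drops the REPEATED statement to stay short, and uses PRED as a hypothetical name for the output data set.

```sas
/* Sketch: save residuals from a simplified version of the model.  */
/* PRED is a hypothetical data set name; the REPEATED statement    */
/* from the original post is omitted here for brevity.             */
proc mixed data=AU;
  class LOCAL ESPE REP ESTA;
  model Y6 = LOCAL|ESPE|ESTA / htype=3 outp=PRED;
  random REP(LOCAL*ESPE);
run;

/* OUTP= writes the residuals to a variable named Resid.           */
/* The NORMAL option requests normality tests; QQPLOT draws the    */
/* normal Q-Q plot with estimated mean and standard deviation.     */
proc univariate data=PRED normal;
  var Resid;
  qqplot Resid / normal(mu=est sigma=est);
run;
```

A long flat segment in the Q-Q plot would suggest many tied residuals, which no monotone transformation of Y6 can spread out.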

 

It might be that Y6 has a large number of repeated values. You can check the distribution of the Y6 variable by running 

proc freq data=AU ORDER=FREQ;
  tables Y6 / maxlevels=10;   /* only print top 10 */
run;

For details see https://blogs.sas.com/content/iml/2018/06/04/top-10-table-bar-chart.html
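If zeros are the main concern, it may also help to count them explicitly. This is a sketch under the assumption that a zero in Y6 is stored as an exact 0; AU_ZERO and ISZERO are hypothetical names introduced only for illustration.

```sas
/* Sketch: flag exact zeros in Y6 and tabulate them.               */
/* AU_ZERO and IsZero are hypothetical names.                      */
data AU_ZERO;
  set AU;
  if missing(Y6) then IsZero = .;   /* keep missing values out of the count */
  else IsZero = (Y6 = 0);           /* 1 if Y6 is exactly zero, else 0      */
run;

proc freq data=AU_ZERO;
  tables IsZero;                              /* overall share of zeros */
  tables LOCAL*ESPE*IsZero / nocol nopercent; /* zeros within each cell */
run;
```

If some LOCAL*ESPE cells consist almost entirely of zeros, that is worth knowing before choosing a model.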

 

 

vitormacedo
Calcite | Level 5

My response variable represents the proportion of plant parts. Y6 is the proportion of flowers, which is zero in many observations.

Should I then work with the GLIMMIX procedure instead of transforming the data?

My goal is to test for differences among the factors (LOCAL, ESPE, ESTA) and for the effects of their interactions. Which statements should I use?

 

Thank you!

Rick_SAS
SAS Super FREQ | Accepted solution

If your response is a proportion, you need to account for that. If you Google "sas regression proportion" you'll find many references and papers. Two from SAS include

http://support.sas.com/kb/57/480.html

http://support.sas.com/kb/56/992.html

Since you might have an inflated number of zeros, you can also search for papers about zero-inflated models.
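As a starting point only, a proportion built from counts can be modeled in PROC GLIMMIX with the events/trials syntax. The sketch below assumes that the data set also contains the underlying counts; NFLOWERS (number of flowers) and NPARTS (total number of plant parts) are hypothetical variable names that do not appear in the original post. It does not address zero inflation and does not reproduce the TOEP(2) covariance structure.

```sas
/* Sketch: binomial mixed model for a proportion built from counts. */
/* NFLOWERS and NPARTS are hypothetical variables (events, trials). */
/* If only the continuous proportion Y6 is available, this syntax   */
/* does not apply; note that DIST=BETA requires 0 < Y6 < 1, so it   */
/* cannot be used directly when Y6 contains exact zeros.            */
proc glimmix data=AU;
  class LOCAL ESPE REP ESTA;
  model NFLOWERS/NPARTS = LOCAL|ESPE|ESTA / dist=binomial link=logit htype=3;
  random intercept / subject=REP(LOCAL*ESPE);  /* random effect for each replicate */
  lsmeans LOCAL ESPE ESTA / ilink;             /* ILINK reports means on the proportion scale */
run;
```

The Type III tests from the MODEL statement address the main effects and interactions of LOCAL, ESPE, and ESTA, which is the stated goal of the analysis.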
