Lysegroentblad
Obsidian | Level 7

Hi

 

I have a data set that consists of a control site and a planted site (it has to do with trees). The data is from a windthrown area, so there are trees that are 84 years old (that did not go over in the storm) as well as trees that are 1 year old. I split the data set into regeneration (0-16 years old) and survivors (17-84 years old). Even after separating them into these two categories, neither category is normally distributed. The standard deviations are more often than not over 100 percent of the corresponding means.
The control site has approximately 900 observations and the planted site has approximately 200 observations.

I want to test the null hypothesis (H0) that age, height, root diameter and diameter at breast height are the same at the two sites (control and planted). For example, is a mean height of 6,04 in the control significantly different from a mean height of 8,08 in the planted site?

So far I have tried using PROC GLM and a TTEST, but I have begun questioning whether the p-value is reliable when the data is not normally distributed. This is how my results appear when doing a t-test:

proc ttest data=Thesis alpha=0.05 h0=0;
    where type='R'; /* regeneration */
    class site;
    var age height dbh root;
run;

Lysegroentblad_0-1651066687585.png

Lysegroentblad_1-1651066734324.png

As you can see I get quite a long tail, especially in the control (NI). 

This is how my results appear when I do PROC GLM:

proc glm data=Thesis;
    where type='R'; /* regeneration */
    class site;
    model age height dbh root = site / ss3;
run;
quit;

Lysegroentblad_2-1651067013331.png

The distribution still has a long tail.

The two p-values are quite similar, and identical if you look at the "Pooled" row (0,0027), and both are significant.

I think my final question is whether I can trust these p-values when the data is not normally distributed and, if I can trust these tests, which one is better to use. Or should I use a completely different test?

 

Best regards

Maja

Accepted Solutions
Ksharp
Super User
Yes, you can trust PROC TTEST. The t-test is fairly robust to non-normal data, particularly with samples as large as yours.
If normality really concerns you, why not use a nonparametric method such as the Wilcoxon test in PROC NPAR1WAY?
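
As a minimal sketch of that suggestion, assuming the same Thesis data set, the type='R' filter, and the variable names used in the question, the Wilcoxon rank-sum test could be requested roughly like this:

```sas
/* Wilcoxon rank-sum test comparing the two sites for each response.
   The data set name, WHERE filter, and variable names are taken from
   the question; adjust them to match your own data. */
proc npar1way data=Thesis wilcoxon;
    where type='R';          /* regeneration trees only */
    class site;              /* two-level grouping variable */
    var age height dbh root; /* one test per variable */
run;
```

Keep in mind that the Wilcoxon test asks whether values at one site tend to be larger than at the other, not whether the means differ, so the hypothesis is slightly different from the one the t-test addresses.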

5 REPLIES
StatDave
SAS Super FREQ

You might want to use a response distribution that is more appropriate for positive, skewed data such as the gamma. You can do that with PROC GENMOD. To do that, change GLM to GENMOD and specify the DIST=GAMMA option in the MODEL statement.
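
A sketch of what that could look like for a single response, assuming the Thesis data set and variable names from the question. PROC GENMOD accepts one response per MODEL statement, so the four responses in the GLM call would each need their own step. The LINK=LOG and TYPE3 options are assumptions added here for illustration and are not part of the suggestion above. The gamma distribution also requires strictly positive values, so any zeros (for example a dbh of 0) would have to be handled first.

```sas
/* Gamma model for height by site (one response per PROC GENMOD step).
   LINK=LOG is an illustrative choice; without it GENMOD uses its
   default link for the gamma distribution. */
proc genmod data=Thesis;
    where type='R';          /* regeneration trees only */
    class site;
    model height = site / dist=gamma link=log type3;
run;
```

With TYPE3, the output includes a likelihood ratio test for the site effect, which plays the role of the F test from PROC GLM.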

PaigeMiller
Diamond | Level 26

PROC GLM does not require the data themselves to be normally distributed. It is the errors (residuals) that must be normally distributed, and you should check that. See: https://blogs.sas.com/content/iml/2018/08/27/on-the-assumptions-and-misconceptions-of-linear-regress...
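
One way to run that check, sketched for a single response and assuming the Thesis data set and variable names from the question (resid_out, resid and pred are hypothetical names chosen here):

```sas
/* Step 1: fit the model and save residuals and predicted values.
   resid_out, resid and pred are illustrative names. */
proc glm data=Thesis;
    where type='R';          /* regeneration trees only */
    class site;
    model height = site / ss3;
    output out=resid_out r=resid p=pred;
run;
quit;

/* Step 2: examine the residuals with normality tests,
   a histogram with a fitted normal curve, and a Q-Q plot. */
proc univariate data=resid_out normal;
    var resid;
    histogram resid / normal;
    qqplot resid / normal(mu=est sigma=est);
run;
```

With samples this large, formal normality tests tend to reject even for minor departures, so the Q-Q plot and histogram are usually more informative than the test p-values.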

--
Paige Miller
Lysegroentblad
Obsidian | Level 7

Hi @PaigeMiller 

Thank you for responding.

 

I checked for normality of my residuals (I think) using PROC REG.

These are my graphs:

Lysegroentblad_0-1651137013997.png

I know the plots are ideally supposed to be scattered like a cloud, and even more ideally also centered around zero - no pattern should emerge.

I do feel like patterns emerge in these graphs.

Would it be a correct assumption that my residuals are not normally distributed? And in that case, which test should I use instead?

 

Best regards

 

Maja

 

Lysegroentblad
Obsidian | Level 7

Hi 

 

Thank you for your answers. 

I might be overthinking the fact that my data is not normally distributed, but I will consider doing the Wilcoxon test. That seems like a winner. Thank you @Ksharp and the others for your responses.

 

Best regards

Maja
