Hi
I have a data set that consists of a control site and a planted site (it has to do with trees). The data come from a windthrown area, so there are trees that are 84 years old (those that did not go over in the storm) as well as trees that are 1 year old. I split the data set into regeneration (0-16 years old) and survivors (17-84 years old). Despite separating them into two categories, neither category is normally distributed. The standard deviations of my means are more often than not over 100 percent of the mean.
The control site has approximately 900 observations and the planted site has approximately 200 observations.
I want to test the null hypothesis (H0) that age, height, root diameter and diameter at breast height (DBH) are the same at the two sites (control and planted), i.e. whether a mean height of 6.04 at the control site is significantly different from a mean height of 8.08 at the planted site.
So far I have tried PROC GLM and PROC TTEST, but I have begun to question whether the p-values are reliable when the data are not normally distributed. This is how my results appear when doing a t-test:
proc ttest data=Thesis alpha=0.05 h0=0;
  where type='R'; /* regeneration */
  class site;
  var age height dbh root;
run;
As you can see, there is quite a long tail, especially in the control (NI).
This is how my results appear when I do PROC GLM:
proc glm data=Thesis;
  where type='R'; /* regeneration */
  class site;
  model age height dbh root = site / ss3;
run;
quit;
There is still a long tail.
The p-values from the two procedures are very similar, identical if you compare with the "Pooled" t-test (0.0027), and both are significant.
My final question is whether I can trust these p-values when the data are not normally distributed and, if I can, which of the two tests is better to use. Or should I use a completely different test?
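The identical "Pooled" p-values are expected rather than a coincidence: with only two groups, the one-way ANOVA F test in PROC GLM is algebraically the same test as the pooled two-sample t-test. A short derivation, writing $n_1, n_2$ for the group sizes, $\bar y_1, \bar y_2$ for the group means, $\bar y$ for the overall mean and $s_p^2$ for the pooled variance (notation introduced here for illustration):

```latex
\begin{aligned}
t &= \frac{\bar y_1 - \bar y_2}{s_p\sqrt{1/n_1 + 1/n_2}},
\qquad
s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} = \mathrm{MS}_{\text{error}},\\
\mathrm{SS}_{\text{site}} &= n_1(\bar y_1-\bar y)^2 + n_2(\bar y_2-\bar y)^2
  = \frac{n_1 n_2}{n_1+n_2}\,(\bar y_1-\bar y_2)^2 \quad (1\ \text{df}),\\
F &= \frac{\mathrm{SS}_{\text{site}}/1}{s_p^2}
  = \frac{(\bar y_1-\bar y_2)^2}{s_p^2\,(1/n_1+1/n_2)} = t^2 .
\end{aligned}
```

Because an $F(1, n_1+n_2-2)$ variable is the square of a $t(n_1+n_2-2)$ variable, the two-sided pooled t-test p-value and the GLM F-test p-value coincide. The Satterthwaite row in the PROC TTEST output does not assume equal variances, which is why it can differ slightly. It also means both procedures rest on the same assumptions, so their agreement says nothing about whether those assumptions hold.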
Best regards
Maja
Accepted Solutions
If you really care about the normal distribution, why not use a non-parametric method such as the Wilcoxon test in PROC NPAR1WAY?
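A minimal sketch of that suggestion, reusing the data set and variable names from the original post (Thesis, type, site, age, height, dbh, root). With two levels of site, the WILCOXON option gives the Wilcoxon rank-sum (Mann-Whitney) test. The HL option, which requests a Hodges-Lehmann estimate of the location shift between the two sites, is an addition here and can be dropped.

```sas
/* Wilcoxon rank-sum test: control vs planted, regeneration only */
proc npar1way data=Thesis wilcoxon hl;
  where type='R';            /* regeneration, as in the original code */
  class site;                /* two levels: control and planted       */
  var age height dbh root;   /* one test per variable                 */
run;
```

The rank-sum test asks whether values at one site tend to be larger than at the other; it is a comparison of means only under extra assumptions, so it does not directly test 6.04 against 8.08. Tied values (common for age in whole years) are handled with midranks.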
You might want to use a response distribution that is more appropriate for positive, skewed data such as the gamma. You can do that with PROC GENMOD. To do that, change GLM to GENMOD and specify the DIST=GAMMA option in the MODEL statement.
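A sketch of that change, with adjustments that are assumptions rather than part of the reply. The MODEL statement in PROC GENMOD takes a single response variable, so the four variables are fitted one at a time through a small hypothetical macro (%gamma_fit). The gamma distribution is only defined for strictly positive values, so zeros and missing values are excluded in the WHERE statement; if, for example, regeneration trees below breast height have DBH recorded as 0, they simply drop out of the DBH model. LINK=LOG is also a choice made here rather than the default link.

```sas
/* Hypothetical helper macro: one gamma model per response variable */
%macro gamma_fit(resp);
  proc genmod data=Thesis;
    /* regeneration only; the gamma needs strictly positive values,  */
    /* so zeros and missing values (missing sorts below 0) are dropped */
    where type='R' and &resp > 0;
    class site;
    /* TYPE3 requests a likelihood-ratio test of the site effect */
    model &resp = site / dist=gamma link=log type3;
  run;
%mend gamma_fit;

%gamma_fit(age)
%gamma_fit(height)
%gamma_fit(dbh)
%gamma_fit(root)
```

With a log link, exponentiating the site estimate gives the ratio of the two site means, which is often easier to report for skewed data than a difference.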
There is no requirement for PROC GLM that the data be normally distributed. The errors (residuals) must be normally distributed, and you should check that. See: https://blogs.sas.com/content/iml/2018/08/27/on-the-assumptions-and-misconceptions-of-linear-regress...
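One way to do that check directly in PROC GLM, sketched with the same data set and variables as the original post: the OUTPUT statement saves one residual per response (the output data set glm_resid and the names r_age and so on are made up for this example), and PROC UNIVARIATE then gives normality tests and Q-Q plots for them. With several hundred observations the formal tests will flag even small departures, so the Q-Q plots are usually the more informative part.

```sas
/* Fit the same model as before and save the residuals */
proc glm data=Thesis;
  where type='R';                       /* regeneration */
  class site;
  model age height dbh root = site / ss3;
  output out=glm_resid                  /* hypothetical output data set name     */
         r=r_age r_height r_dbh r_root; /* one residual per response, MODEL order */
run;
quit;

/* Normality tests and Q-Q plots of the residuals */
proc univariate data=glm_resid normal;
  var r_age r_height r_dbh r_root;
  qqplot r_age r_height r_dbh r_root / normal(mu=est sigma=est);
run;
```

Unlike PROC REG, PROC GLM accepts site as a CLASS variable, so no dummy coding is needed before checking the residuals.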
Paige Miller
Hi @PaigeMiller
Thank you for responding.
I checked the normality of my residuals (I think) using PROC REG.
These are my graphs:
I know the points should ideally be scattered like a random cloud centered around zero, with no pattern emerging.
I do feel that patterns emerge in these graphs.
Would it be correct to assume that my residuals are not normally distributed? If so, which test should I use instead?
Best regards
Maja
Hi
Thank you for your answers.
I might be overthinking the fact that my data is not normally distributed, but I will consider doing the Wilcoxon test; it seems like a winner. Thank you @Ksharp and the others for your responses.
Best regards
Maja