Hi,

I have a data set that consists of a control site and a planted site (it concerns trees). The data come from a windthrown area, so there are both trees that are 84 years old (which did not go over in the storm) and trees that are 1 year old. I have split the data set into regeneration (0-16 years old) and survivors (17-84 years old). Even after separating them into these two categories, neither category is normally distributed. The standard deviations are more often than not over 100 percent of the means. The control site has approximately 900 observations and the planted site has approximately 200 observations.

I want to test the H0 that age, height, root diameter and diameter at breast height are the same in the two sites (control and planted), i.e. is a mean height of 6.04 in the control significantly different from a mean height of 8.08 in the planted site? So far I have tried PROC GLM and PROC TTEST, but I have begun to question whether the p-values are reliable when the data are not normally distributed.

This is how my results appear when doing a t-test:

proc ttest data = Thesis alpha = 0.05 h0 = 0;
where type='R'; /*regeneration*/
class site;
var age height dbh root;
run;

As you can see, I get quite a long tail, especially in the control (NI).

This is how my results appear when I do PROC GLM:

PROC GLM;
CLASS site;
MODEL age height dbh root = site / SS3;
where type='R';
RUN;

There is still a long tail. The two p-values are quite similar, and identical if you look at the "Pooled" result (0.0027), and both are significant.

My final question is whether I can trust these p-values when the data are not normally distributed and, if I can trust these tests, which one is better to use. Or should I use a completely different test?

Best regards,
Maja