Lysegroentblad
Obsidian | Level 7

Hi,

I have two questions. 

 

1) I have a dataset where neither the errors nor the data are normally distributed. I had been using PROC GLM, but discovered that it usually requires normally distributed errors. However, I also read in some (unofficial) articles that if the sample size is large enough you can still use the procedure. Is this true? If it is true, how large a sample size are we talking about? I have two sites with 240 and 960 observations respectively, which I believe to be quite large.
If it is not true that non-normally distributed data can be used in PROC GLM given a large enough sample, which procedure do you suggest instead? I tried using PROC GENMOD, but I am not interested in the correlation between two variables. I want to know whether the (mean) age in one site differs significantly from the (mean) age in the second site.

 

2) I also have a more statistical question.
I collected my data from circles. The data has to do with trees. I logged the age and diameter of each tree in the circle. I then measured the distance to the nearest piece of deadwood for each tree. I wanted to prove that there is a correlation between the age/diameter of the tree and the distance to deadwood. I believed that proximity to deadwood stimulates the growth of the trees. However, I was not able to prove this (I used PROC CORR).
My statistics professor told me that collecting the data from a circle risks creating false patterns, which makes it difficult to prove anything (without running simulations, which is outside my capability to be honest).
I have attempted to illustrate my data collection in the photo below:
Green is trees. Brown is deadwood. Red is measured distance.

 

Best regards
Maja

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

1) For nonparametric univariate tests, look at proc NPAR1WAY. The Wilcoxon rank-sum test for example is based on ranks and does not assume normality.
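
As a minimal sketch of what that could look like for the two-site age comparison: the dataset name `trees`, the variables `site` and `age`, and the values in the data step are all hypothetical placeholders, not the poster's data.

```sas
/* Hypothetical layout: one row per tree, with a site label and an age. */
/* The values below are made-up placeholders, not real measurements.    */
data trees;
    input site $ age;
    datalines;
A 12
A 35
A 8
A 51
B 40
B 22
B 67
B 19
;
run;

/* Wilcoxon rank-sum test comparing the age distributions of the two sites. */
/* CLASS names the grouping variable, VAR the response to be ranked.        */
proc npar1way data=trees wilcoxon;
    class site;
    var age;
run;
```

Note that a rank-based test asks whether ages at one site tend to be larger than at the other, which is not quite the same question as whether the means differ.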

PG


4 REPLIES 4
ballardw
Super User

As a bare minimum I would suggest sharing the Proc GLM code you are attempting.

Better would be to include the LOG from running the procedure. Copy the text from the log including the procedure code and any messages associated, then on the forum open a text box using the </> icon above the main message window and paste the text.

The entire code helps us see what potential issues may arise from your option choices. The log helps if there are things like record exclusions happening or some other data issues.

 

Best would be to include some example data in the form of a data step so we can see what type of things you have.

Measurement units may be a good idea as well.

 

Your picture doesn't help a great deal, as there is not much of a description of why or how that circle is selected. The distance from each individual tree to deadwood shouldn't be affected by that circle, though I can see some possible modeling issues with multiple trees being equidistant to the same "deadwood".

PGStats
Opal | Level 21

2) If I understand correctly, the nearest deadwood is the one found inside the circle, but it could also be closer, lying just outside the circle. So what you have is interval-censored data. The true shortest distance lies in the interval (nearest distance to the circle, measured distance). Ask your professor about the appropriateness of censored data analysis for your data.
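
One way this might be set up in SAS is PROC LIFEREG, whose MODEL statement accepts a `(lower, upper)` pair for interval-censored responses. The sketch below is only an illustration under assumptions: the variable names (`measured`, `to_edge`, `lower`, `upper`), the Weibull distribution, the choice of distance as the response, and the placeholder values are not from the thread.

```sas
/* Hypothetical variables (assumptions, not from the thread):           */
/*   age      - tree age                                                */
/*   measured - distance to the nearest deadwood found inside the circle */
/*   to_edge  - distance from the tree to the circle boundary            */
/* The values are made-up placeholders, not real measurements.          */
data deadwood;
    input age measured to_edge;
    if measured <= to_edge then do;
        /* Nothing outside the circle can be closer: distance is exact. */
        lower = measured;
        upper = measured;
    end;
    else do;
        /* Deadwood outside the circle could be closer, so the true     */
        /* distance lies somewhere between to_edge and measured.        */
        lower = to_edge;
        upper = measured;
    end;
    datalines;
12 3.1 5.0
35 6.4 2.5
8 1.2 4.0
51 7.8 3.3
40 2.0 6.1
22 5.5 1.9
67 4.7 4.9
19 8.2 2.2
;
run;

/* Equal bounds are treated as uncensored; lower < upper as interval-censored. */
proc lifereg data=deadwood;
    model (lower, upper) = age / dist=weibull;
run;
```

The Weibull model works on the log scale, so a tree standing exactly on the circle boundary (`to_edge` of 0) would need separate handling. Whether this model suits the data is a question for the professor.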

PG
Lysegroentblad
Obsidian | Level 7
Hi @PGStats ,
Thank you so much. Wilcoxon was exactly what I was looking for.
And also, thank you for suggesting that my second question had to do with interval-censored data. I think this is exactly what my professor meant.

MJ
