Alternative to PROC GLM for not normally distributed data + circle dat...

☑ This topic is **solved**.

Posted 08-31-2022 11:23 AM
(453 views)

Hi,

I have two questions.

1) I have a dataset where neither the errors nor the data are normally distributed. I had been using PROC GLM, but discovered that this usually requires normally distributed errors. However, I also read in some (unofficial) articles that if the sample size is large enough you can still use the procedure. Is this true? If it is true, how large a sample size are we talking about? I have two sites with 240 and 960 observations respectively, which I believe to be quite large.

If a non-normally distributed sample cannot be used in PROC GLM even when the sample size is large, which procedure do you suggest instead? I tried using PROC GENMOD, but I am not interested in the correlation between two variables. I want to know whether the mean age at one site differs significantly from the mean age at the second site.

2) I also have a more statistical question.

I collected my data from circles. The data has to do with trees. I logged the age and diameter of each tree in the circle. I then measured the distance from each tree to the nearest piece of deadwood. I wanted to prove that there is a correlation between the age/diameter of the tree and the distance to deadwood. I believed that proximity to deadwood stimulates the growth of the trees. However, I was not able to prove this (I used PROC CORR).

My statistics professor told me that collecting the data from a circle risks creating false patterns, which makes it difficult to prove anything (without running simulations, which is outside my capability to be honest).

I have attempted to illustrate my data collection in the photo below:

Best regards

Maja

1 ACCEPTED SOLUTION


1) For nonparametric univariate tests, look at **PROC NPAR1WAY**. The Wilcoxon rank-sum test, for example, is based on ranks and does not assume normality.
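
A minimal sketch of such a call, assuming a dataset named `trees` with a two-level grouping variable `site` and a numeric variable `age` (all three names are hypothetical, not taken from the original post):

```sas
/* Assumed names: dataset TREES, class variable SITE, response AGE */
proc npar1way data=trees wilcoxon;
   class site;   /* grouping variable with two levels (the two sites) */
   var age;      /* variable to compare between the sites */
run;
```

Note that the rank-sum test asks whether values at one site tend to be larger than at the other, which is not quite the same question as a difference in means.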

PG

4 REPLIES

As a bare minimum I would suggest sharing the Proc GLM code you are attempting.

Better would be to include the LOG from running the procedure. Copy the text from the log including the procedure code and any messages associated, then on the forum open a text box using the </> icon above the main message window and paste the text.

The entire code helps us see what potential issues may arise from your option choices. The log helps if there are things like record exclusions happening or some other data issues.

Best would be to include some example data in the form of a data step so we can see what type of things you have.
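
As a rough illustration of what such a data step could look like, here is a sketch with made-up placeholder values. The variable names, units, and numbers are all assumptions for illustration only, not the poster's data:

```sas
/* Placeholder example only: names, units and values are invented */
data trees;
   /* site: site label; circle: circle id; age in years;
      diameter in cm; dist_deadwood in m (assumed units) */
   input site $ circle age diameter dist_deadwood;
   datalines;
A 1 35 22.5 3.2
A 1 41 27.0 1.8
B 1 58 40.1 4.5
B 2 29 18.3 0.9
;
run;
```

A handful of rows like this is usually enough for others to reproduce the procedure call and see the variable types.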

Measurement units may be a good idea as well.

Your picture doesn't help a great deal, as there is not much of a description of why or how that circle is selected. The distance from each individual tree to deadwood shouldn't be affected by that circle, though I can see some possible modeling with multiple trees being equidistant to the same "deadwood".

2) If I understand correctly, the nearest deadwood you recorded is the one found inside the circle, but a closer piece could be lying just outside the circle. So what you have is interval-censored data. The true shortest distance lies in the interval (distance from the tree to the edge of the circle, measured distance). When the measured distance is shorter than the distance to the edge, the measurement is exact. Ask your professor about the appropriateness of censored data analysis for your data.
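
One possible way to set this up in SAS is PROC LIFEREG, which accepts an interval response through the `MODEL (lower, upper) = ...` syntax. The sketch below is only an illustration under assumptions: the dataset `trees` and the variables `dist_deadwood` (measured distance), `dist_edge` (distance from the tree to the edge of the circle), `age`, and `diameter` are hypothetical names, and the lognormal distribution is an arbitrary choice that would need checking against the data.

```sas
/* Assumed input: TREES with DIST_DEADWOOD, DIST_EDGE, AGE, DIAMETER */
data trees_ic;
   set trees;
   upper = dist_deadwood;                    /* measured distance */
   if dist_deadwood <= dist_edge then
      lower = dist_deadwood;                 /* exact: no closer deadwood can lie outside */
   else
      lower = dist_edge;                     /* true distance lies between edge and measured */
   if lower <= 0 then lower = .;             /* missing lower bound = left-censored at UPPER */
run;

proc lifereg data=trees_ic;
   /* LOWER = UPPER is treated as an exact observation */
   model (lower, upper) = age diameter / dist=lognormal;
run;
```

This treats distance as the response and age and diameter as covariates, which is one way to test for an association; whether that direction suits the research question is a modeling decision.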

PG

Hi @PGStats,

Thank you so much. Wilcoxon was exactly what I was looking for.

Also, thank you for suggesting that my second question has to do with interval-censored data. I think this is exactly what my professor meant.

MJ
