Parametric test vs non-parametric test!

Posted 03-01-2016 07:18 PM
(2606 views)

Hi all,

I have a question to discuss with you. I have an outcome variable of interest (motor score change) that I want to compare between two groups of patients. When I checked normality, the KS test indicated that the motor score change was not normally distributed. I can tell from the histogram that the distribution is skewed to the right (many more positive values than negative ones). The sample size is fairly large (103 and 108 in the two groups).

I also checked the homogeneity of variance in the two groups using Levene's test, which suggested that the spread of the data was the same in both groups. So, the assumptions for the non-parametric test are all met. When I used the Wilcoxon rank-sum test (Mann-Whitney U test), there was strong evidence that the change in motor score differs between the two groups.

At the same time, since the sample sizes are large enough in each group, I also tried the t-test/ANOVA (robust against non-normality with large samples), which showed no difference in the change in motor score between the two groups.

I would go with the non-parametric test results, but I also needed to do a multivariate analysis of the change in motor score. We don't currently have non-parametric multivariate models to use. As the response was not normally distributed, I should/could not use multiple linear regression, so at a colleague's suggestion (he had done something similar before on this variable), I recoded the negative scores in the sample (only about 7% of the patients) so that I could use some GLMs. The recoded data are pretty close to the original: the mean change in motor score increased by only 0.7 units after recoding. Considering that the score ranges from 0 to 100, this is a really small increase. Plus, we are actually modeling the improvement in motor score, which should be at least 0.

I tried both Poisson regression and negative binomial regression, and the negative binomial regression with the log link fit pretty well. However, in both the unadjusted and adjusted negative binomial models, the group variable was not a significant predictor, meaning that there is no significant difference between the two groups in motor score improvement.

So, I am a bit confused as to what to report. The Wilcoxon rank-sum test on the recoded change in motor score is still significant. The clinician I am working with is very happy about the non-parametric result, as he wants to see a difference between the two groups in motor score improvement. However, I feel like there is more evidence that there is no difference.

Can anyone give me some suggestions or comments on this?

Thanks a lot.

4 REPLIES

What are the assumptions of regression that you think are being violated?

Thanks Reeza. The normality assumption was violated for the multiple linear regression.

You wrote:

**As the response was not normally distributed, I should/could not use multiple linear regression**

This is not correct. Multiple regression makes no assumptions about the distribution of the response. It assumes that the *errors* are normally distributed. You can't see the errors, but you can see the residuals, and the output from, e.g., PROC GLM provides ways to assess the normality of the residuals.
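To make the response-versus-errors distinction concrete, here is a minimal sketch in plain Python (not SAS, and using simulated data rather than the motor scores from this thread). It assumes one hypothetical skewed predictor `x` and normally distributed errors; the helper `skewness` is written here for illustration only. The response comes out strongly right-skewed, yet the residuals from the fitted line are close to symmetric.

```python
import random
import statistics


def skewness(xs):
    """Third standardized moment (population form) of a list of numbers."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


random.seed(1)

# Simulated data (an assumption for illustration): a right-skewed predictor
# drives the response, while the errors themselves are normal.
x = [random.expovariate(1.0) for _ in range(2000)]
errors = [random.gauss(0.0, 0.5) for _ in x]
y = [2.0 + 3.0 * xi + e for xi, e in zip(x, errors)]

# Ordinary least squares with a single predictor.
mx, my = statistics.fmean(x), statistics.fmean(y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
slope = sxy / sxx
intercept = my - slope * mx

residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

skew_response = skewness(y)           # clearly positive: the response is skewed
skew_residuals = skewness(residuals)  # near zero: the residuals are not
print(f"response skewness:  {skew_response:.2f}")
print(f"residual skewness:  {skew_residuals:.2f}")
```

The same logic is why a normality test on the raw response says little about whether linear regression is appropriate; the residuals are what should be checked.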

You also wrote:

**I recoded the negative scores in the samples (only about 7% of the patients) so that I can use some GLMs.**

This is surely not a good thing to do. We don't change the data to make it fit our model without very good reason. It's also not necessary here.
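As a rough illustration of what this kind of recoding does, the sketch below assumes the negative scores were set to 0 (the thread does not say exactly how they were recoded) and uses made-up change scores, not the real data. It shows that the recoding both raises the mean and shrinks the spread, so it changes the quantity being modeled rather than just the model.

```python
import statistics

# Hypothetical change scores; negative values stand for patients who got worse.
changes = [-6, -3, 0, 2, 4, 5, 7, 9, 12, 20]

# Assumed recoding rule for this sketch: clamp negative changes at zero.
recoded = [max(c, 0) for c in changes]

mean_shift = statistics.fmean(recoded) - statistics.fmean(changes)
sd_before = statistics.stdev(changes)
sd_after = statistics.stdev(recoded)

print(f"mean before: {statistics.fmean(changes):.2f}, after: {statistics.fmean(recoded):.2f}")
print(f"sd before:   {sd_before:.2f}, after: {sd_after:.2f}")
```

If the patients who worsened are unevenly split between the two groups, the same recoding would also shift the group comparison itself.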

**We don't currently have non-parametric multivariate models to use.**

Since you have SAS, yes, you do have these. You can try either PROC ROBUSTREG or PROC QUANTREG. See my paper "Should more of your PROC REGs be QUANTREGs and ROBUSTREGs".

Finally, as to your confusion about the different results from the different methods: it's not really surprising. Each method asks a different question, so each gives a different answer.
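Here is a small sketch of how two tests can disagree on the same data. It is plain Python with made-up numbers (not the data from this thread), and the helpers `welch_t` and `rank_sum_z` are written here for illustration: the first compares means, the second uses the normal approximation to the Wilcoxon rank-sum statistic with midranks and a tie correction (no continuity correction).

```python
import math
import statistics


def welch_t(a, b):
    """Welch t statistic: difference in means scaled by its standard error."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / se


def rank_sum_z(a, b):
    """Normal-approximation z for the rank sum of `a` (midranks, tie-corrected)."""
    pooled = sorted(a + b)
    n = len(pooled)
    midrank = {}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        t = j - i                              # size of this tie group
        midrank[pooled[i]] = (i + 1 + j) / 2   # average of ranks i+1 .. j
        tie_term += t ** 3 - t
        i = j
    n1, n2 = len(a), len(b)
    w = sum(midrank[v] for v in a)
    mean_w = n1 * (n + 1) / 2
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    return (w - mean_w) / math.sqrt(var_w)


# Made-up groups with the same mean (10) but very different shapes.
group_a = [1] * 9 + [91]             # mostly small values, one large value
group_b = [8, 9, 10, 11, 12] * 2     # symmetric around 10

t_stat = welch_t(group_a, group_b)
z_stat = rank_sum_z(group_a, group_b)
print(f"Welch t:    {t_stat:.3f}")
print(f"rank-sum z: {z_stat:.3f}")
```

The t statistic is zero because the means are equal, while the rank-sum z is far from zero because most values in one group sit below every value in the other. Neither result is wrong; they answer different questions.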

Hi,

Thanks a lot for your useful advice. Sorry, yes, I meant that the residuals, not the response, were not normally distributed. I checked the residual plots generated by GLM and they were skewed.

For the recoding of negative scores, there was actually more reason for it, but I did not describe it fully. I agree that it is not a good idea to change the data in any way to fit our needs.

For the non-parametric multivariate analysis models, thanks for sharing that paper. I am going to have a look at it.

Thanks,
