Dil
Calcite | Level 5

Hi all,

 

I have a question to discuss with you. I have an outcome variable of interest (motor score change) that I want to compare between two groups of patients. When I checked normality, the KS test indicated that the motor score change was not normally distributed. I can tell from the histogram that the distribution is skewed to the right (many more positive values than negative ones). The sample size is fairly large (103 and 108 in the two groups).

 

I also checked the homogeneity of variance in the two groups using Levene's test, which showed no evidence that the spread of the data differed between the two groups. So, the assumptions for the non-parametric test are all met. When I used the Wilcoxon rank-sum test (Mann-Whitney U test), there was strong evidence that the change in motor score differs between the two groups.

 

In the meantime, as we have large enough sample sizes in each group, I also tried the t-test/ANOVA (robust against non-normality with large sample sizes), which showed no difference in the change in motor score between the two groups.

 

I would go with the non-parametric test results, but I also needed to do a multivariate analysis for the change in motor score. We don't currently have non-parametric multivariate models to use. As the response was not normally distributed, I should/could not use multiple linear regression, so at a colleague's suggestion (he had done something similar before on this variable), I recoded the negative scores in the samples (only about 7% of the patients) so that I can use some GLMs. The recoded data are pretty close to the original, as the mean change in motor score increased by only 0.7 units after recoding. Considering that the score ranges from 0 to 100, this is a very small increase. Plus, we are actually modeling the improvement in motor score, which should be at least 0.

 

I tried both Poisson regression and negative binomial regression, and the negative binomial regression with the log link fitted pretty well. However, in the negative binomial regression, in both the unadjusted and adjusted models, the group variable was not a significant predictor, meaning that there is no significant difference between the two groups in motor score improvement.

 

So, I am a bit confused as to what to report. The Wilcoxon rank-sum test on the recoded change in motor score is still significant. The clinician I am working with is very happy about the non-parametric result, as he wants to see a difference between the two groups in motor score improvement. However, I feel like there is more evidence that there is no difference.

 

Can anyone give me some suggestions or comments on this?

 

Thanks a lot.

 

 

4 REPLIES
Reeza
Super User

What are the assumptions of regression that you think are being violated?

Dil
Calcite | Level 5

Thanks Reeza. The normality assumption was violated for the multiple linear regression. 

plf515
Lapis Lazuli | Level 10

You wrote:

As the response was not normally distributed, I should/could not use multiple linear regression

 

This is not correct.  Multiple regression makes no assumptions about the distribution of the response. It assumes that the errors are normally distributed. You can't see the errors, but you can see the residuals and the output from e.g. PROC GLM provides ways to assess the normality of the residuals.
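The distinction between the response and the residuals can be illustrated with a minimal sketch in Python (standard library only). The data here are simulated and purely hypothetical, not the motor score data from this thread: a right-skewed predictor plus normal errors gives a clearly skewed response, yet the residuals from the fitted line are close to symmetric. The `skewness` helper and all variable names are assumptions made for this illustration.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: third central moment divided by variance**1.5."""
    m = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 2000

# Hypothetical data: a right-skewed predictor and normally distributed errors.
x = [rng.expovariate(1.0) for _ in range(n)]
errors = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [2.0 + 3.0 * xi + ei for xi, ei in zip(x, errors)]

# Ordinary least squares with one predictor (closed-form slope and intercept).
x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sxy / sxx
intercept = y_bar - slope * x_bar
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

skew_response = skewness(y)
skew_residuals = skewness(residuals)
print(f"skewness of response:  {skew_response:.2f}")
print(f"skewness of residuals: {skew_residuals:.2f}")
```

In this simulated setting the response is strongly skewed while the residuals are not, so a normality test on the raw response would say nothing useful about whether the regression assumption holds. The residuals are the thing to check.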

 

You also wrote

I recoded the negative scores in the samples (only about 7% of the patients) so that I can use some GLMs.

 

This is surely not a good thing to do. We don't change the data to make it fit our model without very good reason. It's also not necessary here.

 

We don't currently have non-parametric multivariate models to use.

 

Since you have SAS, yes, you do have these. You can try either ROBUSTREG or QUANTREG.  See my paper "Should more of your PROC REGs be QUANTREGs and ROBUSTREGs"

 

Finally, as to your confusion regarding the different results from the different methods - it's not really surprising.  Each of the methods asks a different question so each gives a different answer. 
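As a rough illustration of "different questions", here is a small Python sketch (standard library only) using made-up numbers, not the data from this thread. The t-test is about the difference in means, while the Wilcoxon rank-sum / Mann-Whitney statistic is driven by how often a value from one group exceeds a value from the other. The helper `prob_superiority` and the two toy groups are assumptions for this example only.

```python
from statistics import fmean

def prob_superiority(a, b):
    """Mann-Whitney U for sample a, divided by len(a) * len(b).

    This is the share of (a, b) pairs in which the value from a is larger,
    with ties counted as half a win.
    """
    wins = sum((ai > bi) + 0.5 * (ai == bi) for ai in a for bi in b)
    return wins / (len(a) * len(b))

# Hypothetical toy data: same mean, very different rank ordering.
group_a = [1] * 9 + [21]   # mostly small values, one large value
group_b = [3] * 10         # all moderate values

mean_diff = fmean(group_a) - fmean(group_b)
p_sup = prob_superiority(group_a, group_b)

print(f"difference in means: {mean_diff}")
print(f"P(A > B), ties counted as half: {p_sup}")
```

The two groups have identical means, so a comparison of means sees nothing, but a value from group A beats a value from group B in only 10% of pairs, which is exactly what a rank-based test picks up. With skewed data, the two approaches can disagree without either being "wrong".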

 

Dil
Calcite | Level 5

Hi,

 

Thanks a lot for your useful advice. Sorry, yes, I meant that the residuals were not normally distributed, not the response. I did check the residual plots generated from GLM and they were skewed.

 

As for the recoding of negative scores, there were actually more reasons for it that I did not describe fully. I agree that it is not a good idea to change the data in any way to fit our needs.

 

For the non-parametric multivariate analysis models, thanks for sharing that paper. I am going to have a look at it. 

 

Thanks,

 

 
