deleted_user
Not applicable
Hi folks,

I'm really hoping someone can help me with this one, because I've been racking my brain for about 3 days now. I'm trying to compare two variables (Tx1, Tx2; continuous data) within trials to see if they are different. There are two other variables that play into each trial, Subject and Donor. The Subject and Donor were not randomly chosen, and the same subjects and donors appear in more than one trial, so I need to correct for that because it violates the assumption of independence. At first I thought I should use the SAS MIXED procedure with two random effects (subject and donor). Not only am I not sure that I'm setting up the data and model correctly for that type of analysis, but I'm also worried because the residuals are not normally distributed (which the model requires). I also gave the SAS GLIMMIX procedure a try, but I didn't have any success there. The problem with this data is that it doesn't fit any distribution I've tried. I've tried transforming it, to no avail. It is very skewed.

In short, when I do a nonparametric matched pairs test (Wilcoxon rank-sum test), I get significant results. That's great, except that I'm worried about going forward with those results without correcting for the violations of independence. So, does anyone know how I can do a nonparametric matched pairs test comparing Tx1 to Tx2 while taking into account the nonindependence of the subject and donor?

Thanks in advance for any suggestions, and here is a little visualization of the layout I have the data in...

Trial  Subject  Donor  Tx1  Tx2
1      Sub1     Don1   #    #
2      Sub1     Don2   #    #
3      Sub2     Don1   #    #
4      Sub3     Don2   #    #
5      Sub3     Don3   #    #
6      Sub3     Don4   #    #
etc...
5 REPLIES
deleted_user
Not applicable
I think the nonparametric Wilcoxon rank-sum test is the counterpart of the two-sample t-test, which requires observations from independent samples, whereas the Wilcoxon signed-rank test (matched pairs) is the counterpart of the paired t-test, where the observations are related. In your case, since the observations are related (non-independent) and the data are not normally distributed, a Wilcoxon signed-rank test, which is based on the differences between the paired observations, seems appropriate.
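
To make the distinction concrete, here is a minimal sketch of the signed-rank calculation on paired differences. It is written in plain Python rather than SAS only so that it is self-contained; the numbers are made up for illustration (they are not from this thread), the helper names are hypothetical, and the p-value uses the large-sample normal approximation without tie or continuity corrections, which is rough for a sample this small.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def signed_rank_test(x, y):
    """Wilcoxon signed-rank test on pairs (x[i], y[i]), normal approximation."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)  # sum of positive ranks
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, z, p_two_sided

# Made-up illustrative values, one pair per trial (NOT data from the thread)
tx1 = [51, 73, 22, 98, 40, 65]
tx2 = [39, 60, 25, 41, 32, 61]

w_plus, z, p = signed_rank_test(tx1, tx2)
print(f"W+ = {w_plus}, z = {z:.2f}, two-sided p = {p:.3f}")
```

Note that this calculation treats the paired differences as independent of one another, so it does nothing about trials that share a subject or a donor.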
deleted_user
Not applicable
Thanks for the response, sas_grad. I do not think that the Wilcoxon signed rank test will work, however. That test comes with two assumptions. One of them is that the paired differences are independent, and in my case they are not. Perhaps I didn't explain this well enough in my first post... Yes, the samples are related, and in that way, this test seems to fit. However, I need a way to correct for the fact that I violate the assumption of independence. So, either another test altogether, or some correction factor to use after using a Wilcoxon. Basically, I need to do a SAS MIXED model with two random effects, except I can't because my residuals are not normally distributed.
Doc_Duke
Rhodochrosite | Level 12
You could always go back to the principles Wilcoxon applied in the 1940s: rank your data, rely on the Central Limit Theorem for the normality of the means of the ranks, and then use MIXED. The point estimates won't make any sense, but the tests will be valid if you have a large enough sample size.
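
One possible reading of the ranking step, sketched in plain Python so it is self-contained: stack Tx1 and Tx2 into one column, rank all the values together, and keep Trial, Subject, and Donor alongside each rank. The ranks would then replace the raw values as the response in the mixed model (with treatment as the fixed effect and subject and donor as random effects); fitting that model is not shown here. The values and the helper name are made up for illustration, and pooling both treatments before ranking is an assumption about what was meant.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

# Made-up rows in the layout from the question: (Trial, Subject, Donor, Tx1, Tx2)
trials = [
    (1, "Sub1", "Don1", 51, 39),
    (2, "Sub1", "Don2", 73, 60),
    (3, "Sub2", "Don1", 22, 25),
    (4, "Sub3", "Don2", 98, 41),
    (5, "Sub3", "Don3", 40, 32),
    (6, "Sub3", "Don4", 65, 61),
]

# Stack to long format: one row per trial per treatment
long_rows = []
for trial, subject, donor, tx1, tx2 in trials:
    for treatment, value in (("Tx1", tx1), ("Tx2", tx2)):
        long_rows.append({"trial": trial, "subject": subject, "donor": donor,
                          "treatment": treatment, "value": value})

# Rank all values together and attach the rank to each row
pooled_ranks = average_ranks([row["value"] for row in long_rows])
for row, rank in zip(long_rows, pooled_ranks):
    row["rank"] = rank

for row in long_rows:
    print(row["trial"], row["subject"], row["donor"], row["treatment"], row["rank"])
```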
deleted_user
Not applicable
Hi Doc@Duke. Thanks for that idea... I'm not sure I completely follow (I'm in many ways a statistical novice), but hopefully I can figure it out. I have a sample size of 34. Not sure if that'll be enough. Anyway, I happen to be at Duke. If you'd be willing to meet with me to discuss this, I'll buy you lunch or something. Just shoot me an e-mail.

Cheers! Message was edited by: JeremyCFD
Doc_Duke
Rhodochrosite | Level 12
N=34 is a bit small, but you could bootstrap your analysis to get a handle on the stability of the test.
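
As a rough sketch of what bootstrapping could look like here (plain Python, made-up numbers, hypothetical function name): resample whole subjects with replacement, so that all trials from one subject stay together, and recompute the statistic each time. Two simplifying assumptions to flag loudly: the statistic is the median of the Tx1 - Tx2 differences rather than the full mixed-model test, and only subjects are resampled, so the clustering by donor is ignored.

```python
import random
import statistics

def cluster_bootstrap(diffs_by_cluster, stat, n_boot=2000, seed=0):
    """Resample whole clusters with replacement and recompute stat each time."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    clusters = list(diffs_by_cluster)
    results = []
    for _ in range(n_boot):
        sample = []
        # Draw as many clusters as there are in the data, keeping each intact
        for _ in range(len(clusters)):
            sample.extend(diffs_by_cluster[rng.choice(clusters)])
        results.append(stat(sample))
    return results

# Made-up Tx1 - Tx2 differences grouped by subject (NOT data from the thread)
diffs_by_subject = {
    "Sub1": [12, 13],
    "Sub2": [-3],
    "Sub3": [57, 8, 4],
    "Sub4": [6, 9],
    "Sub5": [-1, 15],
}

boot = sorted(cluster_bootstrap(diffs_by_subject, statistics.median))

# Percentile interval and the share of resamples with a positive median
lower = boot[int(0.025 * len(boot))]
upper = boot[int(0.975 * len(boot)) - 1]
share_positive = sum(b > 0 for b in boot) / len(boot)
print(f"95% percentile interval: [{lower}, {upper}]")
print(f"share of resamples with median > 0: {share_positive:.3f}")
```

If the interval is wide or the sign of the statistic flips often across resamples, that suggests the result is not very stable at this sample size.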

Small world. I'll contact you offline about the nuts and bolts of doing this.

Doc
