San123
Calcite | Level 5

I am not sure which test to use to compare a continuous variable that is not nominal. The t-test doesn't work, as it only works on nominal data. I know PROC NPAR1WAY with the WILCOXON option works, but how do I incorporate the strata, cluster, and weight of the sample? I am talking about complex survey data like the NIS (National Inpatient Sample) or NHANES. I would really appreciate the help!

7 REPLIES
Ksharp
Super User

"but how to incorporate strata, cluster and weight of the data-sample?"

You are talking about an ANOVA-type analysis: use a t-test or Wilcoxon test for two levels and ANOVA for more than two levels; otherwise you are going to run into a multiple-comparison error.

San123
Calcite | Level 5

Hi Xia,

Let me clarify further.

I am creating 2 samples, and I am trying to compare the difference in mean age between these 2 samples. I found out that the distribution of age is not nominal (as per the histogram from UNIVARIATE).

So the t-test doesn't work. I think NPAR1WAY with WILCOXON will work, but I am not sure how to use it, since the data are complex survey data with strata, clusters, and weights. So should we incorporate them as in other procedures like SURVEYFREQ, SURVEYLOGISTIC, or SURVEYREG (NPAR1WAY doesn't allow that, though), or just ignore the complexities of the data?

ballardw
Super User

Are you interested in testing the hypothesis that the mean of your data is different from a reference value, or are you worried about your distribution being different?

And is your data a complex sample, or just the NHANES or NIS?

SteveDenham
Jade | Level 19

Using a nonparametric approach for a well-behaved endpoint such as age is like trying to bicycle around the planet. It can be done, but there are much better ways to travel. I would urge you to look at SURVEYREG if you wish to compare more than two groups, or SURVEYMEANS for two-group comparisons in complex sampling designs.
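A minimal sketch of the SURVEYREG route, assuming a dataset aaa with stratum variable a, cluster variable b, weight c, grouping variable group, and outcome x. All of these names are hypothetical placeholders, not taken from any real dataset:

```sas
/* All dataset and variable names below are hypothetical placeholders. */
proc surveyreg data=aaa;
   strata a;                       /* stratum identifier */
   cluster b;                      /* primary sampling unit */
   weight c;                       /* sampling weight */
   class group;                    /* grouping variable, two or more levels */
   model x = group / solution;     /* design-based test of the group effect */
   lsmeans group / diff;           /* pairwise differences in group means */
run;
```

With only two groups, the test of the group effect plays the role of a design-based two-sample t-test.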

Steve Denham

San123
Calcite | Level 5

Steve: you are right. I did talk with the statistician at my institution.

She also suggested using SURVEYREG.

It's very hard to use nonparametric tests with complex survey data.

Season
Barite | Level 11

Please be aware that linear regression of complex survey data also assumes normality of the residuals. See page 2 of "Linear regression diagnostics for survey data" for more details. So in theory, a nonparametric test is required.
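One rough way to look at the residuals is sketched below, using hypothetical names (dataset aaa, strata a, cluster b, weight c, grouping variable group, outcome x). The OUTPUT statement of SURVEYREG saves the residuals and UNIVARIATE plots them. Note that these plots ignore the sampling weights, so this is only an informal check and not the diagnostic described in the paper:

```sas
/* All dataset and variable names below are hypothetical placeholders. */
proc surveyreg data=aaa;
   strata a;
   cluster b;
   weight c;
   class group;
   model x = group;
   output out=resid_out residual=res;   /* save residuals as variable res */
run;

/* Unweighted look at the residual distribution. */
proc univariate data=resid_out;
   var res;
   histogram res / normal;
   qqplot res / normal(mu=est sigma=est);
run;
```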

Season
Barite | Level 11

I wonder how your project is going and whether you still need this information. Still, I will offer my advice on the problem you encountered.


@San123 wrote:

Hi Xia,

Let me clarify further.

I am creating 2 samples, and I am trying to compare the difference in mean age between these 2 samples. I found out that the distribution of age is not nominal (as per the histogram from UNIVARIATE).

So the t-test doesn't work. I think NPAR1WAY with WILCOXON will work.


I guess you were trying to say that the distribution of age was not normal, which called for a complex survey version of the Wilcoxon rank-sum test. To date, SAS has no built-in procedure for nonparametric tests of complex survey data, such as a "PROC SURVEYNPAR1WAY". However, statisticians have extended the Wilcoxon rank-sum test to complex survey data, and those extensions were already available when you raised this question. They are simply obscure (i.e., few people know they exist).

Your intuition that "ttest doesn't work" reflects a good understanding of the basic assumption of the test. However, as "Two-sample rank tests under complex sampling" (Biometrika, Oxford Academic) shows, the complex survey version of the Wilcoxon rank-sum test can actually be reduced to a test of domain means (i.e., the difference in means between the two groups you want to compare; in the paper these are means of rank scores based on estimated population ranks) that uses a t-statistic for hypothesis testing. In other words, the complex survey version of the Wilcoxon rank-sum test reduces to a t-test! Although this is counterintuitive, the authors provide a proof in the paper, so feel safe to use it.
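The rank test in that paper works on rank scores, so here is one way the estimated population mid-ranks could be computed in SAS. It is a sketch under assumptions, not the authors' code: the names aaa, x, and c are hypothetical placeholders for the dataset, the continuous variable, and the sampling weight, and the mid-rank definition (weight below a value plus half the weight at that value, divided by the total weight) is my own choice of tie handling.

```sas
/* Total sampling weight at each distinct value of x (output is sorted by x). */
proc summary data=aaa nway;
   class x;
   var c;
   output out=wx(drop=_type_ _freq_) sum=w_x;
run;

/* Grand total of the weights over nonmissing x. */
proc summary data=wx;
   var w_x;
   output out=tot(keep=w_total) sum=w_total;
run;

/* Mid-rank on a 0-1 scale: weight below x plus half the weight at x. */
data rx;
   if _n_ = 1 then set tot;
   set wx;
   r = (cum + w_x / 2) / w_total;
   cum + w_x;                  /* running total, retained across rows */
   keep x r;
run;

/* Attach the rank score r to every observation. */
proc sort data=aaa out=aaa_sorted;
   by x;
run;

data ranked;
   merge aaa_sorted(in=in_a) rx;
   by x;
   if in_a;
run;
```

The score r, rather than x, would then be the analysis variable in the design-based comparison of domain means.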

Although SAS lacks a "PROC SURVEYNPAR1WAY", you can easily conduct this specific test with the SURVEYMEANS procedure, like this:

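proc surveymeans data=aaa;
   var x;                  /* analysis variable */
   strata a;               /* stratum identifier */
   cluster b;              /* primary sampling unit (PSU) */
   weight c;               /* sampling weight */
   domain group / diff;    /* domain means and the test of their difference */
run;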

I made up the variable names and the dataset name in the code, so replace them with your real ones. Watch out for the degrees of freedom that SAS uses for this t-test: as mentioned in the paper I cited, the correct degrees of freedom equal the number of primary sampling units minus the number of strata. The degrees of freedom SAS uses may be incorrect if you are using replication methods to obtain the variance, so correct them if necessary.
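To check the design degrees of freedom by hand, here is a short sketch. It again uses the made-up names aaa, a, and b, and it assumes that cluster IDs may repeat across strata, so a PSU is identified by the pair (a, b):

```sas
/* Hypothetical names: dataset aaa, stratum a, cluster b. */
proc sql;
   select count(*)                     as n_psu,
          count(distinct a)            as n_strata,
          count(*) - count(distinct a) as design_df
   from (select distinct a, b
         from aaa
         where not missing(a) and not missing(b));
quit;
```

Compare design_df with the degrees of freedom reported in the SURVEYMEANS output. If replicate weights are used, the REPWEIGHTS statement has a DF= option for overriding the default, if I recall the documentation correctly.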

By the way, another group of researchers generalized the Wilcoxon rank-sum test to complex survey data in a different way. Their results are in "Extending the Mann‐Whitney‐Wilcoxon rank sum test to survey data for comparing mean ranks - Lin - 20...". I have not read this paper thoroughly, so I cannot explain it as clearly as I did the previous one.

There is also an even older research paper that extends the Wilcoxon rank-sum test to complex survey data in the special case where the variable to be compared is ordinal rather than continuous, so this generalization does not fit your problem. But in case somebody else needs it, here is the link: "Extension of the Wilcoxon Rank Sum Test for Complex Sample Survey Data | Journal of the Royal Statis...". Implementing this method essentially entails building a cumulative odds logistic regression model.
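A sketch of what such a model could look like in SAS, assuming an ordinal outcome y and the made-up design variables a, b, c, and group in a dataset aaa (all hypothetical names). This fits a design-based cumulative logit model with SURVEYLOGISTIC. It is not a reproduction of the paper's exact test:

```sas
/* All dataset and variable names below are hypothetical placeholders. */
proc surveylogistic data=aaa;
   strata a;
   cluster b;
   weight c;
   class group / param=ref;
   /* With more than two ordered levels of y, LINK=LOGIT fits cumulative logits. */
   model y = group / link=logit;
run;
```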
