Solved: Re: Pairwise comparison following a second-order Rao-Scott chi-square ...

scottyoung77 · Posted 10-15-2024 12:41 AM

For the life of me, I can't figure out how to perform pairwise comparisons in SAS following a second-order Rao-Scott chi-square test of independence. I have two categorical variables (NSEX & DEPLVL). NSEX has two levels (1 = male; 2 = female) and DEPLVL has four levels (0 = No symptoms; 1 = 1 symptom; 2 = 2-4 symptoms; 3 = 5-9 symptoms). This is complex survey data with clustering, stratification, and weighting, so I'm limited to procedures which can incorporate these elements/variables: varmethod=taylor, weight audweight, strata varstrat & cluster varunit.

I want to know if the percentage of women with:

no symptoms is significantly different from the percentage of women with 1 symptom
no symptoms is significantly different from the percentage of women with 2-4 symptoms
no symptoms is significantly different from the percentage of women with 5-9 symptoms and
2-4 symptoms is significantly different from the percentage of women with 5-9 symptoms

I also want to know if there are significant differences among men for the same set of DEPLVL comparisons. I've attached the results of the overall test of independence in the hope that it clarifies my research questions/the answers I'm trying to get at (e.g., for scenario 1 above, I want to know if 45.5227 is significantly different from 46.3432).

I'd like to use either the Rao-Scott adjusted Wald test or adjusted F-test for the pairwise comparison (for methodological consistency) with the Holm-Bonferroni multiple comparison adjustment method, but at this point, I'll take anything that gives me adjusted p-values for each of the pairs. I'm trying to avoid the regression route if possible, but I'm just about ready to accept that if it's the only way to get it done.

I should note that I'm brand new to SAS and have been trying to use an AI engine to generate code for the pairwise comparisons, so this is definitely not my area of expertise. Nor is statistics to be honest - I'm writing a dissertation for clinical psych.

Can anyone lend a hand and help me figure this out? I've been trying to figure it out for 3 weeks now and don't feel like I've made any real progress on what (I would have thought) was a pretty straightforward question/procedure.

SteveDenham · Posted 10-15-2024 10:17 AM

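Well, there is a "brute force" method. After running PROC SURVEYFREQ with all of the DEPLVL levels included, rerun it with a WHERE= data set option attached to the DATA= option of the PROC SURVEYFREQ statement, so that each run keeps only the two levels being compared.

You would then have 5 calls to SURVEYFREQ. The first reproduces the overall test, and the remaining four each keep one pair of DEPLVL levels: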

/* Overall test: all four DEPLVL levels */
proc surveyfreq data=yourdata;
/*remaining code to get what you already have*/
run;

/* No symptoms vs. 1 symptom */
proc surveyfreq data=yourdata(where=(deplvl in (0,1)));
/*same code as before */
run;

/* No symptoms vs. 2-4 symptoms */
proc surveyfreq data=yourdata(where=(deplvl in (0,2)));
/*same code as before */
run;

/* No symptoms vs. 5-9 symptoms */
proc surveyfreq data=yourdata(where=(deplvl in (0,3)));
/*same code as before */
run;

/* 2-4 symptoms vs. 5-9 symptoms */
proc surveyfreq data=yourdata(where=(deplvl in (2,3)));
/*same code as before */
run;

The biggest caveat here is that the weighting will not be the same as for the overall test (I said it was a brute force method, not an elegant one). The second biggest is that you would likely have to run the resulting p-values through PROC MULTTEST to get multiplicity-adjusted p-values.
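For the PROC MULTTEST step, a minimal sketch might look like the following. It assumes you have copied the four raw p-values from the subset runs into a small data set by hand. The data set name pairwise_p, the comparison labels, and the p-values themselves are made-up placeholders, not results from this thread. By default the INPVALUES= option expects the p-value variable to be named raw_p, and the HOLM option requests the Holm (step-down Bonferroni) adjustment.

```sas
/* Placeholder p-values only: replace them with the p-values  */
/* reported by your own four subset PROC SURVEYFREQ runs.      */
data pairwise_p;
  length comparison $ 12;
  input comparison $ raw_p;
  datalines;
0_vs_1 0.5000
0_vs_2 0.0400
0_vs_3 0.0100
2_vs_3 0.2000
;
run;

/* INPVALUES= reads raw p-values from the variable raw_p;      */
/* HOLM adds the Holm step-down Bonferroni adjusted p-values.  */
proc multtest inpvalues=pairwise_p holm;
run;
```

The output lists each raw p-value next to its adjusted counterpart, in the same order as the rows of the input data set.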

But there might be another way. Have you considered PROC SURVEYLOGISTIC? You could model all the data, then extract the comparisons of interest using a variety of tools (LSMEANS, LSMESTIMATE, SLICE), which enable specific pairwise comparisons and adjust for multiplicity through an ADJUST= option. In the SAS/STAT 15.2 documentation, Example 119.2, The Medical Expenditure Panel Survey (MEPS), might be an excellent starting point, given what I can make of your survey design (not really my field).
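As a rough illustration of the regression route, the sketch below treats NSEX as the response and DEPLVL as the predictor, so each LS-means difference compares the log odds of being female between two DEPLVL levels. That framing is an assumption about the question being asked, as is event='2' (it assumes NSEX has no format applied, so the formatted value for female is 2). The data set name yourdata is the same placeholder used above. LSMEANS needs the GLM parameterization on the CLASS statement, and combining ADJUST=BON with STEPDOWN is documented as corresponding to the Holm method.

```sas
/* Sketch under stated assumptions, not a vetted analysis plan. */
proc surveylogistic data=yourdata varmethod=taylor;
  weight  audweight;
  strata  varstrat;
  cluster varunit;
  class deplvl / param=glm;          /* GLM coding is required for LSMEANS */
  model nsex(event='2') = deplvl;    /* assumes 2 = female, unformatted    */
  /* DIFF requests all pairwise DEPLVL differences;                */
  /* ADJUST=BON with STEPDOWN gives Holm-type adjusted p-values.   */
  lsmeans deplvl / diff adjust=bon stepdown;
run;
```

Note that DIFF produces all six pairwise differences, so the adjustment covers six comparisons rather than the four listed in the question. If only those four matter, LSMESTIMATE can be used to spell them out individually.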

SteveDenham

scottyoung77 · Posted 10-15-2024 03:56 PM

Thanks for the quick reply - I appreciate it!

I have considered PROC SURVEYLOGISTIC, yeah, but I had been hoping to avoid the regression route. Not for any particularly good reason mind you - just because I've been having a hard time believing there isn't an obvious/simple way (that I was missing) to run pairwise comparisons following a chi-square test of independence. It seems like there ought to be, but like I said, statistics isn't my forte, so I could very well just be thinking something's reasonable when it very much isn't.

Sounds like PROC SURVEYLOGISTIC is the way to go. I like the brute force method, but I'm wary of the subgroup analysis effects. I'll give the regression route a try and see how it goes.

Thanks again,

Scott

scottyoung77 · Posted 10-17-2024 11:44 PM

One other question actually - if I account for the survey design elements in my subgroup analysis, does it mitigate the issues caused by selecting a subsample in the first place (e.g., undermining the weighting, the potential introduction of bias, etc.)? For example, my subgroup analysis would look something like this:

proc surveyfreq data=nesarc3.under65_composite(where=(deplvl in (0,1))) varmethod=taylor;
  weight audweight;
  strata varstrat;
  cluster varunit;
  tables NSEX*DEPLVL / chisq(secondorder) expected cellchi2 row col wchisq;
  ods output CrossTabs=crosstabs_sex ChiSq=chisq_sex;
run;

I should note that I'm already working with a subsample in the sense that (before running any analyses) I'm excluding respondents 66+ (the overall sample is 18-90+) as well as anyone with a history of schizophrenia/psychotic illness from my sample. Those exclusions reduce the sample size by just over 17%.
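One possible alternative to subsetting with WHERE= is sketched below, on the assumption that the concern is variance estimation when whole clusters or strata drop out of the subset. Instead of deleting rows, it sets DEPLVL to missing outside the pair of interest and uses the NOMCAR option, which asks PROC SURVEYFREQ to treat the nonmissing observations as a domain of the full sample for Taylor series variance estimation. The names work.pair_0_1 and deplvl_pair are hypothetical, and the TABLES options are trimmed to the ones needed for the test.

```sas
/* Keep every respondent in the data set; blank out DEPLVL for   */
/* anyone outside the two levels being compared (here 0 and 1).  */
data work.pair_0_1;
  set nesarc3.under65_composite;
  if deplvl in (0,1) then deplvl_pair = deplvl;
  else deplvl_pair = .;
run;

/* NOMCAR analyzes the nonmissing rows as a domain, so rows with */
/* a missing deplvl_pair still contribute to the design          */
/* information used for the Taylor series variance estimates.    */
proc surveyfreq data=work.pair_0_1 varmethod=taylor nomcar;
  weight  audweight;
  strata  varstrat;
  cluster varunit;
  tables NSEX*deplvl_pair / chisq(secondorder) row col;
run;
```

The same idea would apply to the age and diagnosis exclusions if the analysis started from the full survey file rather than an already reduced one, but whether that matters in practice depends on how many clusters the exclusions empty out.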

SteveDenham · Posted 10-21-2024 12:27 PM

Hi Scott - You have exceeded my knowledge with this question. There should be a way of setting weights based on the first level of sampling from the whole population, and then scaling those for the subsamples, but I don't know the field well enough to point you to a good reference.

SteveDenham

scottyoung77 · Posted 10-21-2024 03:32 PM

Thanks Steve and no problem. I think I'm mostly just trying to wrap my head around the fundamentals here (I've exceeded my own knowledge with this question to be honest). I'm incorporating the weight, strata, and cluster variables in my proc surveyfreq's because the dataset instructions tell me to, but I'm not 100% sure what those variables are actually doing. I understand weighting, clustering, and stratification as concepts, but haven't really been able to wrap my head around what they do/how they function practically speaking (i.e., in SAS). Maybe I don't need to, but I do want to make sure I'm not overlooking something important.

Your earlier point about subsampling throwing off the weighting was a great one and was the perfect example of something I'm worried about overlooking (that hadn't even occurred to me). At first, it made me pause and think I shouldn't be trying to force pairwise comparison - that I should just let it go and go the regression route - but then I realized I'm basically subsampling already by opting to run all of my analyses on the 18-65 year old segment of the survey respondents. Meaning that if I'm tracking your earlier point, the weighting is probably already thrown off to some degree (assuming the survey's weighting strategy was to correlate the whole sample with known population data).

I suppose now all I'm really trying to do is understand how big of a deal that is/would be. If you have any thoughts on this front, I'd appreciate them. And if not, no worries - I appreciate all the help and clarity so far. It's been a real boon.
