Re: Case-Control Analysis with SURVEYLOGISTIC?

MichaelTH · Posted 12-18-2025 10:36 PM

Hello,

I'd like to conduct a matched case-control analysis of data from the National Health Interview Survey (NHIS). I've previously used PROC SURVEYLOGISTIC for other analyses of NHIS data, accounting for the complex survey design by specifying the STRATA, CLUSTER, and WEIGHT variables provided with the survey data for each respondent. I'm unclear how (or whether) SURVEYLOGISTIC can be used to analyze matched cases and controls. For data that do not come from a complex survey, the STRATA statement in PROC LOGISTIC can be used to identify the matched cases and controls. Is there a way to use PROC SURVEYLOGISTIC for case-control analyses of complex survey data? Thank you!

Michael
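
For reference, the non-survey matched analysis mentioned above (conditional logistic regression through the STRATA statement in PROC LOGISTIC) might look like the following minimal sketch. The data set name matched_data and the variable names study_id, outcome, and pred1 are placeholders assumed for illustration; they are not NHIS variables.

```sas
/* Sketch only: conditional logistic regression for matched sets,     */
/* ignoring the survey design. matched_data, study_id, outcome, and   */
/* pred1 are assumed placeholder names.                               */
proc logistic data=matched_data;
  strata study_id;                    /* one stratum per matched set  */
  model outcome(event='1') = pred1;   /* models P(outcome = 1)        */
run;
```

This sketch does not use the survey strata, clusters, or weights, which is the gap the question is about.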

Ksharp · Posted 12-19-2025 03:06 AM

You could also check PROC MIXED or PROC GLIMMIX.

MichaelTH · Posted 12-23-2025 01:39 PM

Thank you for the suggestions. From a couple of online SAS notes (http://www.asasrms.org/Proceedings/y2009/Files/304072.pdf; https://www.lexjansen.com/wuss/2010/HOC/2930_2_HOR-Short.pdf), I came up with this PROC GLIMMIX code:

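proc GLIMMIX data=survey_data;
class PSTRAT PPSU predictor_vars;   /* design variables and categorical predictors */
model outcome=predictor_vars / dist=binary link=logit solution;
random PPSU(PSTRAT);                /* random effect for PSU nested within stratum */
random int / subject=study_id;      /* random intercept for each matched set */
weight WTFA_A;                      /* observation-level survey weight */
run;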

where PSTRAT is the stratum

PPSU is the cluster (PSU)

WTFA_A is the observation-level weight

study_id is the matching variable linking case and control observations

outcome is the binary outcome for the logistic regression analysis

predictor_vars represents the predictor variables used in the logistic regression analysis

Any comments on this? Alternatively, I received a suggestion to use PROC SURVEYLOGISTIC for this analysis, accounting for the complex survey design with the STRATA, CLUSTER, and WEIGHT statements and adjusting for the matching by within-set centering. That is, for each matched set of case and control observations, compute the within-set mean of each predictor variable and use the deviations from the set means as the predictors in the logistic regression. So, for each predictor variable pred1 with within-set mean pred1_mean, compute the deviation from the set mean

pred1_dev = pred1 - pred1_mean;

and use this in the SURVEYLOGISTIC MODEL statement:

PROC SURVEYLOGISTIC data=survey_data;
STRATA PSTRAT;
CLUSTER PPSU;
WEIGHT WTFA_A;
model outcome (event='1') = pred1_dev;
run;

Is this a reasonable approach? Thank you very much.
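
For concreteness, the within-set centering step described above could be computed along these lines. This is only a sketch under stated assumptions: pred1 is numeric, study_id identifies the matched set, the set mean is unweighted (the suggestion I received did not say whether it should be weighted), and the output data set name survey_centered is made up for illustration.

```sas
/* Sketch: within-set centering of one numeric predictor.             */
/* Assumptions: pred1 is numeric, study_id identifies the matched     */
/* set, and the set mean is a simple unweighted mean.                 */
/* survey_centered is a hypothetical output data set name.            */
proc sql;
  create table survey_centered as
  select a.*,
         mean(a.pred1)           as pred1_mean,  /* mean within matched set */
         a.pred1 - mean(a.pred1) as pred1_dev    /* deviation from set mean */
  from survey_data as a
  group by a.study_id;   /* SAS remerges the set means onto every row */
quit;
```

PROC SURVEYLOGISTIC would then read data=survey_centered so that pred1_dev is available to the MODEL statement.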
