MichaelTH
Calcite | Level 5

Hello,

 

I'd like to conduct a matched case-control analysis of data from the National Health Interview Survey (NHIS). I've previously used PROC SURVEYLOGISTIC for other analyses of NHIS data, accounting for the complex survey design by specifying the STRATA, CLUSTER, and WEIGHT variables provided with the survey data for each respondent. I'm unclear how (or if) SURVEYLOGISTIC can be used to analyze matched cases and controls. For data not from complex surveys, the STRATA statement in PROC LOGISTIC can be used to identify the matched cases and controls. Is there a way to use PROC SURVEYLOGISTIC for case-control analyses of complex survey data? Thank you!
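
For reference, the non-survey approach I mean looks roughly like the sketch below. The data set name matched_data and the variables study_id (matched-set identifier), outcome, and pred1 are placeholders for illustration, not actual NHIS variable names.

```sas
/* Sketch only: conditional logistic regression for matched sets,
   ignoring the survey design. All names are placeholders. */
proc logistic data=matched_data;
   strata study_id;                    /* one stratum per matched case-control set */
   model outcome(event='1') = pred1;   /* models the probability that outcome = 1 */
run;
```

With the STRATA statement, PROC LOGISTIC fits a conditional logistic model that conditions on each matched set.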

 

Michael

 

2 REPLIES
Ksharp
Super User
You could also check PROC MIXED or PROC GLIMMIX.
MichaelTH
Calcite | Level 5

Thank you for the suggestions. From a couple of online SAS notes (http://www.asasrms.org/Proceedings/y2009/Files/304072.pdf; https://www.lexjansen.com/wuss/2010/HOC/2930_2_HOR-Short.pdf), I came up with this PROC GLIMMIX code:

 

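proc glimmix data=survey_data;
   class PSTRAT PPSU predictor_vars;   /* CLASS treats predictor_vars as categorical */
   model outcome = predictor_vars / dist=binary link=logit solution;
   random PPSU(PSTRAT);                /* PSUs nested within strata */
   random int / subject=study_id;      /* random intercept for each matched set */
   weight WTFA_A;                      /* observation-level survey weight */
run;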

 

where PSTRAT is the stratum 

PPSU is the cluster (PSU) 

WTFA_A is the observation-level weight 

study_id is the matching variable linking case and control observations

outcome is the binary outcome for the logistic regression analysis

predictor_vars represents the predictor variables used in the logistic regression analysis

 

Any comments on this? Alternatively, I received a suggestion to use PROC SURVEYLOGISTIC for this analysis, accounting for the complex survey design with the STRATA, CLUSTER, and WEIGHT statements and adjusting for the matching through within-set centering. That is, for each matched set of case and control observations, compute the within-set mean of each predictor variable and use the deviations from the set mean as the predictors in the logistic regression. So, for each predictor variable pred1 that has a within-set mean of pred1_mean, compute the deviation from the set mean

   pred1_dev = pred1 - pred1_mean;

and use this in the SURVEYLOGISTIC MODEL statement

   

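proc surveylogistic data=survey_data;
   strata PSTRAT;                          /* survey stratum */
   cluster PPSU;                           /* survey PSU */
   weight WTFA_A;                          /* observation-level survey weight */
   model outcome(event='1') = pred1_dev;   /* within-set centered predictor */
run;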
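
The centering step described above could be sketched as follows. This assumes survey_data contains study_id and a numeric pred1. The output name survey_centered is made up for illustration, and the set mean here is unweighted; whether a weighted mean would be more appropriate is part of what I'm unsure about.

```sas
/* Sketch only: within-set centering of one predictor.
   Assumes survey_data has study_id (matched-set ID) and numeric pred1.
   survey_centered is a hypothetical output data set name. */
proc sql;
   create table survey_centered as
   select a.*,
          mean(a.pred1)           as pred1_mean,  /* unweighted mean within the matched set */
          a.pred1 - mean(a.pred1) as pred1_dev    /* deviation from the set mean */
   from survey_data as a
   group by a.study_id;
quit;
```

PROC SQL remerges the set means back onto the individual rows (the log notes this), so every observation keeps its design variables and weight. The SURVEYLOGISTIC step would then read survey_centered instead of survey_data.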

 

Is this a reasonable approach?  Thank you very much.

 

 

 

 
