Risk ratios & risk differences in correlated data w/ logistic model an...

Posted 07-08-2019 12:09 PM

Hello and thank you for reviewing my question. I am currently trying to use SAS to conduct a g-computation analysis, estimating the effect of statin initiation (exposure variable: StatinInitiator) on all-cause mortality (outcome variable: status_DEATH). I am not permitted to share any data, but to describe it: this is patient-level data in which each row is a distinct patient-level observation containing information on exposure, outcome, and baseline covariates. I am trying to figure out the correct models and SAS procedures to use for modeling my outcome and would appreciate any help I can get.

First, some background. As part of the g-computation method, we model the outcome as a function of each subject's exposure and covariates, which we have observed for all subjects. We then use the fitted model to estimate each subject's outcome probability under both exposure levels.
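As a sketch of the quantities this step produces (the notation is an assumption introduced here, not taken from the post): write $\hat{p}_i(a)$ for the model-predicted probability of death for row $i$ with StatinInitiator set to $a$, and $w_i$ for that row's value of the weight variable. The standardized risks, risk ratio, and risk difference would then be

$$
\hat{R}_a = \frac{\sum_i w_i \, \hat{p}_i(a)}{\sum_i w_i}, \quad a \in \{0, 1\}, \qquad
\widehat{RR} = \frac{\hat{R}_1}{\hat{R}_0}, \qquad
\widehat{RD} = \hat{R}_1 - \hat{R}_0 .
$$

With all weights equal, $\hat{R}_a$ reduces to the plain average of the predicted probabilities under exposure level $a$.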

I have been able to make this work by fitting GEE models using PROC GENMOD (with the REPEATED statement and its SUBJECT= option, because in my study subjects can appear in both exposure groups). However, I have had convergence problems that appear to stem from quasi-complete separation, which led me to change DIST=BINOMIAL to DIST=POISSON. Since the Poisson model allows predicted probabilities > 1, a mentor recently advised me to switch back to DIST=BINOMIAL and to address the persistent non-convergence with Firth's bias correction. My understanding is that the only SAS procedure that implements Firth's bias correction is PROC LOGISTIC (the FIRTH option in the MODEL statement), but PROC LOGISTIC has no REPEATED statement, so I am unclear how to account for the correlated observations. Can anyone provide guidance on which SAS procedure I can use to fit a logistic model with Firth's bias correction that properly accounts for the correlated observations?

Below I have included my PROC GENMOD code. Please let me know if I can clarify anything above or answer any questions that would make my question clearer. *Please note that I am also trying to use this method to calculate risk ratios and risk differences, so I have some macro language in my PROC GENMOD code that toggles the settings necessary for each estimate.*

Thank you for any guidance you can provide. In case it is relevant, I am using SAS version 9.4.

**CODE:**

```sas
/* Suppress listing output and capture the GEE empirical parameter estimates */
ods listing exclude all;
ods output GEEEmpPEst = paramDS;

proc genmod data=input_dataset descending;
  weight weight_var;
  class bene_id
        AGECAT (PARAM=REF REF="2")
        GENDER (PARAM=REF REF="0")
        YEAR (PARAM=REF REF="2011")
        RACE (PARAM=REF REF="1")
        OUTPTVISIT_1yr_cat (PARAM=REF REF="5")
        SNF_1yr_cat (PARAM=REF REF="0")
        HS_1yr_cat (PARAM=REF REF="0")
        UniqueDrugs_1yr_cat (PARAM=REF REF="5")
        ldl_1yr (PARAM=REF REF="<100")
        sbp_1yr (PARAM=REF REF="<130")
        dbp_1yr (PARAM=REF REF="<80");
  model status_DEATH = StatinInitiator
        AGECAT
        YEAR RACE
        OUTPTVISIT_1yr_cat
        SNF_1yr_cat
        HS_1yr_cat
        UniqueDrugs_1yr_cat
        /*Continuous variables*/
        AGEyrs Age_sq
        SNF_1yr
        UniqueDrugs_1yr
        HS_1yr
        OUTPTVISIT_1yr
        /*Binary variables*/
        AFIB_1yr
        AMBLIFESUPPORT_1yr
        ANEMIA_1yr
        ANGIOGRAPHY_1yr
        ARB_1yr
        ASTHMA_1yr
        CANCERSCREEN_1yr
        CKD_1yr
        COLONOSCOPY_1yr
        COPD_1yr
        DEMENTIA_1yr
        DIURETICS_1yr
        ECHOCARDIOGRAPH_1yr
        FECALOCCULT_1yr
        GENDER
        HOMEOXYGEN_1yr
        HSCRP_1yr
        HYPERLIPIDEMIA_1yr
        INCL_ENDARTERECTOMY
        INCL_STROKE
        INFLAMBOWEL_1yr
        INSULIN_1yr
        LIPIDPANEL_1yr
        OBESITY_1yr
        OSTEOARTHRITIS_1yr
        PARALYSIS_1yr
        PCD_1yr
        PSYCHIATRIC_1yr
        PVD_1yr
        SEPSIS_1yr
        SMOKING_1yr
        STRESSTEST_1yr
        SUBABUSE_1yr
        SULFONYLUREA_1yr
        THIAZIDE_1yr
        VERTIGO_1yr
        VTE_1yr
        WEAKNESS_1yr
        WHEELCHAIR_1yr
        /* The %IF/%ELSE toggle must run inside a macro where &measure is defined */
        / link= %IF &measure=RR %THEN logit; %ELSE %IF &measure=RD %THEN identity;
          dist=poisson maxiter=250;
  /* Independence working correlation, clustered on subject */
  repeated subject=bene_id / type=ind;
  /* Stray semicolon inside the KEEP= list removed */
  output out=out_data(keep=StatinInitiator bene_id gender age_bin probability status_DEATH weight)
         prob=probability;
run;
```
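For readers following the g-computation step, here is a minimal sketch of how predicted probabilities under both exposure levels could be turned into a risk ratio and risk difference. It is not the poster's code. It assumes StatinInitiator and status_DEATH are numeric 0/1, uses DIST=BINOMIAL as the mentor suggested, and lists only a few covariates for brevity. The names `stacked`, `pred`, `gcomp`, and `cf_exposure` are hypothetical. It also relies on PROC GENMOD writing predicted values for rows whose response is missing, which is worth confirming in the output data set before trusting the result.

```sas
/* 1. Stack the observed rows with two counterfactual copies. The copies get a
      missing outcome so they do not contribute to the fit. */
data stacked;
  set input_dataset (in=observed)
      input_dataset (in=treated)
      input_dataset (in=untreated);
  if observed then cf_exposure = .;
  else do;
    cf_exposure     = treated;      /* 1 for the treated copy, 0 for the untreated copy */
    StatinInitiator = cf_exposure;  /* assumes numeric 0/1 coding */
    status_DEATH    = .;
  end;
run;

/* 2. Fit the GEE model on the observed rows and score every row. */
proc genmod data=stacked descending;
  class bene_id;
  weight weight_var;
  model status_DEATH = StatinInitiator AGEyrs Age_sq /* ...remaining covariates... */
        / dist=binomial link=logit;
  repeated subject=bene_id / type=ind;
  output out=pred p=probability;
run;

/* 3. Weighted average risk under each exposure level, then RR and RD. */
proc sql;
  create table gcomp as
  select sum(case when cf_exposure = 1 then weight_var * probability end)
           / sum(case when cf_exposure = 1 then weight_var end) as risk1,
         sum(case when cf_exposure = 0 then weight_var * probability end)
           / sum(case when cf_exposure = 0 then weight_var end) as risk0,
         calculated risk1 / calculated risk0 as RR,
         calculated risk1 - calculated risk0 as RD
  from pred
  where cf_exposure in (0, 1) and probability is not missing;
quit;
```

The empirical standard errors that GENMOD reports apply to the model coefficients, not to RR or RD, so interval estimates for these two quantities need a separate step that is not shown here.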

3 REPLIES

Thanks for the response StatDave_sas and for considering my question. I just wanted to clarify a few points.

I realize that I have a lot of model predictors but I should note that I have quite a lot of data with a substantial number of outcomes. I do understand that a model with fewer predictors will be more likely to converge and that is certainly a solution I plan to explore further. However, the person who was advising me seemed to think Firth's bias correction may resolve the problem before it was necessary to start eliminating predictors from the model. Perhaps they were mistaken in that understanding? Furthermore, my study design requires me to select more than one observation from each subject (i.e. one observation per subject per exposure level) for reasons I won't get into here. Thus, I'm not sure I feel comfortable selecting only one observation per subject or proceeding with a model that doesn't somehow account for this correlation when estimating the variance.

However, so far I haven't found any solution that will allow me to implement Firth's bias correction (which I believe is available only as an option in the PROC LOGISTIC MODEL statement) while also accounting for the correlated observations (since PROC LOGISTIC doesn't have a REPEATED statement). I may be misunderstanding this, so if anyone else has any ideas or recommendations, let me know. In the absence of such a solution, I will try your recommended approaches for reducing the number of model predictors. Thanks again for your helpful advice!

Firth's method involves applying a penalty to the likelihood. Since GEE is not a likelihood-based method, Firth's method is not possible.

Even with a lot of data, sparseness can easily occur when no responses of one type appear in one particular cross-classification of all of the predictors.

The idea of using one observation per subject was just a way to use a model selection process in PROC LOGISTIC or PROC HPGENSELECT to discover which predictors might be the most important ones. With that info you could fit the GEE model using the relatively few important predictors.
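A rough sketch of that screening idea, under stated assumptions: status_DEATH is coded 0/1, the candidate list is abbreviated, and `shuffled`, `one_per_subject`, and `selected_effects` are hypothetical names. Stepwise selection in PROC LOGISTIC is used here only as one possible selection process, the entry and stay thresholds are arbitrary, and the weight variable is left out of the screening step for simplicity.

```sas
/* 1. Keep one randomly chosen row per subject. */
data shuffled;
  set input_dataset;
  if _n_ = 1 then call streaminit(20190708);  /* arbitrary seed */
  _u = rand('uniform');
run;

proc sort data=shuffled;
  by bene_id _u;
run;

data one_per_subject;
  set shuffled;
  by bene_id;
  if first.bene_id;
  drop _u;
run;

/* 2. Screen candidate predictors. INCLUDE=1 forces the first effect
      (the exposure) to stay in every model. */
proc logistic data=one_per_subject;
  class AGECAT (param=ref ref="2") RACE (param=ref ref="1");
  model status_DEATH(event='1') = StatinInitiator AGECAT RACE AGEyrs Age_sq
                                  AFIB_1yr CKD_1yr COPD_1yr /* ...other candidates... */
        / selection=stepwise include=1 slentry=0.05 slstay=0.05;
run;

/* 3. Refit the GEE model on the full data with the retained effects.
      The list below is a placeholder, not a result: replace it with the
      effects the selection step actually keeps, and add a CLASS entry for
      any categorical effect among them. */
%let selected_effects = AGEyrs Age_sq;

proc genmod data=input_dataset descending;
  class bene_id;
  weight weight_var;
  model status_DEATH = StatinInitiator &selected_effects
        / dist=binomial link=logit;
  repeated subject=bene_id / type=ind;
run;
```

Because the predictors are chosen using the same outcome data, the standard errors from the final GEE fit do not reflect the selection step.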
