katy-barry
Calcite | Level 5

Hello,

I have more of a theory-based question than a SAS coding question. I am working with a longitudinal dataset, and beforehand I ran some descriptive statistics to look at differences between my three exposure groups. Then I used a DAG and previous literature to identify potential confounders of the relationship between my exposure and my outcome.

My question: when building my final GEE model, I want to know which confounders I need to include. Do I need to fit univariate GEE models and include any confounder with a p-value less than 0.2? Or do I use the confounders that differed significantly between my exposure groups in the descriptive statistics? Basically, I want to know how to decide which variables to include in, and which to leave out of, my final GEE model.

 

Thank you for your help!

1 REPLY 1
SteveDenham
Jade | Level 19

Using p-values to decide which variables to include isn't a good idea, for a lot of reasons. See the literature on variable selection methods, especially this one from Peter Flom:

http://denversug.org/presentations/2010coday/stopsteppresntn.pdf and this one from Peter Flom and David Cassell: https://www.lexjansen.com/pnwsug/2008/DavidCassell-StoppingStepwise.pdf

Given all of that, my favorite method of variable selection is to consider what question you wish to answer. That will drive the selection with greater validity than p-value dredging. If after that you are still faced with too many variables, you may want to consider some sort of LASSO, LAR, or elastic net based selection. Check out PROC GLMSELECT and PROC HPGENSELECT. You would need to force your repeated measure into the model, but then select the other variables based on the chosen criterion. The selected variables could be saved, and then you could explore the model more fully in PROC GEE.
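To make the idea of forcing one term into a penalized selection concrete, here is a minimal sketch in Python rather than SAS. It is not what PROC GLMSELECT or PROC HPGENSELECT do internally. The function names, the simulated data, the column names, and the penalty value `lam=0.2` are all assumptions made for illustration. The forced column (standing in for the repeated-measure term) gets no penalty, so it always stays in the model, while the other coefficients can be shrunk to exactly zero.

```python
import random


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_forced(X, y, lam, forced=(), n_sweeps=200):
    """Coordinate-descent LASSO for centered data (no intercept).

    Minimizes (1/2n)*||y - X b||^2 + lam * sum(|b_j| for j not in forced).
    Columns listed in `forced` carry no penalty, so they are never dropped.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # current residual y - X b (b starts at zero)
    col_ms = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            # Covariance of column j with the residual that excludes column j's own fit
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            penalty = 0.0 if j in forced else lam
            new_bj = soft_threshold(rho, penalty) / col_ms[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Hypothetical data: column 0 plays the role of the repeated-measure (time) term.
random.seed(1)
n_obs = 200
names = ["time", "x1", "x2", "x3"]
X = [[random.gauss(0, 1) for _ in names] for _ in range(n_obs)]
y = [0.5 * row[0] + 1.0 * row[1] + random.gauss(0, 1) for row in X]

beta = lasso_forced(X, y, lam=0.2, forced={0})
selected = [name for name, b in zip(names, beta) if b != 0.0]
print(selected)
```

The sketch treats all rows as independent, so it only screens candidate variables. The within-subject correlation would still need to be handled afterwards, when the retained variables are carried into the GEE model.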

 

Just a suggestion - I don't want to say that this is the "best" method.

 

SteveDenham
