Using the national dataset, my study aim is to obtain the CDI infection rate and mortality over a 12-year period. While doing so, I saw significant differences among the crude rates, the rates adjusted for age/sex, and the rates adjusted for age/sex/comorbidities. I then obtained mortality for the entire sample, regardless of primary or secondary diagnosis. The results are below. You can see an almost 10x difference between the 2nd (adjusted for age and sex) and 3rd (adjusted for age/sex/race/comorbidities) results.
Year   Crude     Adjusted for age and sex   Adjusted for demographics and comorbidities
2003   0.02202   0.01558                    0.1463
2004   0.02096   0.01498                    0.1376
2005   0.02043   0.01444                    0.1294
2006   0.02018   0.01415                    0.1234
2007   0.01916   0.01361                    0.1149
2008   0.02014   0.01379                    0.1174
2009   0.01893   0.01307                    0.1084
2010   0.01862   0.01286                    0.1034
2011   0.01872   0.01253                    0.0968
2012   0.01845   0.01259                    0.09664
2013   0.0189    0.01286                    0.09651
2014   0.01901   0.01301                    0.09573

My SAS code is below.

Questions:
1. I don't know why the numbers are significantly different. Maybe Poisson is inappropriate for obtaining adjusted rates, but appropriate for IRR in a multivariable model...?
2. What would be the best method(s) to perform trend analyses for rates and IRR...?

Best,
Sun

Crude mortality
proc genmod data=NIS.cdi;
class year female agegroup / param=glm;
model died (event='1')=year / type3 dist=poisson link=log offset=log_discharge;
weight trendwt;
store plmsourcenis;
run;
proc plm source=plmsourcenis;
lsmeans year / ilink cl;
run;
Mortality after adjusting for sex and age
proc genmod data=NIS.cdi;
class year female agegroup / param=glm;
model died (event='1')=year female agegroup / type3 dist=poisson link=log offset=log_discharge;
weight trendwt;
store plmsourcenis_sexage;
run;
proc plm source=plmsourcenis_sexage;
lsmeans year / ilink cl;
run;
Mortality after adjusting for sex, age, and comorbidities
proc genmod data=NIS.cdi;
class year female agegroup race_three
CM_AIDS CM_alcohol CM_anemdef CM_arth CM_bldloss CM_CHF CM_chrnlung CM_coag
CM_depress CM_DM CM_dmcx CM_drug CM_HTN_C CM_hypothy CM_liver CM_lymph CM_lytes
CM_mets CM_neuro CM_obese CM_para CM_perivasc CM_psych CM_pulmcirc CM_renlfail
CM_tumor CM_ulcer CM_valve CM_wghtloss / param=glm;
model died (event='1')=year female agegroup race_three
CM_AIDS CM_alcohol CM_anemdef CM_arth CM_bldloss CM_CHF CM_chrnlung CM_coag
CM_depress CM_DM CM_dmcx CM_drug CM_HTN_C CM_hypothy CM_liver CM_lymph CM_lytes
CM_mets CM_neuro CM_obese CM_para CM_perivasc CM_psych CM_pulmcirc CM_renlfail
CM_tumor CM_ulcer CM_valve CM_wghtloss / type3 dist=poisson link=log offset=log_discharge;
weight trendwt;
store plmsourcenis_com;
run;
proc plm source=plmsourcenis_com;
lsmeans year / ilink cl;
run;
Without the data, it is difficult to know how the response might be affected by the 'female' and 'agegroup' variables. However, in general, what you describe can happen when you add a categorical variable to a model or remove one from it. It is known as Simpson's paradox: the within-group relationships between variables differ from the between-group relationships. The Wikipedia article on Simpson's paradox has some pictures that show why it occurs. You might try graphing your data in a similar fashion to see whether Simpson's paradox is responsible for what you are seeing.
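
As a rough starting point for that graph, here is a minimal sketch. It reuses the data set and variable names from your post (NIS.cdi, year, agegroup, died, trendwt). It assumes died is coded 0/1, so its weighted mean is the proportion who died. The output data set name work.by_age is made up for illustration. I have not run this against your data, so treat it as a sketch rather than a tested program.

/* Weighted proportion of deaths for each year within each age group. */
/* Assumes died is coded 0/1; work.by_age is a hypothetical name.     */
proc means data=NIS.cdi noprint nway;
   class year agegroup;
   var died;
   weight trendwt;
   output out=work.by_age mean=death_rate;
run;

/* One line per age group, so the within-group trends over time can  */
/* be compared with the overall crude trend.                          */
proc sgplot data=work.by_age;
   series x=year y=death_rate / group=agegroup markers;
   xaxis label="Year";
   yaxis label="Weighted proportion died";
run;

If the lines for the individual age groups move differently from the crude yearly rates, that would point toward the group composition driving the difference. You could repeat the same plot with female, or with one of the comorbidity flags, in place of agegroup.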