I am conducting research on a large database to assess the attributable risk of stroke among females compared to males.
Based on the provided protocol (shown below), I wrote the code shown below, following a tutorial shown here.
My question: the EXPECTED value I obtain among females is almost equal to the OBSERVED value in males.
Does this make sense? I have over a million observations. The odds ratios run parallel to the O/E values, which makes me comfortable with the results (shown in the figure below), but I would like confirmation from the experts. I repeated the same code for different years (subgroups) and saw the same pattern each time: the EXPECTED value among females is almost equal to the OBSERVED value in males.
Any help will be greatly appreciated.
libname in "C:\Users\mm\OneDrive\R codes including sts\Mitral women";
/********************************************************************************************/
%let varname = stsrcstroke ;
proc reg data=in.Sts_work_cabg_sorted_subset outest=RegOut noprint;
where include_Mo=1 & ifemale=0; /****/
YHat: model &varname. /*(event="1")*/ = age50
age60
age75
black
bmi1
bmi2
bsa1
bsa2
chrlungdmild
chrlungdmod
chrlungdsev
creat000
creat100
creat150
disvnumber
ef50
hct1
ialcohol2to7
ialcohol8plus
icardpresnonstemi
icardpresstabang
icardpresstemi
icardpresunstabang
ichfnotnyha4
ichfnyha4
icvdnotiacva
icvdpcarsurg
icvdstenosisdouble
icvdtia
idiabinsulin
idiaboral
idiabotherctrl
idialysis
ifemale
ifemalehct1
ihmo2
iimmsupp
iliverdis
ilmaindis
imedadp5days
imedadpidis
imedgp
imediastrad
imedinotr
imedster
insufaeq3
insufaeq4
insufmeq3
insufmeq4
insufteq3
insufteq4
intercept
ipayorge65comhmo
ipayorlt65mcaid
ipayorlt65mcare
ipayorlt65mcaremcaid
ipayorlt65selfnone
ipcigt6hr
ipcile6hr
ipciprior
iprcab
ipreopiabp
ipvd
irecentarrhythafibcont
irecentarrhythafibparox
irecentarrhyththb
irecentarrhythvfib
irecentarrhythvhbsss
irecentcvdcva
iremotearrhyth
iremotechf
iremotecvdcva
isyncope
iunrespstat
ivdstena
ivdstenm
mi1to21d
mi6to24hr
milt6hr
platelets1
platelets2
reop2
shockecmocba
statuseq2
statuseq3a
statusge3b
wbc1; quit;
proc score data=in.Sts_work_cabg_sorted_subset
score=RegOut type=parms predict out=Pred;
where ifemale=1;
data onlyFmls; set in.Sts_work_cabg_sorted_subset; if ifemale = 0 then delete; run;
proc score data=onlyFmls
score=RegOut type=parms predict out=Pred;
var age50
age60
age75
black
bmi1
bmi2
bsa1
bsa2
chrlungdmild
chrlungdmod
chrlungdsev
creat000
creat100
creat150
disvnumber
ef50
hct1
ialcohol2to7
ialcohol8plus
icardpresnonstemi
icardpresstabang
icardpresstemi
icardpresunstabang
ichfnotnyha4
ichfnyha4
icvdnotiacva
icvdpcarsurg
icvdstenosisdouble
icvdtia
idiabinsulin
idiaboral
idiabotherctrl
idialysis
ifemale
ifemalehct1
ihmo2
iimmsupp
iliverdis
ilmaindis
imedadp5days
imedadpidis
imedgp
imediastrad
imedinotr
imedster
insufaeq3
insufaeq4
insufmeq3
insufmeq4
insufteq3
insufteq4
intercept
ipayorge65comhmo
ipayorlt65mcaid
ipayorlt65mcare
ipayorlt65mcaremcaid
ipayorlt65selfnone
ipcigt6hr
ipcile6hr
ipciprior
iprcab
ipreopiabp
ipvd
irecentarrhythafibcont
irecentarrhythafibparox
irecentarrhyththb
irecentarrhythvfib
irecentarrhythvhbsss
irecentcvdcva
iremotearrhyth
iremotechf
iremotecvdcva
isyncope
iunrespstat
ivdstena
ivdstenm
mi1to21d
mi6to24hr
milt6hr
platelets1
platelets2
reop2
shockecmocba
statuseq2
statuseq3a
statusge3b
wbc1;run;
/**************************** Assess mean predicted value********************************************/;
proc means data= Pred mean ; var YHat ;run;
If all, or most, of the variables whose names start with i are dummy coded 1/0 (is / is not), then PROC REG may not be the right place to start modeling. You may be looking for PROC LOGISTIC, PROC GLM, or another procedure that supports CLASS variables. PROC REG has no CLASS statement and really expects continuous variables in the model.
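As a rough illustration of the PROC LOGISTIC route, here is a minimal sketch that fits the model in males and scores females in the same step. It rests on several assumptions that are mine, not the poster's: stsrcstroke is coded 0/1, the covariate list is shortened for readability, ifemale, ifemalehct1 and intercept are left out of the predictors, include_Mo=1 is applied to the scored records too, and PredF is a hypothetical output name.

```sas
/* Sketch only: fit in males, score females. Covariate list is shortened. */
proc logistic data=in.Sts_work_cabg_sorted_subset noprint;
   where include_Mo=1 and ifemale=0;          /* fitting sample: males */
   model stsrcstroke(event='1') = age50 age60 age75 black bmi1 bmi2;
   /* PredF is a hypothetical name; P_1 holds the predicted probability */
   score data=in.Sts_work_cabg_sorted_subset(where=(include_Mo=1 and ifemale=1))
         out=PredF;
run;

/* Sum of stsrcstroke = observed events; sum of P_1 = expected events */
proc means data=PredF mean sum;
   var stsrcstroke P_1;
run;
```

Dividing the observed sum by the expected sum would give an O/E ratio for females under the male-fitted model, and the predictions stay between 0 and 1, which PROC REG does not guarantee for a binary outcome.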
I also see in your PROC REG code that you seem to be restricting the input data to males(?):
where include_Mo=1 & ifemale=0;
With ifemale also in the MODEL statement, that may mean you are interpreting the data in an interesting fashion if you say the "expected for females" is the same as the observed value for males when all of the data used to fit the model come from males. In that subset ifemale is always 0, so it cannot contribute a sex effect to the fitted model.
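One way to check this is to look at the fitting sample directly. The sketch below assumes ifemalehct1 is an ifemale-by-hct1 interaction (a guess from its name) and that stsrcstroke is coded 0/1.

```sas
/* Check 1: in the male-only fitting sample, min = max means the variable is constant */
proc means data=in.Sts_work_cabg_sorted_subset n min max;
   where include_Mo=1 and ifemale=0;
   var ifemale ifemalehct1;
run;

/* Check 2: observed stroke rate by sex, to set beside the mean of YHat in Pred */
proc means data=in.Sts_work_cabg_sorted_subset n mean;
   where include_Mo=1;
   class ifemale;
   var stsrcstroke;
run;
```

If the first step shows no variation, those terms add nothing to the male-fitted model, and the second step gives the observed rates to compare against the expected value.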