mrahouma
Obsidian | Level 7

I am conducting research on a large database to assess the attributable risk of stroke among females compared with males.

Following the provided protocol (shown below), I wrote the code shown below, based on a tutorial shown here.

[image: protocol]

My question: I noticed that the EXPECTED value among females is almost equal to the OBSERVED value in males.

Does this make sense? I have over a million observations. The odds ratios run parallel to the O/E values (shown in the figure below), which makes me comfortable with the results, but I would like confirmation from the experts. I repeated the same code in different years (subgroups) and saw the same pattern: the EXPECTED value among females is almost equal to the OBSERVED value in males.

Any help will be greatly appreciated.

[image: odds ratios and O/E values]

libname in "C:\Users\mm\OneDrive\R codes including sts\Mitral women";
/********************************************************************************************/
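%let varname = stsrcstroke; /* outcome variable (stroke) */
/* Step 1: fit the model in males only; OUTEST= saves the coefficients in RegOut */
proc reg data=in.Sts_work_cabg_sorted_subset outest=RegOut noprint;
where include_Mo=1 & ifemale=0; /* males only */
YHat: model &varname. /*(event="1")*/ = age50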
age60
age75
black
bmi1
bmi2
bsa1
bsa2
chrlungdmild
chrlungdmod
chrlungdsev
creat000
creat100
creat150
disvnumber
ef50
hct1
ialcohol2to7
ialcohol8plus
icardpresnonstemi
icardpresstabang
icardpresstemi
icardpresunstabang
ichfnotnyha4
ichfnyha4
icvdnotiacva
icvdpcarsurg
icvdstenosisdouble
icvdtia
idiabinsulin
idiaboral
idiabotherctrl
idialysis
ifemale
ifemalehct1
ihmo2
iimmsupp
iliverdis
ilmaindis
imedadp5days
imedadpidis
imedgp
imediastrad
imedinotr
imedster
insufaeq3
insufaeq4
insufmeq3
insufmeq4
insufteq3
insufteq4
intercept
ipayorge65comhmo
ipayorlt65mcaid
ipayorlt65mcare
ipayorlt65mcaremcaid
ipayorlt65selfnone
ipcigt6hr
ipcile6hr
ipciprior
iprcab
ipreopiabp
ipvd
irecentarrhythafibcont
irecentarrhythafibparox
irecentarrhyththb
irecentarrhythvfib
irecentarrhythvhbsss
irecentcvdcva
iremotearrhyth
iremotechf
iremotecvdcva
isyncope
iunrespstat
ivdstena
ivdstenm
mi1to21d
mi6to24hr
milt6hr
platelets1
platelets2
reop2
shockecmocba
statuseq2
statuseq3a
statusge3b
wbc1; quit;   

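/* Step 2 (first attempt): score females with the male coefficients.
   This call has no VAR statement, and its output (Pred) is overwritten
   by the second PROC SCORE call further down. */
proc score data=in.Sts_work_cabg_sorted_subset
score=RegOut type=parms predict out=Pred;
where ifemale=1;
run;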

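/* Keep females only. Note: unlike the PROC REG step, no include_Mo=1 filter
   is applied here, and rows with a missing ifemale are also kept. */
data onlyFmls;
set in.Sts_work_cabg_sorted_subset;
if ifemale = 0 then delete;
run;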

proc score data=onlyFmls
score=RegOut type=parms predict out=Pred;
var age50
age60
age75
black
bmi1
bmi2
bsa1
bsa2
chrlungdmild
chrlungdmod
chrlungdsev
creat000
creat100
creat150
disvnumber
ef50
hct1
ialcohol2to7
ialcohol8plus
icardpresnonstemi
icardpresstabang
icardpresstemi
icardpresunstabang
ichfnotnyha4
ichfnyha4
icvdnotiacva
icvdpcarsurg
icvdstenosisdouble
icvdtia
idiabinsulin
idiaboral
idiabotherctrl
idialysis
ifemale
ifemalehct1
ihmo2
iimmsupp
iliverdis
ilmaindis
imedadp5days
imedadpidis
imedgp
imediastrad
imedinotr
imedster
insufaeq3
insufaeq4
insufmeq3
insufmeq4
insufteq3
insufteq4
intercept
ipayorge65comhmo
ipayorlt65mcaid
ipayorlt65mcare
ipayorlt65mcaremcaid
ipayorlt65selfnone
ipcigt6hr
ipcile6hr
ipciprior
iprcab
ipreopiabp
ipvd
irecentarrhythafibcont
irecentarrhythafibparox
irecentarrhyththb
irecentarrhythvfib
irecentarrhythvhbsss
irecentcvdcva
iremotearrhyth
iremotechf
iremotecvdcva
isyncope
iunrespstat
ivdstena
ivdstenm
mi1to21d
mi6to24hr
milt6hr
platelets1
platelets2
reop2
shockecmocba
statuseq2
statuseq3a
statusge3b
wbc1;run;

/**************************** Assess mean predicted value ****************************/
proc means data=Pred mean;
var YHat;
run;
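To put the observed and expected values side by side, something like the following could be run after the scoring step. It is a minimal sketch, assuming `Pred` is the PROC SCORE output above, that `stsrcstroke` is coded 0/1, and that the prediction is stored in `YHat`; the column aliases (`n_females`, `observed_events`, `expected_events`, `oe_ratio`) are made up for illustration.

```sas
/* Observed vs. expected strokes among the scored females (sketch). */
proc sql;
  select count(*)                 as n_females,
         sum(stsrcstroke)         as observed_events,
         sum(YHat)                as expected_events,
         calculated observed_events / calculated expected_events as oe_ratio
  from Pred
  where stsrcstroke is not missing and YHat is not missing;
quit;

/* Observed stroke rate in the males used to fit the model, for comparison. */
proc means data=in.Sts_work_cabg_sorted_subset mean;
  where include_Mo=1 & ifemale=0;
  var stsrcstroke;
run;
```

Comparing the mean of `YHat` in females with the male mean from the second step shows directly how close the two numbers are in a given year or subgroup.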

 

ballardw
Super User

If all, or most, of the variables whose names start with i are dummy-coded 1/0 (is / is not), then PROC REG may not be the right place to start modeling. You may be looking for PROC LOGISTIC, PROC GLM, or another procedure that supports CLASS variables. PROC REG really expects continuous variables in its models.
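As a rough illustration of the logistic alternative, the males-only fit and the scoring of females could be done in one PROC LOGISTIC call. This is a sketch under assumptions: the predictor list is shortened for readability (the full list from the post would go in its place), `PredLogit` is a hypothetical output name, and `onlyFmls` is the females-only data set created in the original code.

```sas
/* Fit a logistic model in males only, then score the females (sketch). */
proc logistic data=in.Sts_work_cabg_sorted_subset noprint;
  where include_Mo=1 & ifemale=0;            /* applies to the DATA= set only */
  model stsrcstroke(event='1') = age50 age60 age75 black bmi1 bmi2;
  score data=onlyFmls out=PredLogit;         /* adds P_1 = predicted P(stroke) */
run;

/* Mean observed outcome vs. mean predicted probability in females. */
proc means data=PredLogit mean sum;
  var stsrcstroke P_1;
run;
```

With a 0/1 outcome, the SCORE output names the predicted event probability `P_1`, so its sum plays the role of the expected count.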

 

I also see in your Proc reg code that you seem to be restricting the input data to males(?)

where include_Mo=1 & ifemale=0;

With ifemale in the MODEL statement, that may mean you are interpreting the data in an interesting fashion if you say the "expected for females" is the same as the observed for males when all the data are from males.
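One possible reading of the near-equality, offered as a sketch, not a verdict: PROC REG fits an ordinary least-squares model with an intercept by default, so the mean fitted value in the fitting sample (males) equals the observed male mean. Scoring females with the male coefficients then shifts that mean only by the female-minus-male difference in covariate means. The symbols below ($\bar{y}_M$ for the observed male mean, $\bar{x}_M$ and $\bar{x}_F$ for the covariate means, $\hat\beta$ for the fitted coefficients, $E_F$ for the expected value in females) are notation introduced here, not taken from the post.

```latex
\begin{align*}
\bar{\hat{y}}_M &= \hat\beta_0 + \bar{x}_M^{\top}\hat\beta = \bar{y}_M
  && \text{(OLS with intercept: residuals sum to zero)} \\
E_F = \bar{\hat{y}}_F &= \hat\beta_0 + \bar{x}_F^{\top}\hat\beta
  = \bar{y}_M + (\bar{x}_F - \bar{x}_M)^{\top}\hat\beta
\end{align*}
```

Under these assumptions, the expected value in females lands close to the observed male value whenever the covariate means are similar in the two groups or the coefficients are small. A predictor that is constant in the male subset, such as ifemale, cannot be estimated there and so would not be expected to contribute to the shift.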

 

mrahouma
Obsidian | Level 7
Thanks for reviewing my post. As per the provided protocol, I estimated the risk in MALES only using a multivariate logistic regression model and used the output to score the FEMALES-only data. That is why I used ```where include_Mo=1 & ifemale=0;``` in the first model. I appreciate all your input.
