I'm running a PROC GEE model in SAS with a multinomial outcome and using year
as a predictor.
PROC GEE DATA=data;
    CLASS id outcome(ref="2");
    MODEL outcome = year / DIST=MULT LINK=GLOGIT;
    REPEATED SUBJECT=id / TYPE=IND;
RUN;
When I use year as 2001, 2002, 2003, etc., the model gives estimates for year but no Z statistics or p-values.
However, when I recode year as 0, 1, 2, etc., the estimates change, and I now also get Z statistics and p-values.
I expected that only the intercept would shift, but why does this affect significance testing? Is there something about scale or estimation in GEE that I should consider?
1) First, if you could post some data that reproduces the problem, it would be much easier to pinpoint the cause.
2) I think you should move the reference level from the CLASS statement to the response in the MODEL statement. Instead of
CLASS id outcome(ref="2");
MODEL outcome = year / DIST=MULT LINK=GLOGIT;
use
CLASS id;
MODEL outcome(ref="2") = year / DIST=MULT LINK=GLOGIT;
3) As for your question, this behavior is to be expected.
Your YEAR variable is a continuous variable, not a categorical one, and its values (2002, 2003, ...) are very large. The test statistic under H0 then becomes too large, so the p-value cannot be calculated (the statistic is out of the range of the distribution).
Take an example:
data have;
    set sashelp.heart;
    /*ageatstart=ageatstart+20;*/
    if mod(_n_,500)=1 then id+1;
run;
PROC GEE DATA=have;
    CLASS id;
    MODEL bp_Status(ref='High') = ageatstart / DIST=MULT LINK=GLOGIT;
    REPEATED SUBJECT=id / TYPE=IND;
RUN;
Here you get the Z statistics and p-values.
data have;
    set sashelp.heart;
    ageatstart=ageatstart+200; /*is too big*/
    if mod(_n_,500)=1 then id+1;
run;
PROC GEE DATA=have;
    CLASS id;
    MODEL bp_Status(ref='High') = ageatstart / DIST=MULT LINK=GLOGIT;
    REPEATED SUBJECT=id / TYPE=IND;
RUN;
Now ageatstart is too large, and the Z statistics and p-values are not reported.
data have;
    set sashelp.heart;
    ageatstart=ageatstart+20; /*is NOT too big*/
    if mod(_n_,500)=1 then id+1;
run;
PROC GEE DATA=have;
    CLASS id;
    MODEL bp_Status(ref='High') = ageatstart / DIST=MULT LINK=GLOGIT;
    REPEATED SUBJECT=id / TYPE=IND;
RUN;
Here ageatstart is not too large, so you get the Z statistics and p-values.
NOTE: The Z statistics and p-values are the same as in the first example.
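To see why a shift should leave the slope (and therefore its Z and p-value) unchanged, here is a short sketch of the algebra. The notation is my own: j is an outcome level, r is the reference level, and c is the shift constant (for example c = 2001).

```latex
\[
\log\frac{P(Y=j)}{P(Y=r)}
  = \alpha_j + \beta_j\,\mathrm{year}
  = (\alpha_j + \beta_j c) + \beta_j\,(\mathrm{year} - c)
\]
```

So only the intercept moves, from \alpha_j to \alpha_j + \beta_j c, while the slope \beta_j is mathematically the same. If the slope estimate differs between the two codings, that suggests the fit with the large values did not finish cleanly, so it is worth checking the log for warnings.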
If you treat ageatstart as a categorical variable (by listing it in the CLASS statement), you get both statistics whether the values look like 2002 or like 2, and the results stay the same:
data have;
    set sashelp.heart;
    /*ageatstart=ageatstart+200; */
    if mod(_n_,500)=1 then id+1;
run;
PROC GEE DATA=have;
    CLASS id ageatstart;
    MODEL bp_Status(ref='High') = ageatstart / DIST=MULT LINK=GLOGIT;
    REPEATED SUBJECT=id / TYPE=IND;
RUN;

data have;
    set sashelp.heart;
    ageatstart=ageatstart+200;
    if mod(_n_,500)=1 then id+1;
run;
PROC GEE DATA=have;
    CLASS id ageatstart;
    MODEL bp_Status(ref='High') = ageatstart / DIST=MULT LINK=GLOGIT;
    REPEATED SUBJECT=id / TYPE=IND;
RUN;
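Applied to the model in the question, the same idea could look roughly like the sketch below: shift year so it starts at 0 before fitting, and keep it continuous. I have not seen the real data, so the dataset name mydata, the macro variable minyear, and the new variable year_c are placeholders of mine; id, outcome, and year are taken from the original post.

```sas
/* Sketch only: mydata, minyear and year_c are hypothetical names. */

/* Store the earliest year in a macro variable instead of hard-coding 2001 */
proc sql noprint;
    select min(year) into :minyear trimmed
    from mydata;
quit;

/* Shift year so the first year becomes 0; the spacing between years is unchanged */
data centered;
    set mydata;
    year_c = year - &minyear;
run;

/* Same model as in the question, with the shifted year and the
   reference level set on the response in the MODEL statement */
proc gee data=centered;
    class id;
    model outcome(ref="2") = year_c / dist=mult link=glogit;
    repeated subject=id / type=ind;
run;
```

The slope for year_c still has the per-year interpretation; only the intercepts now refer to the first year instead of year 0.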
I should have included my data structure in my original post - I'll make sure to do that next time.
Thank you for your detailed and helpful response! Your explanation about the numerical computation issues with large year values really clarified my problem. I assume that it would make more sense to use 0, 1, 2, 3 as year values then.