Solved: Re: PROC GEE Year Variable Coding Affects P-Values and Estimates

Sara_p-value · Posted 02-11-2025 09:30 PM

I'm running a PROC GEE model in SAS with a multinomial outcome and using year as a predictor.

PROC GEE DATA=data;
CLASS id outcome(ref="2");
MODEL outcome = year / DIST=MULT LINK=GLOGIT;
REPEATED SUBJECT=id / TYPE=IND;
RUN;

When I code year as 2001, 2002, 2003, etc., the model gives estimates for year but no Z statistics or p-values.

However, when I recode year as 0, 1, 2, etc., the estimates change, and I now also get Z statistics and p-values.

I expected only the intercept to shift, so why does this affect significance testing? Is there something about scale or estimation in GEE that I should consider?
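In exact arithmetic that expectation is right. For each non-reference outcome category j, subtracting a constant c from year (for example c = 2001) is only a linear reparameterization of θ_j = (β_0j, β_1j)ᵀ, and the GEE sandwich covariance transforms with the same matrix, so the slope and its variance are unchanged:

```latex
\log\frac{P(Y=j)}{P(Y=\text{ref})}
  = \beta_{0j} + \beta_{1j}\,\text{year}
  = \underbrace{(\beta_{0j} + c\,\beta_{1j})}_{\beta_{0j}^{*}} + \beta_{1j}\,(\text{year} - c)

\theta_j^{*} = A\,\theta_j, \qquad
A = \begin{pmatrix} 1 & c \\ 0 & 1 \end{pmatrix}, \qquad
\operatorname{Cov}(\hat\theta_j^{*}) = A\,\operatorname{Cov}(\hat\theta_j)\,A^{\top}
\;\Rightarrow\;
\operatorname{Var}(\hat\beta_{1j}^{*}) = \operatorname{Var}(\hat\beta_{1j})
```

So the Z statistic and p-value for year should not depend on the shift; when they do, the difference comes from floating-point trouble with large, uncentered values rather than from the model itself.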

Ksharp · Posted 02-11-2025 10:38 PM

1) First, if you could post some data that replicates your problem, it would be much easier to find where the problem is.

2) I think you should code it like this, with the REF= option on the response in the MODEL statement:

/* Instead of this: */
CLASS id outcome(ref="2");
MODEL outcome = year / DIST=MULT LINK=GLOGIT;
/* use this: */
CLASS id;
MODEL outcome(ref="2") = year / DIST=MULT LINK=GLOGIT;

3) As for your question, this is expected.

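Your YEAR variable is continuous, not categorical.

Because YEAR takes values as large as 2002 or 2003, the computation runs into numerical trouble. A likely reason is that such a column is nearly proportional to the intercept column, which makes the covariance computation ill-conditioned; the test statistic under H0 then cannot be evaluated (it falls outside the range the procedure can handle), so no p-value is reported.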
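A minimal sketch of the usual fix, centering YEAR so it runs 0, 1, 2, ..., applied to the model from the question. DATA, YEAR, OUTCOME and ID are the question's names; YEAR_C, DATA_C and the macro variable MINYEAR are hypothetical names introduced here. The slope for YEAR_C should, in principle, equal the slope for the uncentered YEAR; only the intercepts move.

```sas
/* Sketch: center YEAR at its earliest value so the earliest year becomes 0.   */
/* YEAR_C, DATA_C and MINYEAR are hypothetical names, not from the question.   */
proc sql noprint;
   select min(year) into :minyear trimmed   /* earliest year, e.g. 2001 */
   from data;
quit;

data data_c;
   set data;
   year_c = year - &minyear;                /* earliest year -> 0, next -> 1, ... */
run;

proc gee data=data_c;
   class id;
   model outcome(ref="2") = year_c / dist=mult link=glogit;
   repeated subject=id / type=ind;
run;
```

Reading the minimum from the data avoids hard-coding 2001; subtracting any fixed constant would work just as well.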

Here is an example using SASHELP.HEART:

data have;
set sashelp.heart;
/*ageatstart=ageatstart+20;*/
if mod(_n_,500)=1 then id+1;  /* start a new cluster ID every 500 rows */
run;


PROC GEE DATA=have;
CLASS id ;
MODEL bp_Status(ref='High') = ageatstart / DIST=MULT LINK=GLOGIT; 
REPEATED SUBJECT=id/ TYPE=IND; 
RUN;

This run gives Z statistics and p-values.

data have;
set sashelp.heart;
ageatstart=ageatstart+200;  /*is too big*/
if mod(_n_,500)=1 then id+1;
run;


PROC GEE DATA=have;
CLASS id ;
MODEL bp_Status(ref='High') = ageatstart / DIST=MULT LINK=GLOGIT; 
REPEATED SUBJECT=id/ TYPE=IND; 
RUN;

With 200 added, AGEATSTART is too large, and no Z statistics or p-values are produced.

data have;
set sashelp.heart;
ageatstart=ageatstart+20;  /*is NOT too big*/
if mod(_n_,500)=1 then id+1;
run;


PROC GEE DATA=have;
CLASS id ;
MODEL bp_Status(ref='High') = ageatstart / DIST=MULT LINK=GLOGIT; 
REPEATED SUBJECT=id/ TYPE=IND; 
RUN;

With only 20 added, AGEATSTART is not too large, so Z statistics and p-values are produced.

NOTE: The Z statistics and p-values are the same as in the first run.
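The three runs above can also be done in one pass and stacked for a side-by-side comparison. This is a sketch under stated assumptions: the macro FIT_SHIFT and the data sets HAVE_n, EST_n and COMPARE are hypothetical names, and the ODS table name GEEEmpPEst (parameter estimates with empirical standard errors) is assumed; confirm it with ODS TRACE ON before relying on it. It also assumes the estimates table is still created for the shift-200 run (the question reports that estimates still appear), in which case the missing statistics show up as missing values.

```sas
/* Sketch: refit the SASHELP.HEART example for several shifts of AGEATSTART. */
/* GEEEmpPEst is an assumed ODS table name; check it with ODS TRACE ON.      */
%macro fit_shift(shift);
   data have_&shift;
      set sashelp.heart;
      ageatstart = ageatstart + &shift;     /* shift the predictor by a constant */
      if mod(_n_, 500) = 1 then id + 1;     /* new cluster ID every 500 rows     */
   run;

   ods output GEEEmpPEst = est_&shift;     /* capture the estimates table       */
   proc gee data=have_&shift;
      class id;
      model bp_status(ref='High') = ageatstart / dist=mult link=glogit;
      repeated subject=id / type=ind;
   run;

   data est_&shift;
      set est_&shift;
      shift = &shift;                       /* label rows with the shift used    */
   run;
%mend fit_shift;

%fit_shift(0)
%fit_shift(20)
%fit_shift(200)

/* Slopes, SEs, Z and p for AGEATSTART should agree across shifts unless the */
/* computation breaks down numerically; only the intercepts should move.     */
data compare;
   set est_0 est_20 est_200;
run;

proc print data=compare;
run;
```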

If you treat AGEATSTART as a categorical (CLASS) variable, you get both statistics whether the values look like 2002 or like 2, and the results stay the same.

data have;
set sashelp.heart;
/*ageatstart=ageatstart+200;  */
if mod(_n_,500)=1 then id+1;
run;


PROC GEE DATA=have;
CLASS id ageatstart;
MODEL bp_Status(ref='High') = ageatstart / DIST=MULT LINK=GLOGIT; 
REPEATED SUBJECT=id/ TYPE=IND; 
RUN;

data have;
set sashelp.heart;
ageatstart=ageatstart+200;  
if mod(_n_,500)=1 then id+1;
run;


PROC GEE DATA=have;
CLASS id ageatstart;
MODEL bp_Status(ref='High') = ageatstart / DIST=MULT LINK=GLOGIT; 
REPEATED SUBJECT=id/ TYPE=IND; 
RUN;
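For the model in the question, the categorical version would look like this sketch (DATA, OUTCOME, YEAR and ID are the question's names). Keep in mind that it answers a different question: each year gets its own effect relative to a reference year, so it tests differences between years rather than a single linear trend in year.

```sas
/* Sketch: the question's model with YEAR treated as categorical.          */
/* Estimates one effect per year (relative to a reference year), not a     */
/* single linear slope.                                                    */
proc gee data=data;
   class id year;
   model outcome(ref="2") = year / dist=mult link=glogit;
   repeated subject=id / type=ind;
run;
```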

Sara_p-value · Posted 02-12-2025 09:11 AM

I should have included my data structure in my original post - I'll make sure to do that next time.

Thank you for your detailed and helpful response! Your explanation about the numerical computation issues with large year values really clarified my problem. I assume that it would make more sense to use 0, 1, 2, 3 as year values then.
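Subtracting the first year (giving 0, 1, 2, 3) is one reasonable choice, and in principle any constant gives the same slope. A common alternative is mean-centering, sketched here with PROC STDIZE on the question's data set; DATA_MC is a hypothetical output name. With mean-centering, the intercepts describe an average year rather than the first year.

```sas
/* Sketch: mean-center YEAR instead of subtracting the first year.        */
/* METHOD=MEAN subtracts the mean and leaves the scale unchanged.         */
proc stdize data=data out=data_mc method=mean;
   var year;          /* YEAR in DATA_MC is now YEAR minus its mean */
run;
/* Then fit the same PROC GEE model with DATA=data_mc. */
```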
