Sara_p-value
Fluorite | Level 6

 

I'm running a PROC GEE model in SAS with a multinomial outcome, using year as a predictor.

 

PROC GEE DATA=data;
CLASS id outcome(ref="2");
MODEL outcome = year / DIST=MULT LINK=GLOGIT;
REPEATED SUBJECT=id / TYPE=IND;
RUN;

When I use year as 2001, 2002, 2003, etc., the model does not provide p-values and Z statistics for year but I get estimates.

However, when I recode year as 0, 1, 2, etc., the estimates change, and I now also get p-values and Z statistics.

I expected that only the intercept would shift, but why does this affect significance testing? Is there something about scale or estimation in GEE that I should consider?

1 ACCEPTED SOLUTION

Ksharp
Super User

1) First, if you could post some data that reproduces your problem, it would be much easier to work out where the problem is.

2) I think you should specify the reference level in the MODEL statement instead of the CLASS statement. Change

CLASS id outcome(ref="2");
MODEL outcome = year / DIST=MULT LINK=GLOGIT;

to

CLASS id;
MODEL outcome(ref="2") = year / DIST=MULT LINK=GLOGIT;

3) As for your question, this behavior is to be expected.

Your YEAR variable is a continuous variable, not a categorical variable, so values like 2002 and 2003 are very large. The test statistic under H0 then becomes too large, and the p-value cannot be calculated (the statistic is out of the range of the distribution).

Here is an example:

data have;
set sashelp.heart;
/*ageatstart=ageatstart+20;*/
/* Create an artificial cluster id: a new id starts every 500 rows */
if mod(_n_,500)=1 then id+1;
run;


PROC GEE DATA=have;
CLASS id ;
MODEL bp_Status(ref='High') = ageatstart / DIST=MULT LINK=GLOGIT; 
REPEATED SUBJECT=id/ TYPE=IND; 
RUN;

[Image: Ksharp_0-1739331015910.png]

With the original values, you get the Z statistics and p-values.

data have;
set sashelp.heart;
ageatstart=ageatstart+200;  /*is too big*/
if mod(_n_,500)=1 then id+1;
run;


PROC GEE DATA=have;
CLASS id ;
MODEL bp_Status(ref='High') = ageatstart / DIST=MULT LINK=GLOGIT; 
REPEATED SUBJECT=id/ TYPE=IND; 
RUN;

[Image: Ksharp_1-1739331097653.png]

Here AgeAtStart is too large for the Z statistics and p-values to be computed.

data have;
set sashelp.heart;
ageatstart=ageatstart+20;  /*is NOT too big*/
if mod(_n_,500)=1 then id+1;
run;


PROC GEE DATA=have;
CLASS id ;
MODEL bp_Status(ref='High') = ageatstart / DIST=MULT LINK=GLOGIT; 
REPEATED SUBJECT=id/ TYPE=IND; 
RUN;

[Image: Ksharp_2-1739331206987.png]

Here AgeAtStart is not too large, so you get the Z statistics and p-values.

NOTE: The Z and p-values are the same as in the first run.
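
For a continuous predictor, shifting by a constant should only move the intercepts, which is consistent with the Z and p-values matching in the first and third runs. A short derivation, using symbols that are not from the thread ($\alpha_j$ and $\beta_j$ for the intercept and slope of outcome level $j$, and $c$ for the shift):

```latex
\begin{align*}
\log\frac{P(Y=j)}{P(Y=\mathrm{ref})} &= \alpha_j + \beta_j x \\
&= \alpha_j + \beta_j (x^{*} + c), \qquad x^{*} = x - c \\
&= (\alpha_j + \beta_j c) + \beta_j x^{*}
\end{align*}
```

So in exact arithmetic the slope $\beta_j$ and its test are unchanged, and only the intercept becomes $\alpha_j + \beta_j c$. When a shifted fit loses its Z and p-values, that most likely points to a numerical problem in the computation rather than to a different model.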

 

 

 

 

 

If you treat AgeAtStart as a categorical variable (by listing it in the CLASS statement), you get both the Z statistics and the p-values whether the values look like 2002 or like 2, and the results stay the same.

data have;
set sashelp.heart;
/*ageatstart=ageatstart+200;  */
if mod(_n_,500)=1 then id+1;
run;


PROC GEE DATA=have;
CLASS id ageatstart;
MODEL bp_Status(ref='High') = ageatstart / DIST=MULT LINK=GLOGIT; 
REPEATED SUBJECT=id/ TYPE=IND; 
RUN;

data have;
set sashelp.heart;
ageatstart=ageatstart+200;  
if mod(_n_,500)=1 then id+1;
run;


PROC GEE DATA=have;
CLASS id ageatstart;
MODEL bp_Status(ref='High') = ageatstart / DIST=MULT LINK=GLOGIT; 
REPEATED SUBJECT=id/ TYPE=IND; 
RUN;
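
Applied to the model in the original question, the same idea could be sketched like this. The dataset names mydata and mydata0 and the variable year0 are made up for illustration; the 2001 baseline comes from the year values mentioned in the question.

```sas
/* Assumption: mydata holds id, outcome, and year (2001, 2002, ...) */
data mydata0;
    set mydata;
    year0 = year - 2001;   /* 2001 -> 0, 2002 -> 1, 2003 -> 2, ... */
run;

proc gee data=mydata0;
    class id;
    /* Reference level given in the MODEL statement, as suggested in point 2 */
    model outcome(ref="2") = year0 / dist=mult link=glogit;
    repeated subject=id / type=ind;
run;
```

The slope for year0 is read the same way as a slope for year (change in log odds per one-year step); only the intercepts now refer to 2001 instead of year 0.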

 

Sara_p-value
Fluorite | Level 6

I should have included my data structure in my original post - I'll make sure to do that next time.

Thank you for your detailed and helpful response! Your explanation about the numerical computation issues with large year values really clarified my problem. I assume that it would make more sense to use 0, 1, 2, 3 as year values then.
