Demographer
Pyrite | Level 9

Hi,

I ran a linear regression with PROC GENMOD (with a REPEATED statement to account for clustering). However, when I calculate predicted values by hand, they don't match the values produced by the OUTPUT OUT= statement.

 

Here is the code of the model:

 

 

proc genmod data=SHARE.shareHI;
  class smoke_bin(ref='0') obesity(ref='0') ah(ref='0') sex edu3(ref='2') depression(ref='0') vig_pa(ref='0') cntry(ref='5') athlos_id sex;
  model diff_logHi = hi|hi|hi age_num|age_num sex edu3 smoke_bin obesity ah depression vig_pa cntry / dist=n link=id;
  /* Generalized estimating equations, to account for correlation among observations */
  repeated subject=athlos_id / type=exch;
  weight pond;
  output out=work.predicted p=PredictedValue;
run;

 

 

Here are the parameters:

 

Obs  Covariate        Category  par_difHI     P_
1    Intercept                  0.69374       ***
2    hi                         -4.44389      ***
3    hi*hi                      8.73896       ***
4    hi*hi*hi                   -6.21405      ***
5    age_num                    0.012015800   ***
6    age_num*age_num            -0.000130071  ***
7    sex              0         -0.01474      ***
8    sex              1         0
9    edu3             0         -0.05901      ***
10   edu3             1         -0.03252      ***
11   edu3             2         0
12   smoke_bin        1         -0.0199       ***
13   smoke_bin        0         0
14   obesity          1         -0.03258      ***
15   obesity          0         0
16   ah               1         -0.01104      **
17   ah               0         0
18   depression       1         -0.03472      ***
19   depression       0         0
20   vig_pa           1         0.01694       ***
21   vig_pa           0         0
22   cntry            0         0.02285       **
23   cntry            1         0.00532
24   cntry            4         0.00173
25   cntry            6         0.03573       ***
26   cntry            7         -0.04092      ***
27   cntry            8         0.00929
28   cntry            10        0.00385
29   cntry            11        0.01284       *
30   cntry            15        -0.00555
31   cntry            20        0.03243       ***
32   cntry            21        0.01539       *
33   cntry            24        0.03717       ***
34   cntry            25        0.0168        *
35   cntry            5         0

 

Here are some selected observations, with the predicted values by SAS:

 

Obs    age_num  smoke_bin  hi       obesity  ah  sex  edu3  depression  vig_pa  cntry  sex  diff_logHi  PredictedValue
16376  65       0          0.48359  0        0   0    2     0           1       5      0    0.38392     0.11933
18532  66       0          0.43931  0        0   0    1     0           0       5      0    0.04253     0.08041

 

So if I calculate the predicted value for obs=16376 myself, it should be:

0.69374 - 4.44389*0.48359 + 8.73896*0.48359^2 - 6.21405*0.48359^3 + 0.0120158*65 - 0.000130071*65^2 + 0.01694*1 = 0.134064

However, the value produced by the OUTPUT OUT= statement is 0.11933.

 

At first I thought it was a matter of decimal precision in the parameters for age_num or age_num*age_num, but adding more decimals doesn't change the result much.

 

1 ACCEPTED SOLUTION

Reeza
Super User

Check your design matrix; I don't think you're applying the formula correctly.

 

Try using PROC PLM or PROC SCORE to verify the predicted values.
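
A rough sketch of the PROC PLM route, in case it helps: add a STORE statement to the PROC GENMOD step from the question, then let PROC PLM score the same data and compare the result with the OUTPUT values. The item-store name work.gee_fit, the data set work.plm_scored and the variable PredictedPLM are hypothetical names chosen for illustration. I am also assuming that the item store saved from a GEE fit scores with the GEE estimates, which is worth confirming on your own run.

```sas
/* Refit the model from the question and save it in an item store.   */
/* sex is listed only once in the CLASS statement here.              */
proc genmod data=SHARE.shareHI;
  class smoke_bin(ref='0') obesity(ref='0') ah(ref='0') sex edu3(ref='2')
        depression(ref='0') vig_pa(ref='0') cntry(ref='5') athlos_id;
  model diff_logHi = hi|hi|hi age_num|age_num sex edu3 smoke_bin obesity ah
                     depression vig_pa cntry / dist=n link=id;
  repeated subject=athlos_id / type=exch;
  weight pond;
  store work.gee_fit;   /* hypothetical item-store name */
run;

/* Score the original data from the stored model. */
proc plm restore=work.gee_fit;
  score data=SHARE.shareHI out=work.plm_scored predicted=PredictedPLM;
run;
```

If PredictedPLM agrees with PredictedValue, the model output is fine and the discrepancy is in the hand calculation.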

 



4 REPLIES

Demographer
Pyrite | Level 9
I added / PARAM=REF to be sure, but the estimates are the same, so I'm not sure what is wrong with my formula.
Demographer
Pyrite | Level 9

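OK, you were right. I had the reference category for the variable sex wrong: the reference level is sex=1, not sex=0, so the -0.01474 estimate for sex=0 applies to these observations.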
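
For anyone who finds this later, here is a quick sketch of the corrected hand calculation. In the parameter table the row for sex=1 is the zero (reference) row, so sex=0 contributes -0.01474. The data set name check and the variables xb, sas_pred and diff are hypothetical names. Only the terms that are non-reference for the two listed observations are coded, and the coefficients are the rounded values from the table, so agreement with SAS should hold only to about four decimals.

```sas
data check;
  input obs age_num hi sex edu3 vig_pa sas_pred;
  /* Linear predictor from the rounded estimates in the parameter table. */
  /* smoke_bin, obesity, ah, depression and cntry are at their reference */
  /* levels for both observations, so they add nothing here.             */
  xb = 0.69374
     - 4.44389*hi + 8.73896*hi**2 - 6.21405*hi**3
     + 0.0120158*age_num - 0.000130071*age_num**2
     - 0.01474*(sex=0)                       /* reference is sex=1  */
     - 0.05901*(edu3=0) - 0.03252*(edu3=1)   /* reference is edu3=2 */
     + 0.01694*(vig_pa=1);                   /* reference is vig_pa=0 */
  diff = xb - sas_pred;                      /* should be close to 0 */
  datalines;
16376 65 0.48359 0 2 1 0.11933
18532 66 0.43931 0 1 0 0.08041
;
run;

proc print data=check;
  format xb sas_pred diff 8.5;
run;
```

For obs 16376 this amounts to 0.134064 - 0.01474, which is roughly 0.1193 and in line with the 0.11933 from the OUTPUT statement.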

Reeza
Super User
Actually that's not what I was thinking, but glad you got it solved 🙂

I also just remembered that you can use the CODE statement in PROC GENMOD to get the exact code that calculates those values.

https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_genmod_syntax09.htm&docsetVersion...
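
A minimal sketch of that idea, under some assumptions: the file name genmod_score.sas and the data set work.code_scored are hypothetical, and I have not confirmed how the CODE statement behaves together with a REPEATED statement, so compare the result with the OUTPUT values before relying on it.

```sas
/* Fit the model from the question and write DATA step scoring code. */
/* sex is listed only once in the CLASS statement here.              */
proc genmod data=SHARE.shareHI;
  class smoke_bin(ref='0') obesity(ref='0') ah(ref='0') sex edu3(ref='2')
        depression(ref='0') vig_pa(ref='0') cntry(ref='5') athlos_id;
  model diff_logHi = hi|hi|hi age_num|age_num sex edu3 smoke_bin obesity ah
                     depression vig_pa cntry / dist=n link=id;
  repeated subject=athlos_id / type=exch;
  weight pond;
  code file='genmod_score.sas';   /* hypothetical file name */
run;

/* Apply the generated scoring code to the original data. */
data work.code_scored;
  set SHARE.shareHI;
  %include 'genmod_score.sas';
run;
```

Opening the generated file also shows exactly which dummy variables and reference levels the procedure uses, which is handy when a hand calculation disagrees with the output.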