Demographer
Pyrite | Level 9

Hi,

I ran a linear regression with PROC GENMOD (with a REPEATED statement for clustering). However, when I calculate the predicted values by hand, they don't match the values written by the OUTPUT OUT= statement.

 

Here is the code of the model:

 

 

proc genmod data=SHARE.shareHI;
  class smoke_bin(ref='0') obesity(ref='0') ah(ref='0') sex edu3(ref='2') depression(ref='0') vig_pa(ref='0') cntry(ref='5') athlos_id sex;
  model diff_logHi = hi|hi|hi age_num|age_num sex edu3 smoke_bin obesity ah depression vig_pa cntry / dist=n link=id;
  /* Generalized Estimating Equations: account for correlation among observations */
  repeated subject=athlos_id / type=exch;
  weight pond;
  output out=work.predicted p=PredictedValue;
run;

 

 

Here are the parameters:

 

Obs  Covariate        Category  par_difHI     P_
1    Intercept                  0.69374       ***
2    hi                         -4.44389      ***
3    hi*hi                      8.73896       ***
4    hi*hi*hi                   -6.21405      ***
5    age_num                    0.012015800   ***
6    age_num*age_num            -0.000130071  ***
7    sex              0         -0.01474      ***
8    sex              1         0
9    edu3             0         -0.05901      ***
10   edu3             1         -0.03252      ***
11   edu3             2         0
12   smoke_bin        1         -0.0199       ***
13   smoke_bin        0         0
14   obesity          1         -0.03258      ***
15   obesity          0         0
16   ah               1         -0.01104      **
17   ah               0         0
18   depression       1         -0.03472      ***
19   depression       0         0
20   vig_pa           1         0.01694       ***
21   vig_pa           0         0
22   cntry            0         0.02285       **
23   cntry            1         0.00532
24   cntry            4         0.00173
25   cntry            6         0.03573       ***
26   cntry            7         -0.04092      ***
27   cntry            8         0.00929
28   cntry            10        0.00385
29   cntry            11        0.01284       *
30   cntry            15        -0.00555
31   cntry            20        0.03243       ***
32   cntry            21        0.01539       *
33   cntry            24        0.03717       ***
34   cntry            25        0.0168        *
35   cntry            5         0

 

Here are some selected observations, with the predicted values by SAS:

 

Obs    age_num  smoke_bin  hi       obesity  ah  sex  edu3  depression  vig_pa  cntry  sex  diff_logHi  PredictedValue
16376  65       0          0.48359  0        0   0    2     0           1       5      0    0.38392     0.11933
18532  66       0          0.43931  0        0   0    1     0           0       5      0    0.04253     0.08041

 

If I calculate the predicted value for obs=16376 by hand, I get:

0.69374 - 4.44389*0.48359 + 8.73896*0.48359*0.48359 - 6.21405*0.48359*0.48359*0.48359 + 0.0120158*65 - 0.000130071*65*65 + 0.01694*1 = 0.134064

However, the value produced by the OUTPUT OUT= statement is 0.11933.

 

At first I thought it was a matter of too few decimals in the parameters for age_num or age_num*age_num, but adding more decimals doesn't change the result much.

 

Accepted Solution
Reeza
Super User

Check your design matrix; I don't think you're applying the formula correctly.

 

Try using PROC PLM or SCORE to verify the predicted values.
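As a rough sketch of the PROC PLM route, the model from the question could be refit with a STORE statement and then scored from the saved item store. The item-store name `work.gee_fit`, the output data set `work.plm_scored`, and the variable `PlmPredicted` are hypothetical names chosen for illustration, and this assumes STORE saves the GEE estimates when a REPEATED statement is present, which is worth confirming against the documentation for your SAS/STAT release.

```sas
/* Refit the model from the question and save it to an item store.      */
/* work.gee_fit is a hypothetical name, not from the original post.     */
proc genmod data=SHARE.shareHI;
  class smoke_bin(ref='0') obesity(ref='0') ah(ref='0') sex edu3(ref='2')
        depression(ref='0') vig_pa(ref='0') cntry(ref='5') athlos_id;
  model diff_logHi = hi|hi|hi age_num|age_num sex edu3 smoke_bin obesity
                     ah depression vig_pa cntry / dist=normal link=identity;
  repeated subject=athlos_id / type=exch;
  weight pond;
  store work.gee_fit;
run;

/* Score the same data from the stored model. PlmPredicted can then be  */
/* compared with PredictedValue from the OUTPUT statement.              */
proc plm restore=work.gee_fit;
  score data=SHARE.shareHI out=work.plm_scored predicted=PlmPredicted;
run;
```

If the scored values agree with the OUTPUT statement but not with the hand calculation, the hand calculation is the likely culprit, most often a class level whose coefficient was left out.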

 




Demographer
Pyrite | Level 9
I added / PARAM=REF to be sure, but the estimates are the same, so I'm not sure what is wrong with my formula.
Demographer
Pyrite | Level 9

Ok you were right. The reference category for the variable sex was wrong.
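For anyone checking the arithmetic: in the parameter table the reference level of sex is 1, so an observation with sex = 0 picks up the -0.01474 coefficient, and the gap between 0.134064 and 0.11933 is about that size. The DATA step below is a minimal sketch of the full hand calculation for the two listed observations, using the rounded estimates from the table. The data set name `work.manual_check` and the variables `xbeta` and `diff` are made up for illustration. Covariates at their reference level (smoke_bin, obesity, ah, depression all 0, and cntry 5) contribute zero and are left out.

```sas
/* Hand calculation of the linear predictor from the rounded estimates. */
/* Boolean expressions such as (sex = 0) evaluate to 1 or 0 in SAS.     */
data work.manual_check;
  input obs age_num hi sex edu3 vig_pa PredictedValue;
  xbeta = 0.69374
        - 4.44389     * hi
        + 8.73896     * hi**2
        - 6.21405     * hi**3
        + 0.0120158   * age_num
        - 0.000130071 * age_num**2
        + (sex    = 0) * (-0.01474)   /* sex = 1 is the reference level  */
        + (edu3   = 0) * (-0.05901)   /* edu3 = 2 is the reference level */
        + (edu3   = 1) * (-0.03252)
        + (vig_pa = 1) * 0.01694;
  diff = xbeta - PredictedValue;      /* should be near zero             */
  datalines;
16376 65 0.48359 0 2 1 0.11933
18532 66 0.43931 0 1 0 0.08041
;
run;

proc print data=work.manual_check noobs;
  format xbeta diff 8.5;
run;
```

With these rounded coefficients, xbeta comes out at roughly 0.1193 and 0.0804, which agrees with the SAS predictions to about four decimal places.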

Reeza
Super User
Actually that's not what I was thinking, but glad you got it solved 🙂

I also just remembered that you can use the CODE statement in PROC GENMOD to get the exact code that calculates those values.

https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_genmod_syntax09.htm&docsetVersion...
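A minimal sketch of how the CODE statement might be used here, assuming it is accepted alongside the REPEATED statement in your release (check the linked documentation). The file name `genmod_score.sas` and the data set `work.code_scored` are hypothetical.

```sas
/* Refit the model and write DATA step scoring code to a file.          */
/* genmod_score.sas is a hypothetical file name.                        */
proc genmod data=SHARE.shareHI;
  class smoke_bin(ref='0') obesity(ref='0') ah(ref='0') sex edu3(ref='2')
        depression(ref='0') vig_pa(ref='0') cntry(ref='5') athlos_id;
  model diff_logHi = hi|hi|hi age_num|age_num sex edu3 smoke_bin obesity
                     ah depression vig_pa cntry / dist=normal link=identity;
  repeated subject=athlos_id / type=exch;
  weight pond;
  code file="genmod_score.sas";
run;

/* Apply the generated code. Opening the file also shows exactly how    */
/* each class level enters the linear predictor.                        */
data work.code_scored;
  set SHARE.shareHI;
  %include "genmod_score.sas";
run;
```

Reading the generated file is a quick way to see which level of each CLASS variable is treated as the reference, since reference levels get no term of their own.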
