Hi,
I ran a linear regression with PROC GENMOD (with a REPEATED statement to account for clustering). However, when I calculate the predicted values by hand, they don't match the values produced by the OUTPUT OUT= statement.
Here is the code of the model:
proc genmod data=SHARE.shareHI ;
class smoke_bin(ref='0') obesity(ref='0') ah(ref='0') sex edu3(ref='2') depression(ref='0') vig_pa(ref='0') cntry (ref='5') athlos_id sex;
model diff_logHi=hi|hi|hi age_num|age_num sex edu3 smoke_bin obesity ah depression vig_pa cntry / dist=n link=id ;
repeated subject=athlos_id / type=exch; /*Generalized Estimating Equations that takes into account correlation among observations*/
weight pond;
output out=work.predicted p=PredictedValue;
run;
Here are the parameters:
Obs | Covariate | Category | par_difHI | P_ |
1 | Intercept | | 0.69374 | *** |
2 | hi | | -4.44389 | *** |
3 | hi*hi | | 8.73896 | *** |
4 | hi*hi*hi | | -6.21405 | *** |
5 | age_num | | 0.012015800 | *** |
6 | age_num*age_num | | -0.000130071 | *** |
7 | sex | 0 | -0.01474 | *** |
8 | sex | 1 | 0 | |
9 | edu3 | 0 | -0.05901 | *** |
10 | edu3 | 1 | -0.03252 | *** |
11 | edu3 | 2 | 0 | |
12 | smoke_bin | 1 | -0.0199 | *** |
13 | smoke_bin | 0 | 0 | |
14 | obesity | 1 | -0.03258 | *** |
15 | obesity | 0 | 0 | |
16 | ah | 1 | -0.01104 | ** |
17 | ah | 0 | 0 | |
18 | depression | 1 | -0.03472 | *** |
19 | depression | 0 | 0 | |
20 | vig_pa | 1 | 0.01694 | *** |
21 | vig_pa | 0 | 0 | |
22 | cntry | 0 | 0.02285 | ** |
23 | cntry | 1 | 0.00532 | |
24 | cntry | 4 | 0.00173 | |
25 | cntry | 6 | 0.03573 | *** |
26 | cntry | 7 | -0.04092 | *** |
27 | cntry | 8 | 0.00929 | |
28 | cntry | 10 | 0.00385 | |
29 | cntry | 11 | 0.01284 | * |
30 | cntry | 15 | -0.00555 | |
31 | cntry | 20 | 0.03243 | *** |
32 | cntry | 21 | 0.01539 | * |
33 | cntry | 24 | 0.03717 | *** |
34 | cntry | 25 | 0.0168 | * |
35 | cntry | 5 | 0 | |
Here are some selected observations, with the predicted values by SAS:
Obs | age_num | smoke_bin | hi | obesity | ah | sex | edu3 | depression | vig_pa | cntry | sex | diff_logHi | PredictedValue |
16376 | 65 | 0 | 0.48359 | 0 | 0 | 0 | 2 | 0 | 1 | 5 | 0 | 0.38392 | 0.11933 |
18532 | 66 | 0 | 0.43931 | 0 | 0 | 0 | 1 | 0 | 0 | 5 | 0 | 0.04253 | 0.08041 |
So if I calculate the predicted value for obs=16376 myself, it should be:
0.69374 - 4.44389*0.48359 + 8.73896*0.48359*0.48359 - 6.21405*0.48359*0.48359*0.48359 + 0.0120158*65 - 0.000130071*65*65 + 0.01694*1 = 0.134064. However, the predicted value from the OUTPUT OUT= statement is 0.11933.
At first I thought it was a matter of decimal precision in the parameters for age_num or age_num*age_num, but adding more decimals doesn't change the result much.
Check your design matrix; I don't think you're applying the formula correctly.
Try using PROC PLM or SCORE to verify the predicted values.
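OK, you were right. I had the wrong reference category for the variable sex: in the parameter table the reference level is sex=1, so an observation with sex=0 also needs the -0.01474 term, which I had left out.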
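For anyone landing on this thread later, here is a minimal sketch of the corrected hand calculation for obs=16376, using the rounded coefficients from the parameter table above. The variable names xbeta_without_sex and xbeta are made up for illustration, and the result matches the OUTPUT value only up to the rounding of the printed estimates.

```sas
/* Sketch only: recompute the linear predictor for obs 16376 by hand.   */
/* Coefficients are the rounded values from the parameter table.        */
data _null_;
  hi      = 0.48359;
  age_num = 65;

  /* Original hand calculation: sex = 0 was treated as the reference. */
  xbeta_without_sex = 0.69374
                    - 4.44389     * hi
                    + 8.73896     * hi**2
                    - 6.21405     * hi**3
                    + 0.0120158   * age_num
                    - 0.000130071 * age_num**2
                    + 0.01694;                 /* vig_pa = 1 */

  /* The reference level is sex = 1, so sex = 0 contributes -0.01474. */
  xbeta = xbeta_without_sex - 0.01474;

  put xbeta_without_sex= 8.5 xbeta= 8.5;
run;
```

The first value comes out at about 0.13406 and the second at about 0.11932, which agrees with the 0.11933 in work.predicted to within rounding. The same check for obs=18532 (sex=0, edu3=1) gives about 0.08040 against 0.08041. If sex=0 is meant to be the reference, adding ref='0' to sex in the CLASS statement, as is already done for the other variables, should make the table match that expectation.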