Hi,
I ran a linear regression with PROC GENMOD (with a REPEATED statement to account for clustering). However, when I calculate the predicted values by hand, they don't match the values produced by the OUTPUT OUT= statement.
Here is the code for the model:
proc genmod data=SHARE.shareHI ;
class smoke_bin(ref='0') obesity(ref='0') ah(ref='0') sex edu3(ref='2') depression(ref='0') vig_pa(ref='0') cntry (ref='5') athlos_id sex;
model diff_logHi=hi|hi|hi age_num|age_num sex edu3 smoke_bin obesity ah depression vig_pa cntry / dist=n link=id ;
repeated subject=athlos_id / type=exch; /*Generalized Estimating Equations that takes into account correlation among observations*/
weight pond;
output out=work.predicted p=PredictedValue;
run;
Here are the parameters:
| Obs | Covariate | Category | par_difHI | P_ |
| 1 | Intercept | | 0.69374 | *** |
| 2 | hi | | -4.44389 | *** |
| 3 | hi*hi | | 8.73896 | *** |
| 4 | hi*hi*hi | | -6.21405 | *** |
| 5 | age_num | | 0.012015800 | *** |
| 6 | age_num*age_num | | -0.000130071 | *** |
| 7 | sex | 0 | -0.01474 | *** |
| 8 | sex | 1 | 0 | |
| 9 | edu3 | 0 | -0.05901 | *** |
| 10 | edu3 | 1 | -0.03252 | *** |
| 11 | edu3 | 2 | 0 | |
| 12 | smoke_bin | 1 | -0.0199 | *** |
| 13 | smoke_bin | 0 | 0 | |
| 14 | obesity | 1 | -0.03258 | *** |
| 15 | obesity | 0 | 0 | |
| 16 | ah | 1 | -0.01104 | ** |
| 17 | ah | 0 | 0 | |
| 18 | depression | 1 | -0.03472 | *** |
| 19 | depression | 0 | 0 | |
| 20 | vig_pa | 1 | 0.01694 | *** |
| 21 | vig_pa | 0 | 0 | |
| 22 | cntry | 0 | 0.02285 | ** |
| 23 | cntry | 1 | 0.00532 | |
| 24 | cntry | 4 | 0.00173 | |
| 25 | cntry | 6 | 0.03573 | *** |
| 26 | cntry | 7 | -0.04092 | *** |
| 27 | cntry | 8 | 0.00929 | |
| 28 | cntry | 10 | 0.00385 | |
| 29 | cntry | 11 | 0.01284 | * |
| 30 | cntry | 15 | -0.00555 | |
| 31 | cntry | 20 | 0.03243 | *** |
| 32 | cntry | 21 | 0.01539 | * |
| 33 | cntry | 24 | 0.03717 | *** |
| 34 | cntry | 25 | 0.0168 | * |
| 35 | cntry | 5 | 0 | |
Here are some selected observations, with the predicted values by SAS:
| Obs | age_num | smoke_bin | hi | obesity | ah | sex | edu3 | depression | vig_pa | cntry | sex | diff_logHi | PredictedValue |
| 16376 | 65 | 0 | 0.48359 | 0 | 0 | 0 | 2 | 0 | 1 | 5 | 0 | 0.38392 | 0.11933 |
| 18532 | 66 | 0 | 0.43931 | 0 | 0 | 0 | 1 | 0 | 0 | 5 | 0 | 0.04253 | 0.08041 |
So if I calculate the predicted value for obs=16376 myself, it should be:
0.69374 - 4.44389*0.48359 + 8.73896*0.48359*0.48359 - 6.21405*0.48359*0.48359*0.48359 + 0.0120158*65 - 0.000130071*65*65 + 0.01694*1 = 0.134064. However, the predicted value from the OUTPUT OUT= statement is 0.11933.
At first I thought it was a matter of decimal places in the estimates for age_num or age_num*age_num, but adding more decimals barely changes the result.
Check your design matrix. I don't think you're applying the formula correctly.
Try using PROC PLM or PROC SCORE to verify the predicted values.
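One way to follow the PROC PLM suggestion is sketched below. It refits the model exactly as posted, adds a STORE statement, and rescores the same data so the PROC PLM predictions can be compared with the OUTPUT OUT= values and with the hand calculation. The names work.gee_fit, work.plm_scored, and PlmPredicted are made up for illustration, and the sketch assumes your SAS/STAT release accepts STORE together with a REPEATED statement.

```sas
/* Sketch only: save the fitted model, then rescore it with PROC PLM. */
proc genmod data=SHARE.shareHI;
  class smoke_bin(ref='0') obesity(ref='0') ah(ref='0') sex edu3(ref='2')
        depression(ref='0') vig_pa(ref='0') cntry(ref='5') athlos_id sex;
  model diff_logHi = hi|hi|hi age_num|age_num sex edu3 smoke_bin obesity ah
                     depression vig_pa cntry / dist=n link=id;
  repeated subject=athlos_id / type=exch;
  weight pond;
  output out=work.predicted p=PredictedValue;
  store work.gee_fit;              /* hypothetical item-store name */
run;

/* Score the original data from the stored model. */
proc plm restore=work.gee_fit;
  score data=SHARE.shareHI out=work.plm_scored predicted=PlmPredicted;
run;
```

If PlmPredicted agrees with PredictedValue but not with the manual result, the discrepancy is probably in the hand calculation, and the "Class Level Information" table in the GENMOD output shows which level of each CLASS variable is the reference.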
OK, you were right. I had assumed the wrong reference category for sex: the reference level is sex=1, so an observation with sex=0 also gets the -0.01474 term.
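For anyone checking the arithmetic, here is a minimal DATA step sketch of the corrected hand calculation for obs 16376. It uses the rounded estimates from the parameter table and assumes, as that table indicates, that sex=1 is the reference level, so sex=0 contributes -0.01474. The variable name xbeta is just illustrative.

```sas
data _null_;
  /* Covariate values for obs 16376, copied from the listing above */
  hi      = 0.48359;
  age_num = 65;

  /* Linear predictor; with link=id this is also the predicted mean */
  xbeta = 0.69374                      /* Intercept                     */
        - 4.44389     * hi             /* hi                            */
        + 8.73896     * hi**2          /* hi*hi                         */
        - 6.21405     * hi**3          /* hi*hi*hi                      */
        + 0.0120158   * age_num        /* age_num                       */
        - 0.000130071 * age_num**2     /* age_num*age_num               */
        - 0.01474                      /* sex = 0 (reference is sex = 1) */
        + 0.01694;                     /* vig_pa = 1                    */
  /* edu3=2, smoke_bin=0, obesity=0, ah=0, depression=0, cntry=5 are
     reference levels and contribute 0 */

  put xbeta= 8.5;
run;
```

This gives about 0.11932, which agrees with the 0.11933 from the OUTPUT OUT= data set up to rounding of the printed estimates. The earlier result of 0.134064 differs by exactly the omitted sex term.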