Pyrite | Level 9

## Problem with predicted probabilities using proc genmod

Hi,

I ran a linear regression with PROC GENMOD (with a REPEATED statement to account for clustering). However, when I calculate predicted values by hand, they don't match the values written by the OUTPUT OUT= statement.

Here is the code of the model:

```
proc genmod data=SHARE.shareHI;
  class smoke_bin(ref='0') obesity(ref='0') ah(ref='0') sex edu3(ref='2') depression(ref='0') vig_pa(ref='0') cntry(ref='5') athlos_id sex;
  model diff_logHi = hi|hi|hi age_num|age_num sex edu3 smoke_bin obesity ah depression vig_pa cntry / dist=n link=id;
  repeated subject=athlos_id / type=exch; /* Generalized Estimating Equations that take into account correlation among observations */
  weight pond;
  output out=work.predicted p=PredictedValue;
run;
```

Here are the parameters:

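| Obs | Covariate | Category | par_difHI | P_ |
|---|---|---|---|---|
| 1 | Intercept | | 0.69374 | *** |
| 2 | `hi` | | -4.44389 | *** |
| 3 | `hi*hi` | | 8.73896 | *** |
| 4 | `hi*hi*hi` | | -6.21405 | *** |
| 5 | `age_num` | | 0.012015800 | *** |
| 6 | `age_num*age_num` | | -0.000130071 | *** |
| 7 | sex | 0 | -0.01474 | *** |
| 8 | sex | 1 | 0 | |
| 9 | edu3 | 0 | -0.05901 | *** |
| 10 | edu3 | 1 | -0.03252 | *** |
| 11 | edu3 | 2 | 0 | |
| 12 | smoke_bin | 1 | -0.0199 | *** |
| 13 | smoke_bin | 0 | 0 | |
| 14 | obesity | 1 | -0.03258 | *** |
| 15 | obesity | 0 | 0 | |
| 16 | ah | 1 | -0.01104 | ** |
| 17 | ah | 0 | 0 | |
| 18 | depression | 1 | -0.03472 | *** |
| 19 | depression | 0 | 0 | |
| 20 | vig_pa | 1 | 0.01694 | *** |
| 21 | vig_pa | 0 | 0 | |
| 22 | cntry | 0 | 0.02285 | ** |
| 23 | cntry | 1 | 0.00532 | |
| 24 | cntry | 4 | 0.00173 | |
| 25 | cntry | 6 | 0.03573 | *** |
| 26 | cntry | 7 | -0.04092 | *** |
| 27 | cntry | 8 | 0.00929 | |
| 28 | cntry | 10 | 0.00385 | |
| 29 | cntry | 11 | 0.01284 | * |
| 30 | cntry | 15 | -0.00555 | |
| 31 | cntry | 20 | 0.03243 | *** |
| 32 | cntry | 21 | 0.01539 | * |
| 33 | cntry | 24 | 0.03717 | *** |
| 34 | cntry | 25 | 0.0168 | * |
| 35 | cntry | 5 | 0 | |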

Here are some selected observations, with the predicted values by SAS:

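| Obs | age_num | smoke_bin | hi | obesity | ah | sex | edu3 | depression | vig_pa | cntry | sex | diff_logHi | PredictedValue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 16376 | 65 | 0 | 0.48359 | 0 | 0 | 0 | 2 | 0 | 1 | 5 | 0 | 0.38392 | 0.11933 |
| 18532 | 66 | 0 | 0.43931 | 0 | 0 | 0 | 1 | 0 | 0 | 5 | 0 | 0.04253 | 0.08041 |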

So if I calculate the predicted value for obs=16376 myself, it should be:

`0.69374 - 4.44389*0.48359 + 8.73896*0.48359**2 - 6.21405*0.48359**3 + 0.0120158*65 - 0.000130071*65**2 + 0.01694*1 = 0.134064`

However, the value predicted by the OUTPUT OUT= statement is 0.11933.

At first I thought it was a matter of too few decimals in the parameters for age_num or age_num*age_num, but adding more decimals doesn't change the result much.

1 ACCEPTED SOLUTION

Super User

## Re: Problem with predicted probabilities using proc genmod

Check your design matrix; I don't think you're applying the formula correctly.

Try using PROC PLM or PROC SCORE to verify the predicted values.



Pyrite | Level 9

## Re: Problem with predicted probabilities using proc genmod

I added /PARAM=ref to be sure, but the estimates are the same. So I'm not sure what is wrong with my formula.
Pyrite | Level 9

## Re: Problem with predicted probabilities using proc genmod

OK, you were right. I had the wrong reference category for the variable sex.
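
For anyone checking the numbers, here is a sketch of the corrected hand calculation, assuming the only missing piece was the sex term. In the parameter table the reference level for sex is 1 (estimate 0), so an observation with sex=0 picks up the -0.01474 coefficient, which the original sum left out. All other values are taken from the tables in the question.

$$
\begin{aligned}
\hat{y}_{16376} &= 0.69374 - 4.44389\,(0.48359) + 8.73896\,(0.48359)^2 - 6.21405\,(0.48359)^3 \\
&\quad + 0.0120158\,(65) - 0.000130071\,(65)^2 + 0.01694 - 0.01474 \\
&= 0.134064 - 0.01474 \approx 0.11932
\end{aligned}
$$

This agrees with the reported PredictedValue of 0.11933 up to rounding of the displayed coefficients. Applying the same terms to obs 18532 (sex=0, edu3=1, vig_pa=0) gives about 0.0804, against a reported 0.08041.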

Super User

## Re: Problem with predicted probabilities using proc genmod

Actually, that's not what I was thinking, but glad you got it solved 🙂

I also just remembered that you can use the CODE statement in PROC GENMOD to get the exact code that calculates those values.

https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_genmod_syntax09.htm&docsetVersion...