Solved: Problem with predicted probabilities using proc genmod

Demographer · Posted 03-26-2019 10:43 AM

Hi,

I ran a linear regression with proc genmod (with a REPEATED statement to account for clustering). However, when I calculate predicted values by hand, they don't match the values produced by the OUTPUT OUT= statement.

Here is the code of the model:

proc genmod data=SHARE.shareHI;
   class smoke_bin(ref='0') obesity(ref='0') ah(ref='0') sex edu3(ref='2') depression(ref='0') vig_pa(ref='0') cntry(ref='5') athlos_id;
   model diff_logHi = hi|hi|hi age_num|age_num sex edu3 smoke_bin obesity ah depression vig_pa cntry / dist=n link=id;
   repeated subject=athlos_id / type=exch; /* Generalized Estimating Equations: account for correlation among observations on the same subject */
   weight pond;
   output out=work.predicted p=PredictedValue;
run;

Here are the parameters:

Obs	Covariate	Category	par_difHI	P_
1	Intercept		0.69374	***
2	hi		-4.44389	***
3	hi*hi		8.73896	***
4	hi*hi*hi		-6.21405	***
5	age_num		0.012015800	***
6	age_num*age_num		-0.000130071	***
7	sex	0	-0.01474	***
8	sex	1	0
9	edu3	0	-0.05901	***
10	edu3	1	-0.03252	***
11	edu3	2	0
12	smoke_bin	1	-0.0199	***
13	smoke_bin	0	0
14	obesity	1	-0.03258	***
15	obesity	0	0
16	ah	1	-0.01104	**
17	ah	0	0
18	depression	1	-0.03472	***
19	depression	0	0
20	vig_pa	1	0.01694	***
21	vig_pa	0	0
22	cntry	0	0.02285	**
23	cntry	1	0.00532
24	cntry	4	0.00173
25	cntry	6	0.03573	***
26	cntry	7	-0.04092	***
27	cntry	8	0.00929
28	cntry	10	0.00385
29	cntry	11	0.01284	*
30	cntry	15	-0.00555
31	cntry	20	0.03243	***
32	cntry	21	0.01539	*
33	cntry	24	0.03717	***
34	cntry	25	0.0168	*
35	cntry	5	0

Here are some selected observations, with the values predicted by SAS:

Obs	age_num	smoke_bin	hi	obesity	ah	sex	edu3	depression	vig_pa	cntry	sex	diff_logHi	PredictedValue
16376	65	0	0.48359	0	0	0	2	0	1	5	0	0.38392	0.11933
18532	66	0	0.43931	0	0	0	1	0	0	5	0	0.04253	0.08041

If I calculate the predicted value for obs 16376 by hand, I get:

0.69374 - 4.44389*0.48359 + 8.73896*0.48359*0.48359 - 6.21405*0.48359*0.48359*0.48359 + 0.0120158*65 - 0.000130071*65*65 + 0.01694*1 = 0.134064

However, the value produced by the OUTPUT OUT= statement is 0.11933.

At first I thought it was a matter of decimals in the estimates for age_num or age_num*age_num, but adding more decimals doesn't change the result much.

Reeza · Posted 03-26-2019 10:57 AM

Check your design matrix. I don't think you're applying the formula correctly.

Try using PROC PLM or SCORE to verify the predicted values.
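
A minimal sketch of the PROC PLM route, assuming the STORE statement is available for this GEE fit in your SAS release. The item-store name work.gee_fit, the output data set work.compare, and the variable PlmPredicted are hypothetical names chosen for illustration; everything else is taken from the model in the question.

```sas
/* Refit the model from the question and save it to an item store. */
proc genmod data=SHARE.shareHI;
   class smoke_bin(ref='0') obesity(ref='0') ah(ref='0') sex edu3(ref='2')
         depression(ref='0') vig_pa(ref='0') cntry(ref='5') athlos_id;
   model diff_logHi = hi|hi|hi age_num|age_num sex edu3 smoke_bin obesity ah
                      depression vig_pa cntry / dist=n link=id;
   repeated subject=athlos_id / type=exch;
   weight pond;
   output out=work.predicted p=PredictedValue;
   store work.gee_fit;               /* hypothetical item-store name */
run;

/* Score the OUTPUT data set so both predictions sit side by side. */
proc plm restore=work.gee_fit;
   score data=work.predicted out=work.compare predicted=PlmPredicted;
run;

/* Look at a few rows; the two prediction columns should agree. */
proc print data=work.compare(obs=10);
   var hi age_num sex edu3 vig_pa cntry PredictedValue PlmPredicted;
run;
```

If the two columns agree, the fitted model is scoring consistently and the discrepancy lies in the hand calculation, for example in which level of a CLASS variable is treated as the reference.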

Demographer · Posted 03-26-2019 11:10 AM

I added / PARAM=REF to be sure, but the estimates are the same, so I'm not sure what is wrong with my formula.

Demographer · Posted 03-26-2019 12:00 PM

OK, you were right. I had the wrong reference category for the variable sex.
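
For anyone checking the arithmetic, here is a minimal DATA step sketch that redoes the hand calculation for obs 16376 with the rounded estimates from the parameter table. The variable names xb_without_sex and xb_with_sex are made up for illustration. In the table, sex = 1 has an estimate of 0 (the reference level), so an observation with sex = 0 also picks up the -0.01474 term that the original formula left out.

```sas
/* Hand calculation for obs 16376 using the rounded estimates from the thread. */
data _null_;
   hi      = 0.48359;
   age_num = 65;

   /* Terms used in the original hand calculation */
   xb_without_sex = 0.69374
                  - 4.44389     * hi
                  + 8.73896     * hi**2
                  - 6.21405     * hi**3
                  + 0.0120158   * age_num
                  - 0.000130071 * age_num**2
                  + 0.01694;                /* vig_pa = 1 */

   /* sex = 0 is not the reference level (sex = 1 is), so its estimate applies */
   xb_with_sex = xb_without_sex - 0.01474;

   /* Write both results to the log with five decimals */
   put xb_without_sex= 8.5 xb_with_sex= 8.5;
run;
```

The first value reproduces the 0.134064 from the question, and the second comes out at roughly 0.1193, which matches the PredictedValue of 0.11933 up to rounding of the published estimates. The same check works for obs 18532 once the edu3 = 1 estimate (-0.03252) is added as well. If sex = 0 is meant to be the reference, writing sex(ref='0') in the CLASS statement follows the same pattern as the other variables.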

Reeza · Posted 03-26-2019 12:19 PM

Actually, that's not what I was thinking, but glad you got it solved 🙂

I also just remembered that you can use the CODE statement in PROC GENMOD to get the exact code for calculating those values.

https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_genmod_syntax09.htm&docsetVersion...
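
A minimal sketch of how that could look, assuming the CODE statement is accepted alongside the REPEATED statement in your release. The file name genmod_score.sas and the data set work.code_scored are hypothetical; the model itself is the one from the question.

```sas
/* Refit the model and ask GENMOD to write DATA step scoring code to a file. */
proc genmod data=SHARE.shareHI;
   class smoke_bin(ref='0') obesity(ref='0') ah(ref='0') sex edu3(ref='2')
         depression(ref='0') vig_pa(ref='0') cntry(ref='5') athlos_id;
   model diff_logHi = hi|hi|hi age_num|age_num sex edu3 smoke_bin obesity ah
                      depression vig_pa cntry / dist=n link=id;
   repeated subject=athlos_id / type=exch;
   weight pond;
   code file='genmod_score.sas';     /* hypothetical file name */
run;

/* Apply the generated code to the data to compute predicted values. */
data work.code_scored;
   set SHARE.shareHI;
   %include 'genmod_score.sas';
run;
```

Opening the generated file is also a quick way to see exactly which level of each CLASS variable contributes a nonzero term, which is where the hand calculation went wrong here.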
