I am looking for guidance on how to score the source dataset used in a "PROC GENMOD / DIST=NORMAL" model.
In particular, how do I incorporate the "dispersion" parameter into the calculation (e.g., y-hat = B0 + B1(X1) + ... + Bk(Xk))?
P.S. To address the inevitable comments asking why I don't use the OUTPUT statement to get the residuals: I am using the BAYES statement in the model, which prevents OUTPUT from working.
Thank you in advance!
The following two options solve the posted question:
PROC GENMOD DATA=my_dataset;
CLASS X1 X2;
MODEL Y = X1 X2 / DIST=NORMAL;
BAYES SEED=12345 OUTPOST=post;
STORE model_store; /*OPTION #1*/
CODE FILE='C:\temp\testcode.sas'; /*OPTION #2*/
RUN;
/*OPTION #1*/
PROC PLM SOURCE=model_store;
SCORE DATA=my_dataset OUT=preds PRED=pred LCLM=lower UCLM=upper;
RUN;
/*OPTION #2*/
DATA Pred;
SET my_dataset;
%INCLUDE 'C:\temp\testcode.sas';
RUN;
What about the CODE statement?
It worked with the example I ran...
data Liver;
input X1-X6 Y;
datalines;
19.1358 50.0110 51.000 0 0 1 3
23.5970 18.4959 3.429 0 0 1 9
20.0474 56.7699 3.429 1 1 0 6
28.0277 59.7836 4.000 0 0 1 6
28.6851 74.1589 5.714 1 0 1 1
18.8092 31.0630 2.286 0 1 1 61
28.7201 52.9178 37.286 1 0 1 6
21.3669 61.6603 54.143 0 1 1 6
23.7332 42.2904 0.571 1 0 1 21
20.4783 22.1260 19.000 1 0 1 6
22.8625 25.2164 1.714 0 1 1 6
22.0932 66.7562 2.571 0 0 1 1
24.3141 66.8000 26.714 1 1 0 2
21.4619 78.9863 9.714 0 0 1 6
23.8087 58.3260 2.000 0 1 1 6
19.3698 48.4904 2.000 1 1 1 6
23.4568 70.9890 1.429 0 0 1 6
24.4418 70.7425 5.714 1 0 1 6
22.9130 49.7041 13.143 1 0 1 6
22.5306 64.0438 4.143 1 1 1 6
32.7449 62.2082 0.143 1 1 0 3
20.0617 22.7671 0.143 1 1 1 6
15.9597 48.8137 1.571 1 0 1 6
31.4398 64.5918 63.143 0 0 1 2
22.9854 79.5205 2.714 1 0 1 1
19.2653 37.8685 4.857 1 1 1 1
19.5313 65.0630 0.857 0 0 1 6
24.1415 39.9452 4.429 1 0 1 6
17.1225 13.9342 0.429 1 0 1 6
21.4692 64.9699 4.714 1 1 1 6
25.3515 52.8027 0.857 0 0 1 6
30.1194 65.2438 6.000 1 0 0 6
29.1749 47.0301 4.286 1 1 0 6
21.7784 71.5123 2.571 1 0 1 6
17.3010 57.8575 16.714 1 1 1 6
17.0068 68.0356 69.143 1 0 1 6
20.0000 48.4027 23.714 1 0 1 6
19.2653 62.5014 2.000 1 1 0 6
25.3815 58.1671 2.143 1 1 1 6
25.9151 53.2027 113.000 1 1 1 6
22.2656 59.8904 0.857 0 0 1 6
22.4600 65.7288 5.286 1 0 0 1
18.0092 24.2274 2.286 1 0 1 6
19.4708 28.3644 0.571 1 0 1 6
20.7612 68.9342 2.714 1 0 0 2
32.0313 59.9781 5.429 0 0 1 6
19.8413 45.4740 1.143 0 0 1 6
24.4898 43.5315 4.286 1 0 1 6
21.2585 49.6274 4.714 0 0 0 6
20.0155 52.1397 5.429 1 1 1 6
19.5682 41.3233 6.571 1 1 1 1
23.6614 74.7616 6.429 1 1 1 3
20.5693 78.1671 1.857 1 1 1 6
18.7652 17.7534 104.000 1 0 1 6
21.7738 32.7616 3.571 1 0 1 6
30.8532 62.6932 3.571 1 0 1 2
23.1481 44.1178 4.571 1 0 1 2
29.7576 60.1342 0.429 1 0 1 6
21.5619 41.9096 2.429 0 0 1 6
24.3046 62.8603 3.429 0 0 1 2
20.7248 66.9918 1.429 0 0 1 6
36.3880 55.3178 1.429 1 0 0 2
21.9076 49.8466 64.143 0 1 1 3
18.3058 72.7233 0.571 1 1 1 2
26.5118 75.7562 2.143 1 0 0 2
23.4236 49.1178 4.429 1 0 1 6
24.7245 61.0521 5.000 1 0 0 1
32.2421 65.8795 0.000 0 0 0 6
23.3556 71.2712 2.857 1 0 1 3
22.7732 68.7014 3.857 0 0 0 1
19.4870 63.6192 4.143 1 0 0 1
24.5390 56.3890 5.143 0 1 1 6
26.8977 60.3507 3.000 1 1 0 6
25.2595 72.9863 5.429 0 0 1 1
22.1297 77.5808 1.286 1 0 1 6
9.6849 49.6274 0.286 0 0 1 6
17.0068 12.6466 7.143 1 0 1 1
18.4240 59.8055 0.857 1 0 1 6
19.1406 68.1781 6.857 1 1 1 4
18.5078 70.5890 2.143 0 0 1 1
19.5965 66.7315 1.143 1 0 1 1
24.4418 60.2137 4.714 1 0 0 0
30.1194 61.8740 0.143 1 1 1 6
25.3444 38.3507 4.000 0 0 1 6
21.4844 68.7726 3.143 1 0 0 1
20.1995 66.9041 5.571 1 0 1 4
25.2994 62.8685 12.714 1 0 0 6
23.6013 70.3808 4.286 1 0 1 6
27.1706 62.3397 2.429 1 0 1 6
20.9024 62.9425 7.857 0 0 0 6
20.4491 73.7890 8.000 0 0 1 1
22.1510 55.4822 1.286 0 0 1 6
22.5710 75.0274 7.571 1 0 0 6
27.9904 76.4082 1.429 1 0 0 3
29.0688 54.9479 4.143 1 0 0 1
20.9184 60.2521 2.571 0 1 0 1
18.1940 37.1808 8.143 1 0 0 2
21.4536 24.8822 1.714 0 1 0 9
14.0445 61.3288 6.571 1 0 0 6
16.7311 60.3288 2.143 1 0 0 6
24.6094 42.9918 2.571 1 0 0 6
25.0829 54.4329 16.286 1 0 0 9
21.5510 58.6658 6.857 0 0 0 6
24.2215 75.7836 3.429 0 1 0 2
30.4498 69.8795 4.429 1 0 0 2
20.6790 39.7315 2.143 1 0 1 0
59.2554 41.1342 5.571 1 0 0 3
22.7244 60.2575 41.571 1 0 0 6
20.7008 75.3671 3.429 0 0 1 3
24.6094 47.3644 8.714 0 0 0 1
21.8300 74.4027 5.286 0 0 0 6
20.8980 66.1178 34.429 0 0 0 6
31.9602 69.6247 4.000 1 0 0 6
29.4107 45.4521 4.571 1 0 0 6
22.9421 65.4027 1.143 1 0 1 21
24.8163 67.1096 3.429 1 0 0 6
19.8178 65.9014 1.286 1 1 0 6
18.7783 61.0904 2.571 1 0 0 1
26.0617 55.4384 3.571 1 0 0 1
21.6333 61.5288 3.571 0 0 0 6
32.5260 71.4904 5.714 1 0 0 9
25.4028 68.2329 48.714 1 0 0 6
20.5693 29.2575 3.571 1 0 0 6
19.2570 33.1233 0.714 1 0 0 6
20.8980 40.2822 4.857 1 0 0 1
17.0562 30.2247 2.143 1 1 0 6
25.9924 66.5151 2.857 1 0 1 6
31.0735 73.0493 8.714 1 0 0 2
20.9840 48.2027 4.857 1 0 0 2
21.4536 69.1808 2.571 0 0 0 1
26.2346 60.3425 2.571 1 0 1 1
24.1633 60.8329 11.000 1 0 1 1
26.8519 58.6877 3.429 1 0 1 2
17.0993 48.8384 3.000 0 0 0 9
19.1327 65.3425 2.571 1 0 0 1
17.3010 51.4493 4.429 1 0 0 6
;
proc genmod data=Liver;
model Y = X1-X6 / dist=Poisson link=log;
bayes seed=1 coeffprior=normal;
code file='C:\temp\testcode.sas';
run;
Yes, this seems to have done something. Now I need to figure out how to use this code snippet. Any resources or recommendations? I will check back, but I am off to the web to better understand the statement.
P.S. I can likely stare at the code and figure it out, but how does the scoring treat the "Dispersion" parameter? I only recently started using GENMOD for its easy BAYES features. In the example below, the DV is a logged continuous variable, and IV1 and IV2 are both binary. Thank you.
label P_LOS_nl = 'Predicted: LOS_nl' ;
drop _LMR_BAD;
_LMR_BAD=0;
*** Generate design variables for Protocol;
drop _0_0 _0_1 ;
_0_0= 0;
_0_1= 0;
length _st8 $ 8; drop _st8;
_st8 = left(trim(put (Protocol, $8.)));
if _st8 = 'n' then do;
_0_0 = 1;
end;
else if _st8 = 'y' then do;
_0_1 = 1;
end;
else do;
_0_0 = .;
_0_1 = .;
_LMR_BAD=1;
goto _SKIP_000;
end;
*** Generate design variables for Surg_Consult;
drop _1_0 _1_1 ;
_1_0= 0;
_1_1= 0;
length _st8 $ 8; drop _st8;
_st8 = left(trim(put (Surg_Consult, $8.)));
if _st8 = 'n' then do;
_1_0 = 1;
end;
else if _st8 = 'y' then do;
_1_1 = 1;
end;
else do;
_1_0 = .;
_1_1 = .;
_LMR_BAD=1;
goto _SKIP_000;
end;
*** Compute Linear Predictors;
drop _LP0;
_LP0 = 0;
*** Effect: Protocol;
_LP0 = _LP0 + (-0.11223272308563) * _0_0;
*** Effect: Surg_Consult;
_LP0 = _LP0 + (-0.62587806220401) * _1_0;
*** Predicted values;
_LP0 = _LP0 + 3.06620493060315;
_SKIP_000:
if _LMR_BAD=1 then do;
P_LOS_nl = .;
end;
else do;
P_LOS_nl = _LP0;
end;
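To see what the generated score code does, here is a minimal sketch that reproduces its arithmetic by hand. The dataset name check and the variable manual are made up for illustration; the coefficients are copied from the score code above, and the 'y' level of each variable is the reference level, so it contributes 0.

```sas
/* Hypothetical four-row input covering every Protocol / Surg_Consult combination */
data check;
   length Protocol Surg_Consult $8;
   input Protocol $ Surg_Consult $;
   /* Intercept plus the 'n'-level effects; a comparison evaluates to 1 or 0 in SAS */
   manual = 3.06620493060315
          + (-0.11223272308563) * (Protocol = 'n')
          + (-0.62587806220401) * (Surg_Consult = 'n');
   datalines;
n n
y n
n y
y y
;
run;

proc print data=check;
run;
```

The four rows should come out at roughly 2.3281, 2.4403, 2.9540 and 3.0662, which is what P_LOS_nl would hold for the same inputs. Note that no dispersion or scale value appears anywhere in this calculation.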
Well, it appears "The DO Loop" had a simple answer for scoring the source dataset.
https://blogs.sas.com/content/iml/2014/02/19/scoring-a-regression-model-in-sas.html
data Pred;
set ScoreX;
%include 'glmScore.sas';
run;
Now I just need to investigate how the "Dispersion" (or "Scale") term is incorporated.
Find the dispersion value in your output and then look for that number, possibly unrounded, in the code.
The scale (or dispersion) parameter shouldn't be involved in computing the predicted value. It would be involved only in its standard error.
Yes, in the code the first long number is the coefficient for IV1, the second is for IV2, and the third is the intercept.
In the model, how does the scale/dispersion come into play with the SEs? As mentioned, I am fairly unfamiliar with GENMOD.
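As a rough sketch of where the scale enters, assume the classical (maximum likelihood) normal model with an identity link, and write sigma for the scale, so that the dispersion is sigma squared. The symbols X (design matrix) and x_0 (the covariate row being scored) are notation introduced here, not taken from the thread:

```latex
\hat{y}_0 = x_0^{\top}\hat{\beta}, \qquad
\widehat{\operatorname{Var}}(\hat{\beta}) = \hat{\sigma}^{2}\,(X^{\top}X)^{-1}, \qquad
\operatorname{SE}(\hat{y}_0) = \hat{\sigma}\,\sqrt{x_0^{\top}(X^{\top}X)^{-1}x_0}
```

So the point prediction uses only the coefficients, while its standard error is proportional to the estimated scale. With the BAYES statement the analogous quantity would be the posterior spread of the linear predictor, which presumably depends on the dispersion in a similar way, though the exact computation GENMOD performs is not confirmed here.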