Dear community members,
Please advise on this:
I am trying to use PROC GLIMMIX with a random intercept and a STORE statement on my training set. I then want to compute predictions for an external data set (the validation set) by using PROC PLM, with the random effect included.
I am a little confused, because the STORE statement may not save the random-effect information.
If PROC PLM cannot do this automatically, how would we do it manually?
Thank you in advance!
The "manual" approach, if I understand what you intend by that, would be to append the validation observations to the "training" data set, with the model variables filled in and the dependent variable set to missing. Then use an OUTPUT statement in PROC GLIMMIX to create an output data set, and request the predicted values by using the PRED= option.
It sounds like you want to use the "missing value trick" to append the scoring data to the data that is used to fit the model? If so, see the example and discussion in the article "The missing value trick for scoring a regression model."
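As a rough illustration of the missing value trick with a random-intercept logistic model, here is a minimal sketch. The names TRAIN, VALID, Y, X, ID, and isScore are hypothetical and do not come from this thread; substitute your own data sets and variables.

```sas
/* Sketch only: TRAIN, VALID, Y, X, and ID are hypothetical names */
data combined;
   set train valid(in=inValid);
   isScore = inValid;          /* 1 = observation to be scored           */
   if inValid then Y = .;      /* missing response ==> not used in fit   */
run;

proc glimmix data=combined;
   class ID;
   model Y(event='1') = X / dist=binary link=logit solution;
   random intercept / subject=ID;
   /* keep only the scored rows; request both kinds of probability */
   output out=scored(where=(isScore=1))
          pred(blup ilink)=p_blup
          pred(noblup ilink)=p_fixed;
run;
```

The model is fit only to the rows that have a nonmissing response, and the OUTPUT data set carries predictions for the appended rows as well. For clusters that do not appear in the training data there is no estimated random intercept to draw on, so I would treat the NOBLUP prediction as the meaningful one for those rows.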
I just noticed the same issue with PROC PLM when I use GLIMMIX to fit a logistic model with random intercepts. Even if I take the data set that is output from the original model and then score it by using the PLM procedure, I get different predicted values (thankfully, I plotted both). Is there another option besides the manual approach? I have several simulated data sets that I need to score. It seems like this should at least generate a warning in the log.
Please provide your SAS code so we can understand what you are doing. Showing the output (or graph) would also be helpful.
With a nonlinear link function, PROC GLIMMIX enables you to make four different "predictions." They correspond to the various combinations of the BLUP|NOBLUP and ILINK|NOILINK options. Make sure you are being consistent and comparing the same kind of prediction.
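To make the four combinations concrete, the following sketch requests all of them in one OUTPUT statement. The data set MYDATA and the variables Y, X, and ID are hypothetical placeholders; the output variable names are arbitrary.

```sas
/* Sketch only: MYDATA, Y, X, and ID are hypothetical names */
proc glimmix data=mydata;
   class ID;
   model Y(event='1') = X / dist=binary link=logit;
   random intercept / subject=ID;
   output out=preds
      pred(blup   noilink)=eta_blup    /* linear predictor, random effects included */
      pred(noblup noilink)=eta_fixed   /* linear predictor, fixed effects only      */
      pred(blup   ilink)  =p_blup      /* probability, random effects included      */
      pred(noblup ilink)  =p_fixed;    /* probability, fixed effects only           */
run;
```

When you compare against values from another procedure, compare them with the column that uses the same pair of options. In the example discussed in this thread, the value that PROC PLM produces with the ILINK option matches the NOBLUP ILINK column, not the BLUP ILINK column.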
You correctly state that the SCORE_IT data set contains the variable FIX_PREDPROB (from the OUTPUT statement of GLIMMIX) and the PREDICTED variable (from the SCORE statement of PLM). These two variables are equal.
What is your question? What do you want that you are not getting? Here is a simulation that generates data as you describe so that we can all see the same results:
data LOGIT;
   call streaminit(1);
   do c = 0, 1;
      do D = 1 to 20;
         ran = rand("Normal");
         do i = 1 to 10;
            a = rand("Normal");
            b = rand("Normal");
            eta = ran + 3*a - 4*b + 0.1*a*b - 0.8*b*c;
            if D <= 15 then
               MARK = rand("Bern", logistic(eta/5));
            else
               MARK = .; /* missing values ==> obs to be scored */
            output;
         end;
      end;
   end;
run;
PROC GLIMMIX DATA=LOGIT PLOTS=ALL;
   CLASS C D;
   MODEL MARK(EVENT='1') = A B C A*B B*C / LINK=LOGIT DIST=BINARY SOLUTION;
   RANDOM INT / SUBJECT=D;
   OUTPUT OUT=TRANSECT_GLIMMIX PRED(ILINK)=PREDPROB
          PRED(ILINK NOBLUP)=FIX_PREDPROB;
   STORE TEST_PLM;
RUN;
PROC PLM RESTORE=TEST_PLM;
   SHOW COV PARMS;
   SCORE DATA=TRANSECT_GLIMMIX OUT=SCORE_IT PREDICTED UCLM LCLM / ILINK;
RUN;
You can add the (WHERE=(MARK=.)) data set option to the DATA= option on the SCORE statement to score only the observations for which MARK is missing.
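For example, the PROC PLM step might look like the following. It reuses the TEST_PLM item store and the TRANSECT_GLIMMIX data set from the code above; the output name SCORE_MISSING is just an illustrative choice.

```sas
proc plm restore=TEST_PLM;
   /* WHERE= data set option: score only rows whose response is missing */
   score data=TRANSECT_GLIMMIX(where=(MARK=.)) out=SCORE_MISSING
         predicted uclm lclm / ilink;
run;
```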