longitudinal
Fluorite | Level 6

Dear community members,

 

Please advise on this: 

I am trying to fit a PROC GLIMMIX model with a random intercept on my training set and save it with the STORE statement. I then want to use PROC PLM to compute predictions, including the random effect, for an external data set (the validation set).

I was a little confused, because the "store" command may not be able to save the random effect information.

If PROC PLM cannot do this automatically, how would we do it manually?

 

Thank you in advance!

 

 

ballardw
Super User

The "manual" approach, if I understand what you intend by that, would be to append the observations you want to score to the "training" data set, with the model variables filled in and missing values for the dependent variable. Then use an OUTPUT statement to create an output data set, requesting the predicted values with the PRED= option.

Rick_SAS
SAS Super FREQ

It sounds like you want to use the "missing value trick" to append the scoring data to the data that is used to fit the model? If so, see the example and discussion in the article "The missing value trick for scoring a regression model."
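
For readers who keep the training and validation data in separate data sets, the missing value trick might be sketched roughly as follows. The data set names TRAIN and VALID, the response Y, the covariate X, and the subject variable ID are hypothetical placeholders, not names from this thread.

```sas
/* Sketch only: TRAIN, VALID, Y, X, and ID are hypothetical names. */
data COMBINED;
   set TRAIN VALID(in=inValid);
   ScoreObs   = inValid;      /* 1 = validation row, 0 = training row        */
   Y_Observed = Y;            /* keep the observed response for later checks */
   if inValid then Y = .;     /* missing response: row is not used in the fit */
run;

proc glimmix data=COMBINED;
   class ID;
   model Y(event='1') = X / dist=binary link=logit solution;
   random intercept / subject=ID;
   /* predictions are requested for every row, including rows with missing Y */
   output out=PRED_ALL pred(ilink)=PredProb pred(ilink noblup)=FixPredProb;
run;

/* keep only the scored validation rows */
data VALID_SCORED;
   set PRED_ALL;
   where ScoreObs = 1;
run;
```

Whether the BLUP-based prediction is meaningful for a validation row depends on whether its ID level also appears among the training rows, so that is worth checking in your own data.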

longitudinal
Fluorite | Level 6
Thank you! It is really helpful!
Questionplease
Calcite | Level 5

I just noticed the same issue with PROC PLM when using GLIMMIX for a logistic model with random intercepts. Even if I take the data set output from the original model and then score it with the PLM procedure, I get different predicted values (thankfully I plotted both). Is there another option besides the manual approach? I have several simulated data sets that I need to score. It seems like this should at least generate a warning in the log.

Rick_SAS
SAS Super FREQ

Please provide your SAS code so we can understand what you are doing.  Showing the output (or graph) would also be helpful.

 

With a nonlinear link function, PROC GLIMMIX enables you to make four different "predictions." They correspond to the various combinations of the BLUP|NOBLUP and ILINK|NOILINK options. Make sure you are being consistent and comparing the same kind of prediction.
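
In symbols, the four combinations can be summarized as below. The notation is chosen here for illustration and is not copied from the documentation: $x$ and $z$ are the fixed- and random-effect design rows for an observation, $\hat\beta$ the fixed-effect estimates, $\hat\gamma$ the predicted random effects, and $g$ the link function (for the logit link, $g^{-1}(\eta) = 1/(1+e^{-\eta})$).

```latex
\begin{aligned}
\text{PRED(BLUP NOILINK):}   &\quad \hat\eta   = x^\top\hat\beta + z^\top\hat\gamma \\
\text{PRED(NOBLUP NOILINK):} &\quad \hat\eta_m = x^\top\hat\beta \\
\text{PRED(BLUP ILINK):}     &\quad \hat\mu    = g^{-1}\!\left(x^\top\hat\beta + z^\top\hat\gamma\right) \\
\text{PRED(NOBLUP ILINK):}   &\quad \hat\mu_m  = g^{-1}\!\left(x^\top\hat\beta\right)
\end{aligned}
```

The BLUP and NOBLUP versions differ by the random-effect term $z^\top\hat\gamma$, so two predictions for the same row agree only when that term is zero.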

Questionplease
Calcite | Level 5
Thank you for your response. I did run the "manual" method, and it is not really more computationally intensive, so that's great. This work is for a client, so I'm not able to post the graphics, but a generalized version of the code is below, where A and B are continuous, C is categorical, and there are two interaction terms and a random intercept term for D. The PLM predictions correspond to PRED(ILINK NOBLUP), not PRED(ILINK). If I use the missing value trick, the random intercepts are incorporated. Perhaps I'm missing an option in PLM. I'm sure it's complicated to include the random effects unless you assume they are essentially fixed, which I guess is what happens when you score using the missing value trick? Thanks.

PROC GLIMMIX DATA=LOGIT PLOTS=ALL;
CLASS C D;
MODEL MARK(EVENT='1') = A B C A*B B*C / LINK=LOGIT DIST=BINARY SOLUTION;
RANDOM INT / SUBJECT=D;
OUTPUT OUT=D.TRANSECT_GLIMMIX PRED(ILINK)=PREDPROB PRED(ILINK NOBLUP)=FIX_PREDPROB;
STORE D.TEST_PLM;
RUN;

PROC PLM RESTORE=D.TEST_PLM;
SCORE DATA=D.TRANSECT_GLIMMIX OUT=D.SCORE_IT PREDICTED UCLM LCLM / ILINK;
RUN;


Rick_SAS
SAS Super FREQ

You correctly state that the SCORE_IT data set contains the variable FIX_PREDPROB (from the OUTPUT statement of GLIMMIX) and the PREDICTED variable (from the SCORE statement of PLM). These two variables are equal.

 

What is your question? What do you want that you are not getting? Here is a simulation that generates data as you describe so that we can all see the same results:

 

data LOGIT;
call streaminit(1);
do c = 0, 1;
   do D = 1 to 20;
      ran = rand("Normal");
      do i = 1 to 10;
         a = rand("Normal");
         b = rand("Normal");
         eta = ran + 3*a - 4*b + 0.1*a*b - 0.8*b*c;
         if D <= 15 then 
            MARK = rand("Bern", logistic(eta/5));
         else 
            MARK = .;   /* missing values ==> obs to be scored */
         output;
      end;
   end;
end;
run;

PROC GLIMMIX DATA=LOGIT PLOTS = ALL ;
CLASS C D;
MODEL MARK(EVENT='1') = A B C A*B B*C / LINK=LOGIT DIST=BINARY SOLUTION;
RANDOM INT / SUBJECT = D;
OUTPUT OUT=TRANSECT_GLIMMIX PRED(ILINK)=PREDPROB PRED(ILINK NOBLUP)=FIX_PREDPROB;
STORE TEST_PLM;
RUN;


PROC PLM RESTORE=TEST_PLM;
show cov Parms;
SCORE DATA=TRANSECT_GLIMMIX OUT=SCORE_IT PREDICTED UCLM LCLM / ILINK;
RUN;

You can add the (WHERE=(MARK=.)) data set option to the DATA= option on the SCORE statement to score only the observations for which MARK is missing.
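
As a minimal sketch of that suggestion, assuming the item store TEST_PLM and the data set TRANSECT_GLIMMIX created by the code above (the output name SCORE_MISSING is a hypothetical choice):

```sas
/* Score only the rows whose response is missing.
   TEST_PLM and TRANSECT_GLIMMIX come from the earlier steps;
   SCORE_MISSING is a hypothetical output data set name. */
proc plm restore=TEST_PLM;
   score data=TRANSECT_GLIMMIX(where=(MARK=.)) out=SCORE_MISSING
         predicted lclm uclm / ilink;
run;
```

As reported earlier in this thread, the PLM predictions match the PRED(ILINK NOBLUP) values from GLIMMIX, so the scored rows should be compared against FIX_PREDPROB rather than PREDPROB.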

Questionplease
Calcite | Level 5
Thanks for the quick response, and the simulation! Once the model is developed, I intended to use PLM to make predictions under multiple scenarios where the term A is varied, and then assess the difference in the exceedance frequency for each scenario using a cutpoint value. So for instance, if the cutpoint for identifying an exceedance is 0.50, then for row 239 the BLUP value is greater than 0.50 while the NOBLUP value is less than 0.50. In my case this makes a fairly substantial difference, but now I am thinking that this means I should probably specify D as a fixed effect, even though the overall objective is to evaluate the effects of A.
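
One way to see how often the two kinds of prediction fall on opposite sides of a cutpoint is sketched below. It assumes the TRANSECT_GLIMMIX data set (with PREDPROB and FIX_PREDPROB) from the simulation earlier in the thread; the macro variable, the EXCEED data set, and the flag names are hypothetical.

```sas
/* Sketch: compare exceedance calls from the BLUP and NOBLUP predictions.
   The cutpoint 0.50 and all new names are illustrative assumptions. */
%let cutpoint = 0.50;

data EXCEED;
   set TRANSECT_GLIMMIX;
   where nmiss(PREDPROB, FIX_PREDPROB) = 0;      /* skip rows without predictions */
   ExceedBLUP   = (PREDPROB     > &cutpoint);    /* includes random intercepts    */
   ExceedNOBLUP = (FIX_PREDPROB > &cutpoint);    /* fixed effects only            */
   Disagree     = (ExceedBLUP ne ExceedNOBLUP);  /* 1 when the two calls differ   */
run;

/* 2x2 table of the two classifications; off-diagonal cells are disagreements */
proc freq data=EXCEED;
   tables ExceedBLUP*ExceedNOBLUP / norow nocol nopercent;
run;
```

The off-diagonal counts give a rough sense of how much the choice between BLUP and NOBLUP predictions matters for the exceedance frequency at that cutpoint.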
