Prediction using proc glimmix and proc plm

longitudinal · Posted 09-11-2018 05:32 PM

Dear community members,

Please advise on this:

I am trying to use PROC GLIMMIX with a random intercept and a STORE statement on my training set, and then I want to compute predictions for an external data set (the validation set) using PROC PLM, with the random effect included.

I am a little confused, because the STORE statement may not save the random-effect information.

If PROC PLM cannot do this automatically, how would we do it manually?

Thank you in advance!

ballardw · Posted 09-12-2018 06:42 PM

The "manual" approach, if I understand what you intend by that, would be to append the validation observations (with all the model variables) to the "training" data set, with missing values for the dependent variable. Then use an OUTPUT statement to create an output data set, requesting the predicted values with the PRED= option.

Rick_SAS · Posted 09-13-2018 08:32 AM

It sounds like you want to use the "missing value trick" to append the scoring data to the data that is used to fit the model? If so, see the example and discussion in the article "The missing value trick for scoring a regression model."
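
As a rough sketch of the missing value trick for this kind of model: the data set names TRAIN and VALID, the response Y, the predictors X1 and X2, and the subject variable ID below are all hypothetical placeholders, not names from the original question. Observations with a missing response do not contribute to the fit, but the OUTPUT statement still computes predictions for them.

```sas
/* Hypothetical names throughout: TRAIN, VALID, Y, X1, X2, ID. */
data combined;
   set train valid(in=inValid);
   ScoreObs = inValid;          /* 1 = observation to be scored       */
   if inValid then y = .;       /* missing response: not used in fit  */
run;

proc glimmix data=combined;
   class id;
   model y(event='1') = x1 x2 / dist=binary link=logit solution;
   random intercept / subject=id;
   /* keep only the scored rows; request both kinds of prediction */
   output out=pred(where=(ScoreObs=1))
          pred(ilink)        = p_blup     /* uses the random-effect solutions */
          pred(noblup ilink) = p_fixed;   /* fixed effects only               */
run;
```

How subjects that appear only in the scoring data are treated is worth checking against the PROC GLIMMIX documentation before relying on the BLUP-based column for them.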

longitudinal · Posted 09-13-2018 11:18 AM

Thank you! It is really helpful!

Questionplease · Posted 10-02-2018 10:59 PM

I just noticed the same issue with PLM using GLIMMIX for a logistic model with random intercepts. Even if I take the data set output from the original model and then score it using the PLM procedure, I get different predicted values (thankfully I plotted both). Is there another option besides the manual approach? I have several simulated data sets I need to score. It seems like this should at least generate a warning in the log.

Rick_SAS · Posted 10-03-2018 05:41 AM

Please provide your SAS code so we can understand what you are doing. Showing the output (or graph) would also be helpful.

With a nonlinear link function, PROC GLIMMIX enables you to make four different "predictions." They correspond to the various combinations of the BLUP|NOBLUP and ILINK|NOILINK options. Make sure you are being consistent and comparing the same kind of prediction.
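
For illustration only, here is a sketch that requests all four predictions in one OUTPUT statement. It assumes a data set LOGIT with the variables MARK, A, B, C, and D used elsewhere in this thread; the output data set name FOUR_PREDS and the four variable names are made up for the example.

```sas
proc glimmix data=LOGIT;
   class C D;
   model MARK(event='1') = A B C A*B B*C / link=logit dist=binary solution;
   random int / subject=D;
   output out=four_preds
      pred               = eta_blup     /* linear predictor, with random effects    */
      pred(noblup)       = eta_fixed    /* linear predictor, fixed effects only     */
      pred(ilink)        = prob_blup    /* probability scale, with random effects   */
      pred(noblup ilink) = prob_fixed;  /* probability scale, fixed effects only    */
run;
```

Comparing a PROC PLM score against each of these columns is one way to see which kind of prediction it matches.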

Questionplease · Posted 10-03-2018 09:01 AM

Thank you for your response. I did run the "manual" method and it is not
really more computationally intensive, so that's great. This work is for a
client, so I'm not able to post the graphics, but a generalized version of the
code is below, where A and B are continuous, C is categorical, and there
are two interaction terms and a random intercept term for D. The PLM
predictions correspond to PRED(ILINK NOBLUP), not PRED(ILINK). If
I use the missing value trick, the random intercepts are incorporated.
Perhaps I'm missing an option in PLM. I'm sure it's complicated to include
the random effects unless you assume they are essentially fixed, which I
guess is what is happening when you score using the missing value trick?
Thanks.

PROC GLIMMIX DATA=LOGIT PLOTS=ALL;
   CLASS C D;
   MODEL MARK(EVENT='1') = A B C A*B B*C / LINK=LOGIT DIST=BINARY SOLUTION;
   RANDOM INT / SUBJECT=D;
   OUTPUT OUT=D.TRANSECT_GLIMMIX PRED(ILINK)=PREDPROB
          PRED(ILINK NOBLUP)=FIX_PREDPROB;
   STORE D.TEST_PLM;
RUN;

PROC PLM RESTORE=D.TEST_PLM;
   SCORE DATA=D.TRANSECT_GLIMMIX OUT=D.SCORE_IT PREDICTED UCLM LCLM / ILINK;
RUN;

Rick_SAS · Posted 10-03-2018 09:39 AM

You correctly state that the SCORE_IT data set contains the variable FIX_PREDPROB (from the OUTPUT statement of GLIMMIX) and the PREDICTED variable (from the SCORE statement of PLM). These two variables are equal.

What is your question? What do you want that you are not getting? Here is a simulation that generates data as you describe so that we can all see the same results:

data LOGIT;
call streaminit(1);
do c = 0, 1;
   do D = 1 to 20;
      ran = rand("Normal");
      do i = 1 to 10;
         a = rand("Normal");
         b = rand("Normal");
         eta = ran + 3*a - 4*b + 0.1*a*b - 0.8*b*c;
         if D <= 15 then 
            MARK = rand("Bern", logistic(eta/5));
         else 
            MARK = .;   /* missing values ==> obs to be scored */
         output;
      end;
   end;
end;
run;

PROC GLIMMIX DATA=LOGIT PLOTS=ALL;
   CLASS C D;
   MODEL MARK(EVENT='1') = A B C A*B B*C / LINK=LOGIT DIST=BINARY SOLUTION;
   RANDOM INT / SUBJECT=D;
   OUTPUT OUT=TRANSECT_GLIMMIX PRED(ILINK)=PREDPROB
          PRED(ILINK NOBLUP)=FIX_PREDPROB;
   STORE TEST_PLM;
RUN;

PROC PLM RESTORE=TEST_PLM;
   SHOW COV PARMS;
   SCORE DATA=TRANSECT_GLIMMIX OUT=SCORE_IT PREDICTED UCLM LCLM / ILINK;
RUN;

You can add the data set option (WHERE=(MARK=.)) to the DATA= option on the SCORE statement to score only the observations for which MARK is missing.
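
For example, a sketch of the SCORE statement with a WHERE= data set option, reusing the TEST_PLM item store and the TRANSECT_GLIMMIX data set from the code above (the output name SCORE_MISSING is just an illustrative choice):

```sas
proc plm restore=TEST_PLM;
   /* score only the rows whose response is missing */
   score data=TRANSECT_GLIMMIX(where=(MARK=.)) out=SCORE_MISSING
         predicted lclm uclm / ilink;
run;
```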

Questionplease · Posted 10-03-2018 10:10 AM

Thanks for the quick response, and the simulation! Once the model is
developed, I intended to use PLM to make predictions under multiple scenarios
where the term A is varied, and then assess the difference in the exceedance
frequency for each scenario using a cutpoint value. So, for instance, if the
cutpoint for identifying an exceedance is 0.50, then for row 239 the BLUP
value is greater than 0.50 while the NOBLUP value is less than 0.50. In my
case this makes a fairly substantial difference, but now I am thinking that
this means I should probably specify D as a fixed effect, even though the
overall objective is to evaluate the effects of A.
