Solved: Proc glimmix outputting predicted probabilities with missing dependent variable

Caetreviop543 · Posted 06-11-2020 05:58 PM

I have a dataset that looks like this:

out time intervention control random_var
0  1  0  1  1
1  2  0  0  2
0  3  1  1  3
1  4  1  0  4
0  5  0  1  5
1  6  0  0  6
0  1  1  1  7
1  2  1  0  8
0  3  1  0  9
1  4  0  1  10
0  5  1  1  11
1  6  0  1  12

I would like to obtain the predicted probabilities and 95% CIs for dummy values which I appended to the end of the dataset. These dummy values are intended to show how the probability of the outcome changes over time within each intervention at the reference level for the control variable and random effect:

out time intervention control random_var
0  1  0  1  1
1  2  0  0  2
0  3  1  1  3
1  4  1  0  4
0  5  0  1  5
1  6  0  0  6
0  1  1  1  7
1  2  1  0  8
0  3  1  0  9
1  4  0  1  10
0  5  1  1  11
1  6  0  1  12
.  1  0  0  1
.  2  0  0  1
.  3  0  0  1
.  4  0  0  1
.  5  0  0  1
.  6  0  0  1
.  1  1  0  1
.  2  1  0  1
.  3  1  0  1
.  4  1  0  1
.  5  1  0  1
.  6  1  0  1

Obviously they don't have an outcome, and aren't used to estimate the coefficients. Here is the model:

proc glimmix method=laplace;
class random intervention (ref='0') control (ref='0');
model out (event='1')=intervention time control intervention*time intervention*control time*control/solution link=logit dist=binary;
random random_var;
output out=predprob predicted(blup ilink)=pp;
run;

With proc logistic, this isn't a problem. But it seems that proc glimmix won't output the predicted probabilities unless there is an outcome variable, for some reason. Is there a way to work around this?

ballardw · Posted 06-11-2020 06:11 PM

You may want to check the log carefully for messages related to this.

From the documentation of the OUTPUT statement in Glimmix:

If a particular combination of keyword and keyword options is not supported, the statistic is not computed and a message is produced in the SAS log.

The online documentation for my install does not show BLUP as an option with Predicted.

I think (I don't use GLIMMIX myself) that you can use the STORE statement to save the elements of the model that PROC PLM uses to score data. The data to score would be your rows with the missing dependent variable.
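A minimal sketch of the STORE / PROC PLM route, assuming the observed rows sit in a data set called work.have and the appended dummy rows in work.dummy. Both names, the item store name glmx_fit, and the output variable names are placeholders, not anything from the original post. Keep in mind that PROC PLM scores from the fixed effects only, so these predictions do not include the random intercept.

```sas
/* Fit the model on the observed rows and save it to an item store. */
/* work.have and work.glmx_fit are hypothetical names.              */
proc glimmix data=work.have method=laplace;
  class random_var intervention (ref='0') control (ref='0');
  model out (event='1') = intervention time control
        intervention*time intervention*control time*control
        / solution link=logit dist=binary;
  random intercept / subject=random_var;
  store work.glmx_fit;
run;

/* Score the dummy rows (hypothetical work.dummy). The ILINK option */
/* returns probabilities rather than values on the logit scale.     */
proc plm restore=work.glmx_fit;
  score data=work.dummy out=work.scored
        predicted=pp lclm=lower uclm=upper / ilink;
run;
```

The scored data set then holds one predicted probability and confidence limits per dummy row.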

SteveDenham · Posted 06-12-2020 07:25 AM

This may be difficult. The appended values will not have a BLUP associated with them, as that is dependent on the data. I would suggest trying the OUTPUT statement with the NOBLUP option, which would give the marginal probabilities for these observations, conditional on the fixed effects. I believe that is as good as you might be able to do, even implementing @ballardw 's excellent suggestion of using the STORE statement followed by PROC PLM. There just is no way to compute a BLUP for an observation that has no data, so far as I can tell.
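As a rough sketch of what that could look like, the OUTPUT statement accepts several keywords at once, so the BLUP and NOBLUP versions can be written side by side and compared. The input data set name work.have_plus_dummy (observed rows plus the appended dummy rows) and the output variable names are assumptions for illustration only.

```sas
/* work.have_plus_dummy is a hypothetical name for the stacked data. */
proc glimmix data=work.have_plus_dummy method=laplace;
  class random_var intervention (ref='0') control (ref='0');
  model out (event='1') = intervention time control
        intervention*time intervention*control time*control
        / solution link=logit dist=binary;
  random intercept / subject=random_var;
  output out=predprob
         pred(blup ilink)=pp_blup       /* uses the random-effect solutions */
         pred(noblup ilink)=pp_noblup   /* fixed effects only               */
         lcl(noblup ilink)=lower_noblup /* lower 95% limit (default alpha)  */
         ucl(noblup ilink)=upper_noblup /* upper 95% limit (default alpha)  */
  ;
run;
```

Comparing pp_blup and pp_noblup on the appended rows shows how much the random-effect solutions contribute to each prediction.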

SteveDenham

Caetreviop543 · Posted 06-12-2020 01:52 PM

I don't want to use NOBLUP because the predicted probabilities should include the random intercept. The STORE statement and PROC PLM worked for me. Thanks!

SteveDenham · Posted 06-15-2020 08:56 AM

I want to ask a favor, and ask that you humor me. Try running the OUTPUT statement in GLIMMIX with the following options:

output out=predprob predicted(ilink)=pp;

Does this data set get created? If so, how does it differ from the output from PROC PLM?

SteveDenham

Caetreviop543 · Posted 06-15-2020 01:26 PM

Yes, it does. And it produces the same predicted probabilities as

output out=predprob predicted(blup ilink)=pp;

The predicted probabilities differ from those of PROC PLM because, as I understand it, PROC PLM doesn't use random effects, whereas the OUTPUT statement does.

SteveDenham · Posted 06-15-2020 02:17 PM

So, the values are all missing in the output data set. PLM and SCORE do not accommodate random effects.

In any case, I get predicted values when the response variable is missing using this code:

data one;
input ID	Right	Left 	Score	Group	Timepoint;
cards;
1001	0	0	2	1	 84
1002	1	1	4	1	 84
1003	0	0	2	1	 84
1004	0	1	3	1	 84
1005	1	1	4	1	 84
1006	1	0	3	1	 84
1008	1	1	4	1	 84
1009	0	0	2	1	 84
1110	0	0	2	1	 84
1011	0	0	2	1	 56
1012	0	1	3	1	 56
1013	0	1	3	1	 56
1114	0	1	3	1	 56
1015	0	0	2	1	 56
1016	1	0	2	1	 56
1017	1	1	4	1	 56
1018	1	0	3	1	 56
1019	1	0	3	1	 56
1021	0	0	2	1	 28
1022	1	0	3	1	 28
1023	0	0	2	1	 28
1024	0	0	2	1	 28
1025	0	0	2	1	 28
1026	0	0	2	1	 28
1027	0	0	2	1	 28
1028	0	0	2	1	 28
1029	0	0	2	1	 28
1030	0	0	2	1	 28
2001	0	0	2	2	 84
2002	0	0	2	2	 84
2003	0	1	3	2	 84
2004	1	0	3	2	 84
2005	1	0	3	2	 84
2006	1	1	4	2	 84
2007	1	1	4	2	 84
2008	1	1	4	2	 84
2012	1	1	4	2	 84
2013	1	1	4	2	 84
2009	1	1	4	2	 56
2010	0	0	2	2	 56
2011	1	1	4	2	 56
2014	0	1	3	2	 56
2015	0	0	2	2	 56
2016	0	1	3	2	 56
2017	1	1	4	2	 56
2018	1	1	4	2	 56
2019	0	1	3	2	 56
2020	1	0	3	2	 56
2021	0	0	2	2	 28
2022	0	0	2	2	 28
2023	0	0	2	2	 28
2024	0	0	2	2	 28
2025	0	0	2	2	 28
2026	0	0	2	2	 28
2027	0	0	2	2	 28
2028	1	1	4	2	 28
2029	0	0	2	2	 28
2030	1	1	4	2	 28
3001	1	1	4	3	 84
3002	1	1	4	3	 84
3003	1	1	4	3	 84
3004	1	0	3	3	 84
3005	1	1	4	3	 84
3106	0	1	3	3	 84
3007	1	1	4	3	 84
3008	1	1	4	3	 84
3014	0	0	2	3	 84
3015	1	0	3	3	 84
3209	1	0	3	3	 56
3010	1	1	4	3	 56
3011	0	1	3	3	 56
3012	1	0	3	3	 56
3013	1	1	4	3	 56
3016	1	1	4	3	 56
3017	0	0	2	3	 56
3018	1	0	3	3	 56
3019	1	0	3	3	 56
3020	0	0	2	3	 56
3021	0	0	2	3	 28
3022	0	0	2	3	 28
3023	0	0	2	3	 28
3024	0	0	2	3	 28
3025	0	0	2	3	 28
3026	0	1	3	3	 28
3027	0	1	3	3	 28
3029	0	0	2	3	 28
3030	0	0	2	3	 28
;

data onelong;
set one;
site=1;value=Right;output;
site=2;value=Left;output;
run;

proc sort data=onelong;
by timepoint id site;
run;

proc glimmix data=onelong method=laplace;
by timepoint;
class id group site;
model value = group site group*site/dist=bin ddfm=bw;
random site/subject=id;
output out=prednomiss pred(ilink)=pp;
run;

data two;
set onelong;
if id=1021 and timepoint=28 and site=1 then value=.;
run;

proc glimmix data=two method=laplace;
by timepoint;
class id group site;
model value = group site group*site/dist=bin ddfm=bw;
random site/subject=id;
output out=predmiss pred(ilink)=pp;
run;

In ds prednomiss, the values for subject 1021 are: value = 0, mu = 3.5754973E-6.

In ds predmiss, the values for subject 1021 are value = . , mu = 4.1053435E-6.

So, it looks like I can get values when the missing responses are in subjects already in the dataset. To test the next stage, I appended 2 subjects at day 28:

data three;
input ID	Right	Left 	Score	Group	Timepoint;
cards;
1 0 0 . 1 28
2 1 1 . 1 28
;
run;

data four;
set one three;
run;

proc sort data=four out=five;
by ID timepoint group;
run;

data fivelong;
set five;
site=1;value=Right;output;
site=2;value=Left;output;
run;

proc sort data=fivelong;
by timepoint id site;
run;

proc glimmix data=fivelong method=laplace;
by timepoint;
class id group site;
model value = group site group*site/dist=bin ddfm=bw;
random site/subject=id;
output out=predmisssubj pred(ilink)=pp;
run;

I get predicted probabilities for subjects 1 and 2. Therefore, I believe you should be getting predicted probabilities for your appended data. I am mystified that you aren't, unless the appended data is also missing values for the independent variables.

SteveDenham

SteveDenham · Posted 06-15-2020 02:33 PM

And when I run your code on the given data, after correcting the CLASS statement to random_var, I get predicted values for all the cases with a missing response value. So I must be missing the point of your question. I think you have correctly approached this in your original work, and with a correction for a typo, you should be getting predicted probabilities for all your non-modeled cases.

SteveDenham

Caetreviop543 · Posted 06-16-2020 11:58 AM

I figured out the issue. Using this statement produces predicted probabilities for the output statement:

random site/subject=id;

However, including 'intercept' does not:

random intercept site/subject=id;

Does anyone know why? Also, I only need the random intercept, not slope. Does excluding intercept in the random statement also provide random slopes?

sld · Posted 06-18-2020 01:33 AM

I'm missing how @SteveDenham 's example dataset (which I think is just meant to illustrate a coding point) corresponds to the structure of your dataset and hence your original question.

I'm thinking we would all benefit from knowing more about your experimental design. What are the random effects factors, what are the fixed effects factors, and to which random effects factors are the fixed effects factors assigned?

SteveDenham · Posted 06-18-2020 09:29 AM

@sld - The example datasets were generated to show @Caetreviop543 that missing response variables could be predicted in GLIMMIX. I used data I had at hand that was complete to get working code. The first analysis was to just convert a single within subject value to missing. The second was to see what happened in the case where both values for a subject were missing. As @Caetreviop543 points out, the key is a usable RANDOM statement. For the original data, I think the code would look like:

proc glimmix method=laplace;
class random_var intervention (ref='0') control (ref='0');
model out (event='1')=intervention time control intervention*time intervention*control time*control/solution link=logit dist=binary ddfm=bw;
random intercept/subject=random_var;
*random time/subject=random_var type=ar(1);
output out=predprob predicted(ilink)=pp;
run;

Note that F values for terms involving control are infinity, as there is complete separation in the (probably incomplete) data given. In any case, the predicted probabilities differ by time. I commented out a term modeling the repeated nature of the data, as the variable time and random_var are confounded for the complete observations that were presented. This could be added back in for the actual data, and modified if the timepoints are not equally spaced.

SteveDenham


SteveDenham · Posted 06-18-2020 09:04 AM

The best way to explain this is to see what the statements would look like if you were not using by-subject processing.

The first is equivalent to

RANDOM site*id;

while the second is equivalent to:

RANDOM id site*id;

Now comes the honest admission - I don't know why one would give what you want and the other would not, and I don't really want to speculate.
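To make the comparison concrete, here is a sketch of the two specifications side by side, assuming the onelong data set built earlier in the thread. The output data set and variable names are made up for illustration, and no claim is made here about which fit returns predictions for the missing cases.

```sas
/* Form 1: a site effect within each subject. */
proc glimmix data=onelong method=laplace;
  by timepoint;
  class id group site;
  model value = group site group*site / dist=bin;
  random site / subject=id;            /* same as: random site*id;    */
  output out=pred_site pred(ilink)=pp_site;
run;

/* Form 2: a subject intercept plus a site effect within subject. */
proc glimmix data=onelong method=laplace;
  by timepoint;
  class id group site;
  model value = group site group*site / dist=bin;
  random intercept site / subject=id;  /* same as: random id site*id; */
  output out=pred_int_site pred(ilink)=pp_int_site;
run;
```

Checking the covariance parameter estimates from the second fit may help show whether both variance components can actually be estimated from two sites per subject.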

But what happens if you use:

RANDOM intercept/subject=site*id;

Do you still get predicted values for the missing response variable cases?

Perhaps that would give us some ideas.

SteveDenham
