BookmarkSubscribeRSS Feed
JeremyPoling
Calcite | Level 5

I have a two-factor crossed design where one factor is fixed and the other factor is random.  There is also a covariate in the model.  The model assumes equal slopes at each level of the random factor but not equal intercepts.  Below is an analogous example using SASHELP.PRDSALE where COUNTRY is treated as the random factor, PRODUCT is the fixed factor, and PREDICT is the covariate.  (Please forgive me if there are any typos, I don't have SAS on this computer.)

PROC MIXED DATA=SASHELP.PRDSALE;

     CLASS PRODUCT COUNTRY;

     MODEL ACTUAL=PRODUCT PRODUCT*PREDICT/NOINT;

     RANDOM COUNTRY PRODUCT*COUNTRY;

RUN;

For specific levels of the fixed factor and covariate, I need to be able to construct a 95% prediction interval for individual observations in the broad inference space.  For example, using the above analogy, suppose I want to construct a prediction interval for the BED product when the value of PREDICT is $300.  I can use an ESTIMATE statement to get a predicted value and the estimated variance of that predicted value (i.e. the variance of the estimated mean of the BED product at PREDICT=300) as follows:

ESTIMATE 'BED AT $300' PRODUCT 1 0 0 0 0 PRODUCT*PREDICT 300 0 0 0 0;

I think that I can then estimate the variance of an individual observation as Var(Predicted Value) + Var(COUNTRY) + Var(PRODUCT*COUNTRY) + Var(Residual).  The problem is I'm not sure how to use this variance to obtain a prediction interval for individual observations.  If this is a form of t-interval, how do you compute the degrees of freedom?

Any help that you can provide would be much appreciated.  Thanks!

Jeremy

7 REPLIES 7
SteveDenham
Jade | Level 19

To stay with this example, I guess I don't have a solid grasp on what you are asking for.  That is, an individual BED would have to be sold in some country, so it just becomes a matter of writing out the full BLUP version of the estimate statement.  The broad inference space mean/estimate/predicted value would not include any variability from the country and product*country variance components.

You may want to include the OUTPRED= option in your model statement, and check if the values you get there are what you need.

Steve Denham

JeremyPoling
Calcite | Level 5

Hi Steve.  Thanks for your reply.

I don't want to specify the level of COUNTRY.  I want a 95% prediction interval for individual observations for the BED product at PREDICT=300 assuming that the observation comes from a randomly selected COUNTRY.  Therefore, the variance of the individual observation would include variation from COUNTRY and PRODUCT*COUNTRY.  The ESTIMATE statement and OUTP= options provide confidence intervals for the predicted value (the mean response), not individual observations. 

Thanks again.

Jeremy

SteveDenham
Jade | Level 19

Sorry to disagree, but if you don't specify the level of COUNTRY, the value of BED and its associated variance is the marginal over all values of COUNTRY, thus I don't believe that the variance of the observation would include any contributions from COUNTRY or PRODUCT*COUNTRY (assuming that BED has Gaussian error).  Check out Ch. 6 (Best Linear Unbiased Prediction) in SAS for Mixed Models, 2nd. ed. by Littell et al. for examples.

Steve Denham

JeremyPoling
Calcite | Level 5

Thanks again Steve.

I still don't understand how the variance of an observation could not include contributions from COUNTRY or PRODUCT*COUNTRY and I'm still not sure how to obtain the interval that I need.

In hopes that it will help illustrate what I'm trying to accomplish, I'm going to attempt to break down the example a little further.  First of all, for illustration, I think that it might be easier to understand if the variance component for COUNTRY were a little larger.  Therefore, I tweaked the PRDSALE data set as follows and the executed the same model on the resultant data set:

DATA PRDSALE;

     SET SASHELP.PRDSALE;

     IF COUNTRY='CANADA' THEN ACTUAL=ACTUAL+500;

RUN;

PROC MIXED DATA=PRDSALE;

     CLASS PRODUCT COUNTRY;

     MODEL ACTUAL=PRODUCT PRODUCT*PREDICT/NOINT;

     RANDOM COUNTRY PRODUCT*COUNTRY;

     ESTIMATE 'BED AT $300' PRODUCT 1 0 0 0 0 PRODUCT*PREDICT 300 0 0 0 0;

RUN;

From the resultant covariance parameter estimates, the estimated variance of an observation is Var(ACTUAL)=Var(COUNTRY)+Var(PRODUCT*COUNTRY)+Var(Residual)=86924+217+82470=171411 and the estimated standard deviation is 414.  Therefore, if the fixed effect parameters (the slope and intercept for the BED product) were known without error, I think I could compute a prediction interval for an individual observation using this standard deviation and a critical value from the t-distribution with d.f. determined by the Satterthwaite method.  Of course, the problem is that this does not include variation in the fixed effect parameter estimates and  I don't really know the mean of BED at PREDICT=300.  According to the output from the ESTIMATE statement, we estimate this value to be 624.89 with a standard error of 173.41.  My initial thought was that this estimated standard error did not include any contribution from COUNTRY; after all, the estimated variance of COUNTRY is much larger than the variance of the estimate.  However, I now realize that this is not completely true since, as Steve said, the value of Bed and its variance is marginal over all values of COUNTRY. When I posted the initial message in this thread I thought that the variance of the error in a prediction could be expressed as Var(Prediction Error) = Var(Predicted Value) + Var(Actual) = 173.41**2 + 171411 = 201482.  This approach would be similar to the method used to calculate the variance for a traditional prediction interval used in regression application.  Now, I'm just more confused than ever and don't really know what to try next.

Just a thought, would it make more sense to use the standard error of the estimate if the narrow inference space was used instead?

ESTIMATE 'BED AT 300 NARROW' PRODUCT 15 PRODUCT*PREDICT 4500 |

     COUNTRY 5 5 5 PRODUCT*COUNTRY 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1/DIVISOR=15;

This ESTIMATE statement produced the same predicted value of ACTUAL but the standard error is 32.9 instead of 173.4.  It seems to me that this would certainly not include any contributions from COUNTRY or PRODUCT*COUNTRY.

Does what I'm trying to accomplish make sense?  If anyone have any suggestions, better methods, or code examples it would be very much appreciated.

Thanks again!

SteveDenham
Jade | Level 19

Mea culpa, mea maxima culpa.

Jeremy, I am getting old and soft and dumber by the day.  Of course, a broad inference space standard error will be larger, because it includes the additional sources of variation, just as you indicated.  My earler statement was wrong, wrong, wrong.

So, the prediction interval in the broad space is the same as that for the estimable function.  The best way to get this is the outpred= option (I think), since that gives the individual predicted value as opposed to outpredm= which gives the predicted value of the mean.  The estimate should be identical, but the standard error different.  The use of the ESTIMATE statement, and its associated error corresponds to a predicted mean value.  The alternative that I think of is to include an additional record in the input data, with all of the dependent variables you wish to use in the prediction, but with the dependent variable set to missing.  I think that should give it to you, anyway.

Once I again I apologize for turning this upside down.

Steve Denham

JeremyPoling
Calcite | Level 5

I don't think that the OUTPRED= option will do the trick.  Here is what I tried:

DATA PRDSALE;

     SET SASHELP.PRDSALE END=EOF;

     IF COUNTRY='CANADA' THEN ACTUAL=ACTUAL+500;

     OUTPUT;

     IF EOF THEN DO;

          CALL MISSING(ACTUAL, COUNTRY);

          PREDICT=300;

          PRODUCT='BED';

          OUTPUT;

     END;

RUN;

PROC MIXED DATA=PRDSALE;

     CLASS PRODUCT COUNTRY;

     MODEL ACTUAL=PRODUCT PRODUCT*PREDICT/NOINT OUTPRED=PREDICTIONS;

     RANDOM COUNTRY PRODUCT*COUNTRY;

RUN;

When I run the above code, the last observation of the data set PREDICTIONS has missing values for the PRED variable and the confidence limits.  I think that you need to specify values for all of the CLASS variables to get predicted values, even the RANDOM ones.

Am I doing something wrong?  Thanks again!

Jeremy

SteveDenham
Jade | Level 19

You aren't doing anything wrong.  Somebody around here has been giving you, if not bad advice, at least incomplete advice. Smiley Wink

Change OUTPRED to OUTPREDM, and include COUNTRY for the missing obs.  OUTPRED gives a BLUP (narrow inference space), while OUTPREDM gives the marginal.  And this turns out to be exactly what you had in the original ESTIMATE statement.

Try this:

DATA PRDSALE;

     SET SASHELP.PRDSALE END=EOF;

     IF COUNTRY='CANADA' THEN ACTUAL=ACTUAL+500;

     OUTPUT;

     IF EOF THEN DO;

          COUNTRY = 'U.S.A.';

    ACTUAL=.;

          PREDICT=300;

          PRODUCT='BED';

          OUTPUT;

     END;

RUN;

PROC MIXED DATA=PRDSALE;

     CLASS PRODUCT COUNTRY;

     MODEL ACTUAL=PRODUCT PRODUCT*PREDICT/NOINT solution outp=predictions outpm=mpredictions;

     RANDOM COUNTRY PRODUCT*COUNTRY;

  estimate "predicted value" product 1 0 0 0 0 product*predict 300 0 0 0 0/cl;

RUN;

If mpredictions has what you are looking for in terms of std err and df, then you should be on your way to actually answering the question where this all started.

Steve Denham

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 2807 views
  • 0 likes
  • 2 in conversation