datalligence
Fluorite | Level 6
Hi,

I developed a model using Proc Mixed to forecast Sales. I now have the intercepts and the fixed and random coefficients.

If I try to calculate Sales using the equation as:

Sales = Intercept + Fixed Coeff * Value + Random Coeff * Value

The Sales from the above equation and the model output are different. Shouldn't they be exactly the same? What am I doing wrong here? Can someone please help?

Thanks.
Susan
Calcite | Level 5
The answer to your question depends on the specific model that you fit with MIXED. If you provide the code, someone might be able to suggest an answer.

Depending on what you are trying to do, you may not need to calculate predictions yourself. Instead (more easily and with less risk of calculation error) you can obtain conditional (BLUP) predicted values using the OUTP option on the MODEL statement in the MIXED procedure. See the documentation for details.

If you have SAS/STAT 9.22, you could use the PLM procedure to produce predictions for the observed data or for a new data set. For an introduction to PLM see

http://support.sas.com/resources/papers/proceedings10/258-2010.pdf

Hope this helps,
Susan
datalligence
Fluorite | Level 6
Thank you Susan. Below is the code:

PROC MIXED DATA = mylib.data;
   model sales = Var1 / solution cl noint;
   random intercept Var2 / type=un subject=Store_Number solution cl gcorr;
run;

I get forecasted sales in my output files. What I want to do is write out the equation and check whether substituting values into it gives the same forecasted sales. It should, right? But it's not the same.
Susan
Calcite | Level 5
I have some questions/comments about your code.

1. The MODEL statement specifies Var1, but the RANDOM statement specifies Var2. Do you mean to have Var2 rather than Var1 in the RANDOM statement?

2. Do you mean to have the NOINT option in the MODEL statement? Generally, I think you would want a fixed-effect intercept which would be the estimated mean of the random intercepts for Store_Numbers.

****Using NOINT is not wrong, it's just a different parameterization (which affects the "hand-calculation" you are attempting).**** I'm retracting this; see my next post.

3. Store_Number is not in a CLASS statement, which is acceptable as long as the dataset is sorted appropriately.

If the model you use for "hand-calculation" with the parameter estimates matches the model you specify with the MIXED procedure, then you should get the same result (with perhaps a touch of rounding error).

Susan
datalligence
Fluorite | Level 6
1. The MODEL statement specifies Var1, but the RANDOM statement specifies Var2. Do you mean to have Var2 rather than Var1 in the RANDOM statement?

-- Yes, I want Var2 to be random.

2. Do you mean to have the NOINT option in the MODEL statement? Generally, I think you would want a fixed-effect intercept which would be the estimated mean of the random intercepts for Store_Numbers. Using NOINT is not wrong, it's just a different parameterization (which affects the "hand-calculation" you are attempting).
-- I am using an intercept in the Random statement, as I want an intercept for each Store. The model fit gets worse if I use fixed-effect intercept.

3. Store_Number is not in a CLASS statement, which is acceptable as long as the dataset is sorted appropriately.
-- Data is sorted, so no issues here.

If the model you use for "hand-calculation" with the parameter estimates matches the model you specify with the MIXED procedure, then you should get the
same result (with perhaps a touch of rounding error).
-- I am using the same variables and the same betas from the model - but the forecasted sales using the equation and the model are still different. Hard to say if it's a rounding error - difference is in the range of 0.1 to 0.6 for sales per sqft.

How do I take care of the rounding error (if this is the cause)?

Thanks!
datalligence
Fluorite | Level 6
Sorry, forgot to add: it cannot be a rounding error, because I am using the betas and the input values directly from the SAS MIXED output.
Susan
Calcite | Level 5
About 30 minutes after I posted my last message I thought, Wait a minute...maybe the presence or absence of NOINT *does* matter.

So this morning, I experimented with NOINT using the random coefficients model example from the MIXED documentation.

/* Example 56.5 in MIXED chapter */
data rc;
   input Batch Month @@;
   Monthc = Month;
   do i = 1 to 6;
      input Y @@;
      output;
   end;
datalines;
1 0 101.2 103.3 103.3 102.1 104.4 102.4
1 1 98.8 99.4 99.7 99.5 . .
1 3 98.4 99.0 97.3 99.8 . .
1 6 101.5 100.2 101.7 102.7 . .
1 9 96.3 97.2 97.2 96.3 . .
1 12 97.3 97.9 96.8 97.7 97.7 96.7
2 0 102.6 102.7 102.4 102.1 102.9 102.6
2 1 99.1 99.0 99.9 100.6 . .
2 3 105.7 103.3 103.4 104.0 . .
2 6 101.3 101.5 100.9 101.4 . .
2 9 94.1 96.5 97.2 95.6 . .
2 12 93.1 92.8 95.4 92.2 92.2 93.0
3 0 105.1 103.9 106.1 104.1 103.7 104.6
3 1 102.2 102.0 100.8 99.8 . .
3 3 101.2 101.8 100.8 102.6 . .
3 6 101.1 102.0 100.1 100.2 . .
3 9 100.9 99.5 102.2 100.8 . .
3 12 97.8 98.3 96.9 98.4 96.9 96.5
;
run;
/* Random coefficients model WITH fixed-effect intercept */
proc mixed data=rc;
   class Batch;
   model Y = Month / s;
   random Int Month / type=un sub=Batch s;
run;
/* Random coefficients model WITHOUT fixed-effect intercept */
proc mixed data=rc;
   class Batch;
   model Y = Month / s noint;
   random Int Month / type=un sub=Batch s;
run;

The results for the intercept and slope estimates for the BATCH levels are different--not by a lot for this example, but definitely different. The estimates of the variances and the covariance are VERY different, and those for the model with NOINT do not look good. So I suggest that you remove the NOINT option from your MODEL statement.

The use of Var1 in the MODEL statement but not in the RANDOM statement, and the use of Var2 in the RANDOM statement but not in the MODEL statement is a puzzle to me. I think (but do not know for sure about (3) below) that this model implies (1) that the slope for the linear regression of SALES on VAR1 is the same for all STORE_NUMBERs, (2) that SALES does not vary systematically with VAR2, and (3) that there is something like a random "block" effect induced by VAR2, which would tend to affect the intercept and not the slope. If you use this model, then you should know for sure what your model is doing.

Is there a reason why you are not using a random coefficients model here, with

model sales = var1 var2;
random int var1 var2 / subject=store_number type=<>; ?

Susan
datalligence
Fluorite | Level 6
I am using Var1 as fixed because my assumption/finding is that it has the same influence on Sales across all Stores. Var2 has been defined as random effect because I know that it has a different impact on Sales for each store.

The intercept in the random statement will offset the line for each Store by a fixed amount. And using this makes my model more stable and accurate.

I would not like to use Var1 as a random effect, as it has the same/uniform effect on all the Stores.

Besides all these, how is the result from using the model and using the same model's equation different? I am still at a loss as to why and how this is happening.
Susan
Calcite | Level 5
Perhaps it would be useful to explicitly think about this analysis as a regression.

You are regressing SALES on VAR1, and you are assuming that the slopes of the regression for STORE_NUMBERs are essentially the same with no random variability. Consequently, the model estimates a single, fixed-effect slope parameter.

I believe you also are interested in regressing SALES on VAR2, but you think that the slopes of the regression for STORE_NUMBERs are not constant and have some appreciable level of variability. Do you intend for your model to include this regression? If so, you must include VAR2 as a predictor in the MODEL statement. The presence of VAR2 in the RANDOM statement but not the MODEL statement fails to specify a regression.

As you say, the intercept in the RANDOM statement will allow different STORE_NUMBERs to have different intercepts. But again I recommend that you remove the NOINT option from the MODEL statement. The models with and without the NOINT option are obviously different--they are not re-parameterizations of the same model. And as I showed in the example in a previous message, the results for the model without the NOINT option appear much more reliable.

Given your model, an equation for "hand-calculating" the prediction for the jth observation on the ith STORE_NUMBER would be

sales_ij = RandomIntercept_i + FixedEffectSlopeForVar1*Var1_ij + RandomVar2_i

RandomVar2 is not multiplied by Var2 because your model fails to specify the regression of SALES on Var2. The reason the result from the model differs from that for your original hand-calculation equation is that your equation does not match your model.

HTH,
Susan
datalligence
Fluorite | Level 6
Hi Susan,

The results from model and hand calculation did not match even when I used the equation you mentioned:

sales_ij = RandomIntercept_i + FixedEffectSlopeForVar1*Var1_ij + RandomVar2_i

I am checking the results by removing NOINT and also including Var2 in the model statement. As of now, the fixed intercept is not coming out significant 🙂

Need to try out a new set of variables. Thanks again for all the help, really really appreciate!!!
Dale
Pyrite | Level 9
I don't think your model is what you believe it to be. It is fine to have Var1 as fixed and Var2 as random. But you appear to be operating under the assumption that variables specified on the RANDOM statement are assumed to have a distribution with nonzero mean. For a model written as

      Y{ij} = X{ij}*beta + gamma{i} + epsilon{ij}

where i indexes store and j indexes an observation within the i-th store, it appears that you believe that the random effect gamma{i} has distribution

      gamma{i} ~ N(mu, V)

That is, it appears that you believe that there is a nonzero expectation for gamma{i} that is implicit in the model. Because you believe that there is a nonzero expectation for gamma{i}, you have specified NOINT on the MODEL statement (assuming that the random intercept would have nonzero expectation) and have not specified Var2 on the MODEL statement.

That is NOT how the MIXED procedure operates. Effects specified on the RANDOM statement are assumed to have distribution

      gamma{i} ~ N(0, V)

To be even more explicit, the code which you have specified estimates the following model:

      Sales{ij} = b1*Var1{ij} + theta{i} + delta{i}*Var2{ij} + epsilon{ij}

      theta{i} ~ N(0, V{theta})
      delta{i} ~ N(0, V{delta})
      cov(theta{i},delta{i}) = rho*sqrt(V{theta}*V{delta})

If you remove the NOINT option from the MODEL statement and include both Var1 and Var2 on the MODEL statement, then you would fit what is almost certainly a better model (and definitely NOT A WRONG model) where you have

      theta{i} ~ N(mu{theta}, V{theta})
      delta{i} ~ N(mu{delta}, V{delta})
      cov(theta{i},delta{i}) = rho*sqrt(V{theta}*V{delta})

Now, as to your original question about the MIXED procedure returning values which are different from the values which you compute, I don't see anything in the code you supplied which indicates how you constructed predicted values, either from the MIXED procedure or in a data step (using parameter estimates from the MIXED procedure). You don't show either an OUTP or OUTPM option on the MODEL statement. You also don't show any ODS OUTPUT statements for capturing the fixed effect estimates (ODS OUTPUT solutionF=...) or for capturing the random effect estimates (ODS OUTPUT solutionR=...).

Without seeing exactly what you requested and exactly how you computed predicted values, one can only speculate on the reason for the differences. My guess is that you requested the marginal expectation (using OUTPM=) and then constructed conditional predicted values (computing b1*Var1 + theta + delta*Var2) OR you requested the conditional predicted values (using OUTP=) and then constructed marginal predicted values (computing b1*Var1). You have to be consistent about whether you do or do not include the random effects when constructing predicted values.
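A toy numeric sketch of the distinction above (all numbers are made up for illustration, not SAS output): the conditional prediction adds the store's BLUPs, the marginal one does not, so comparing one kind against a hand calculation of the other will always disagree.

```python
# Illustrative numbers only -- none of these come from the thread's data.
b0, b1, b2 = 10.0, 2.0, 0.5    # fixed-effect estimates (SolutionF)
theta, delta = -1.5, 0.3       # one store's random intercept/slope BLUPs (SolutionR)
var1, var2 = 4.0, 6.0          # predictor values for one observation

# Conditional prediction (what OUTP= returns): fixed effects plus BLUPs
pred_conditional = b0 + b1 * var1 + b2 * var2 + theta + delta * var2

# Marginal prediction (what OUTPM= returns): fixed effects only
pred_marginal = b0 + b1 * var1 + b2 * var2

print(round(pred_conditional, 6))  # 21.3
print(round(pred_marginal, 6))     # 21.0
```

The two differ by theta + delta*var2, so a hand calculation must include the random effects if and only if the SAS output it is compared against does.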

Finally, I would note that your PROC MIXED code did not name your subject effect (Store_number) on a CLASS statement. That is OK if your data are sorted by Store_number. But if your data are not sorted by Store_number, then you either need to first sort by Store_number or name Store_number on a CLASS statement. To address all of the issues which I have indicated above, I would revise your model as

PROC MIXED DATA = mylib.data;
  class store_number;
  model sales = Var1 Var2 / solution cl outp=Preds_RE outpm=Preds_FE;
  random intercept Var2 / type=un subject=Store_Number solution cl gcorr;
run;

Note that the data set Preds_RE represents predictions which include the random effects and is obtained as

      Yhat = b0 + b1*Var1 + b2*Var2 + theta + delta

while Preds_FE represents predictions which include only the fixed effects:

      Yhat = b0 + b1*Var1 + b2*Var2
Susan
Calcite | Level 5
Dale, thanks for the lovely detailed discussion.

Susan
datalligence
Fluorite | Level 6
Hi Dale,

I have now removed the NOINT option and included Var2 in the model statement. To get the model results I used:
OUTP=mixed_out;
ODS OUTPUT SolutionF=fixed;
ODS OUTPUT SolutionR=random;

I am trying the PROC MIXED code you suggested, but the predicted sales from the model (Preds_RE) and the equation below do not match.
Yhat = b0 + b1*Var1 + b2*Var2 + theta + delta

Just to confirm:
b0 = fixed intercept
b1 = fixed beta for Var1
b2 = fixed beta for Var2
theta = random intercept
delta = random beta for Var2

Thank you for your help. I like the way you explained everything in detail.
Dale
Pyrite | Level 9
Whoops, that should have been

      Yhat = b0 + b1*Var1 + b2*Var2 + theta + delta*Var2

or

      Yhat = (b0 + theta) + b1*Var1 + (b2 + delta)*Var2


I left off Var2 at the end of the first equation (the product delta*Var2). Does this result in the same predicted values?
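A quick arithmetic check with hypothetical estimates (not from this thread's data) that the two forms above are the same prediction, just grouped differently:

```python
# Hypothetical estimates -- for checking the algebra only.
b0, b1, b2 = 10.0, 2.0, 0.5    # fixed intercept and slopes
theta, delta = -1.5, 0.3       # random intercept and random slope for one store
var1, var2 = 4.0, 6.0          # predictor values

# Form 1: fixed part plus random part
form1 = b0 + b1 * var1 + b2 * var2 + theta + delta * var2
# Form 2: per-store intercept and per-store slope
form2 = (b0 + theta) + b1 * var1 + (b2 + delta) * var2

# Same value up to floating-point round-off.
assert abs(form1 - form2) < 1e-9
```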
datalligence
Fluorite | Level 6
I should have seen that myself 🙂

Yup, the numbers are matching now. I also found the root cause - I was hard-coding the fixed-effect betas. Once I increased the precision, the numbers matched. You were right Susan, it was a rounding problem 🙂
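For reference, a sketch of how that kind of rounding gap arises: a beta copied by hand from printed output loses decimals, and the lost precision gets multiplied by the predictor value. The numbers below are hypothetical, not from this model.

```python
# Hypothetical beta: full precision vs. the value copied by hand
# from a printed PROC MIXED solution table.
beta_full = 1.234567   # beta at the precision SAS carries internally
beta_hand = 1.23       # beta hard-coded from the printed output
var1 = 50.0            # a typical predictor value

# The truncation error is scaled by the predictor.
diff = abs(beta_full * var1 - beta_hand * var1)
print(round(diff, 5))  # 0.22835 -- the same order as the 0.1 to 0.6 gap reported above
```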

Btw, the way you guys explained Proc Mixed is a lot better than what is in the SAS Online doc! Thanks again.


