I’m analyzing whether the revenue of a panel of firms is affected by state policy measures. This leads to a three-level model: firms are nested within states, and the firms’ revenues vary over time.
I’m trying to account for two things: first, each firm’s yearly revenues are correlated across time; second, the state policy measures vary at the state level, so I have to account for clustering at that level (the firms’ revenues are correlated within states).
I’m wondering whether the mixed-model specification below accounts for this hierarchy. However, I have the feeling that my standard errors at the state level are somehow too small, so I want to make sure that I have correctly accounted for state-level clustering with the following specification.
PROC MIXED DATA=analysis;
  CLASS firm year state;
  MODEL Revenue = Firm-specific-variables
                  year
                  state-policy-measure
                  / NOINT SOLUTION CL;
  RANDOM state;
  REPEATED year / SUBJECT=firm(state) TYPE=AR(1);
RUN;
Would be great to get any feedback on this issue.
What happened when you ran it? From a conceptual point of view, it looks pretty good. I assume that the firm-specific and policy measures remain constant over time. If not, you will probably need to include interaction terms to capture everything correctly. Given all of that, it is then a matter of having enough data to fit the model, I would think.
Steve Denham
Dear Steve,
Thank you for your answer! Basically, the model performs well. However, I was comparing different specifications and also tried to replicate the results in Stata. I’m getting inconsistent results and wondering which ones are correct.
I’m especially interested in whether I can trust the standard errors at the state level. After reading the paper by Bertrand et al. on difference-in-differences estimates, I was wondering whether I can interpret a significant result for the state-policy variable directly as an “effective policy measure”. Does the given SAS specification fully control for intraclass correlation at the state level (what Stata calls clustering)?
I also tried to replicate the code in Stata and Stata reports far higher standard errors:
xtreg revenue facility-specific-variables i.year state-policy-measure, fe i(facility) cluster(state)
and
xtset facility year
xtregar revenue facility-specific-variables state-policy-measure year-dummy, fe
(followed by a jackknife procedure to adjust standard errors for clustering on state level)
Sure, those specifications differ slightly from the SAS specification (I wasn’t able to replicate the procedure 1:1 in Stata). However, the standard errors are higher by about a factor of 100, so most of my effects become insignificant at the 5% level.
I have 18 periods, 47 states, and about 12,000 facilities in my sample. The Bertrand paper argues (more or less) that we have to exploit the variation at the state level, not the variation across all the facilities, since they are correlated. In other words, if we assume a correlation of 1 between the facilities within a state, we effectively have just 18 x 47 = 846 observations in the dataset. This makes finding significant results far less likely. So I’m wondering why SAS reports p-values for the state-policy variable of <.0001 while Stata reports around .05 (or higher).
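(For what it’s worth, one remedy Bertrand et al. discuss is to collapse the data to state-year means and fit the model on the aggregated panel, so that inference runs on the 18 x 47 state-year cells rather than the 12,000 facilities. A rough sketch of that idea in SAS, keeping the placeholder variable names from this thread and with illustrative dataset names, might look like:

```
/* Collapse to one observation per state-year cell (NWAY keeps
   only the full state*year combinations, not the marginals).
   Names are illustrative, not tested code. */
PROC MEANS DATA=analysis NOPRINT NWAY;
  CLASS state year;
  VAR Revenue state-policy-measure;
  OUTPUT OUT=state_year MEAN=mean_revenue state-policy-measure;
RUN;

/* Fit the model on the aggregated state-year panel. */
PROC MIXED DATA=state_year;
  CLASS state year;
  MODEL mean_revenue = year state-policy-measure / SOLUTION CL;
  RANDOM state;
RUN;
```

This is only a sketch of the aggregation approach, not something I have run on my data.)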
Regarding the suggestion of an interaction effect: I assume that most of the firm-specifics remain constant (controlled for with a fixed effect in the repeated statement), while some firm-specifics vary over time (it’s an 18 year panel). The state-policy-measure is a binary variable and varies over time. The firms are within states and do not change location. I’m not really sure why you would need an interaction effect. Where would you include one? StatePolicy*facility? facility*time? How would you interpret this effect?
The dataset is unbalanced and has time gaps on the facility level.
Bertrand, Marianne, Esther Duflo, and Sendhil Mullainathan. “How Much Should We Trust Differences-in-Differences Estimates?” NBER Working Paper No. 8841, National Bureau of Economic Research, 2002. http://www.nber.org/papers/w8841.pdf
I'm going to only answer part of this, as I don't use Stata and anything I would say regarding differences in algorithms would be worse than gibberish. StackOverflow would be a better place maybe.
Anyhow, I'll take a swing at:
Regarding the suggestion of an interaction effect: I assume that most of the firm-specifics remain constant (controlled for with a fixed effect in the repeated statement), while some firm-specifics vary over time (it’s an 18 year panel). The state-policy-measure is a binary variable and varies over time. The firms are within states and do not change location. I’m not really sure why you would need an interaction effect. Where would you include one? StatePolicy*facility? facility*time? How would you interpret this effect?
Given this, I would try to fit:
PROC MIXED DATA=analysis;
  CLASS firm year state;
  MODEL Revenue = Firm-specific-variables
                  year
                  state-policy-measure
                  state-policy-measure*year
                  / NOINT SOLUTION CL;
  RANDOM state;
  REPEATED year / SUBJECT=firm(state) TYPE=AR(1);
RUN;
as a first pass, to see whether the dependence on year and state policy is anything but additive. Since state policy is a binary variable treated as continuous (it is not included in the CLASS statement), this checks whether the correlation over time is consistent between the two policy regimes.
I hope this makes sense.
Steve Denham
Dear Steve,
Thanks for providing the details. The interaction effect is a good idea; I’ll try it.
If anybody else is interested in the clustering/standard-error issue mentioned above: clustered (i.e., Huber-White corrected) standard errors are obtained by specifying the EMPIRICAL option in the PROC MIXED statement:
PROC MIXED DATA = analysis EMPIRICAL;
[...]
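Applied to the specification above, a sketch might look like the following. Note one assumption here: EMPIRICAL computes the sandwich estimator over the independent subjects the model defines, so to have the correction cluster at the state level, the state random effect is written as a random intercept with SUBJECT=state (equivalent to RANDOM state, but making state the outermost subject). Variable names are the placeholders from this thread, not tested code:

```
/* EMPIRICAL requests Huber-White (sandwich) standard errors,
   summed over the independent subject blocks. With
   SUBJECT=state as the outermost subject, those blocks are
   the states. */
PROC MIXED DATA=analysis EMPIRICAL;
  CLASS firm year state;
  MODEL Revenue = Firm-specific-variables
                  year
                  state-policy-measure
                  / NOINT SOLUTION CL;
  RANDOM INTERCEPT / SUBJECT=state;
  REPEATED year / SUBJECT=firm(state) TYPE=AR(1);
RUN;
```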
Thanks a lot, again!