I have a very large data set that I am trying to run an OLS regression on. I have two IVs and an interaction term regressing on one DV. I know how to do a basic proc reg; the problem is that I also need a table of mean differences (I believe this may be the Diff in Diff equation).
I have one IV that is binary, identifying two different "groups" within my data set, and I need to compare the mean coefficient differences between the groups for various years using an interaction term yearXgroup with a control variable.
The structure of the regression equation looks something like: DV = Sum(Year + Year*Group) + Sum IV3(Control)
I need to compare the coefficient mean differences for the two groups. I hope that makes sense.
I was unable to do the diff estimate in proc reg, so I was trying with proc genmod; this allowed me to get difference estimates but the coefficient mean differences don't seem right. I'm also confused what exactly proc genmod is doing, if it is an OLS regression or something else. Any help is greatly appreciated. My current code is:
proc genmod data=have;
class year group IV3;
model DV = year year*group IV3; /* I'm not sure this set-up is accurate per the equation above */
estimate "Diff in Diff" year*group 1 -1 -1 1;
lsmeans year*group / diff; /* This is the difference table I am in need of */
lsmestimate year "Diff in Diff" 1 -1 -1 1;
run;
quit;
> I need to compare the coefficient mean differences
Does this mean you want to compare the coefficient (slope) in group 1 with the coefficient (slope) in group 2? Or does it mean something else? Using the words "coefficient" and "mean" together is not making sense to me.
Paige Miller
> I'm also confused what exactly proc genmod is doing, if it is an OLS regression or something else.
PROC GENMOD is not an OLS routine. Like other regression procedures for generalized linear models, it uses maximum likelihood estimation. Based on your questions and example, I suggest you use PROC GLM, which is an OLS procedure with many of the same statements.
PROC GLM supports the ESTIMATE and LSMEANS statements, so your code would look like this:
proc glm data=have;
class year group IV3;
model DV = year year*group IV3;
estimate "Diff in Diff" year*group 1 -1 -1 1;
lsmeans year*group / pdiff; /* PDIFF requests the table of pairwise LS-mean differences in PROC GLM */
run;
quit;
I don't believe this is giving me what I need. I'm doing a replication with not a lot of instruction. I need to produce the interaction term coefficients for the two groups for each year. I then need the "mean difference." I'm a little unclear of what that means. But I need the mean difference and standard error of the interaction term for each year. Something like this:
1990 3.2
(.05)
1991 2.8
(.031)
1992 1.19
(.28)
Even if it doesn't look quite like this layout, I just need the information. I hope this makes sense.
Like @PaigeMiller I am not completely sure what you want. But perhaps the SLiCE statement would work:
Something like this (which I have not tested):
proc genmod data=have;
class year group IV3;
model DV = year group year*group IV3 / solution;
lsmeans year*group;
slice year*group / sliceby=year diff cl;
run;
Note that I have added "group" to the MODEL statement to maintain the interaction hierarchy principle, see
The "coefficients" would be produced by the SOLUTION option on the MODEL statement; but keep in mind that the coefficients depend on the parameterization of the model and are not unique. In my opinion, the LSMEANS are more intuitive (and do not depend on model parameterization).
The SLICE option is also available for the LSMEANS statement in PROC GLM. The documentation tells you everything you need to know 🙂
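For an OLS fit, the same idea might be sketched as follows. This is untested and assumes the data set and variable names used earlier in the thread (have, DV, year, group, IV3); the item-store name work.olsfit is hypothetical. In PROC GLM, the SLICE= option on LSMEANS gives an F test of the group effect within each year. To also get the per-year group differences with standard errors and confidence limits from that same fit, one option is to save the model with the STORE statement and post-process it with the SLICE statement in PROC PLM.

```sas
/* Sketch only: OLS fit in PROC GLM, sliced by year */
proc glm data=have;
  class year group IV3;
  model DV = year group year*group IV3 / solution;
  /* F test of the group effect separately within each year */
  lsmeans year*group / slice=year;
  /* Save the fitted model; work.olsfit is a hypothetical name */
  store work.olsfit;
run;
quit;

/* Post-fit analysis on the stored OLS model */
proc plm restore=work.olsfit;
  /* Group differences within each year, with SEs and confidence limits */
  slice year*group / sliceby=year diff cl;
run;
```

With only two groups, each slice contains a single group difference, so the output has one estimate and one standard error per year.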
I believe the attached image is exactly what I need, however, the replication I am doing had used OLS regression so I'm not sure if the GLIMMIX is acceptable. I will try the genmod code suggested, but I was told that is not the same as OLS.
If you are assuming a normal distribution, then the OLS estimators are the same as the ML estimators. Consequently, GLM, GENMOD, and GLIMMIX will produce the same results. (GENMOD and GLIMMIX are able to accommodate non-normal distributions, but that is apparently not your situation.)
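One way to check this for yourself is to fit the same model with two procedures on simulated data and compare the parameter estimates. The sketch below is an illustration only: the data set sim, its variables (y, x, rep), and the coefficients used to generate the data are all made up. The parameter estimates should agree. The standard errors may differ slightly, because by default GENMOD estimates the scale parameter by maximum likelihood rather than with the n - p divisor that OLS uses; in a very large data set that difference is negligible.

```sas
/* Hypothetical simulated data: 3 years x 2 groups x 50 replicates */
data sim;
  call streaminit(12345);
  do year = 1990 to 1992;
    do group = 0 to 1;
      do rep = 1 to 50;
        x = rand("Normal");              /* continuous control variable */
        y = 1 + 0.5*(year - 1990) + 2*group
              + 0.3*group*(year - 1990)
              + 0.8*x + rand("Normal");  /* normal errors */
        output;
      end;
    end;
  end;
run;

/* OLS fit; SOLUTION prints the parameter estimates */
proc glm data=sim;
  class year group;
  model y = year group year*group x / solution;
run;
quit;

/* Maximum likelihood fit with a normal response and identity link */
proc genmod data=sim;
  class year group;
  model y = year group year*group x / dist=normal link=identity;
run;
```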
Some variant of the SLICE option or statement is available in all of these procedures, and the SLICEDIFF option is similar. You can learn a lot by trying the different Proc approaches. I encourage you to spend the time to learn more about the process; blindly following someone else's template is dangerous if you don't know what you are doing and why. Documentation is your friend in this activity, and you can always ask questions on this forum.
Also, see this note about estimating the difference in difference. Note that for an ordinary regression model (like from REG or GLM, or GENMOD or GLIMMIX with LINK=IDENTITY), the difference in difference is just the interaction parameter.
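To see why, consider the simplest case as a sketch: two groups g in {0, 1} and two periods t in {0, 1}, with any control variables held fixed so that their terms cancel in every difference. The symbols below are introduced for illustration and are not from the thread.

```latex
\[
\begin{aligned}
E[y \mid g, t] &= \beta_0 + \beta_1 g + \beta_2 t + \beta_3\, g\, t \\
\mu_{00} &= \beta_0, \quad
\mu_{10} = \beta_0 + \beta_1, \quad
\mu_{01} = \beta_0 + \beta_2, \quad
\mu_{11} = \beta_0 + \beta_1 + \beta_2 + \beta_3 \\
\text{DiD} &= (\mu_{11} - \mu_{01}) - (\mu_{10} - \mu_{00})
            = (\beta_1 + \beta_3) - \beta_1
            = \beta_3
\end{aligned}
\]
```

Here \mu_{gt} is the expected response for group g in period t. The group difference in period 1 is \beta_1 + \beta_3, the group difference in period 0 is \beta_1, and their difference is the interaction parameter \beta_3.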
As has been said, if you want to compare the levels of one variable in an interaction at each level of the other variable in the interaction, then the SLICE statement (in GLM or GENMOD) or the SLICEDIFF option (as you noted in GLIMMIX) is the thing to use.
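For the GLIMMIX route, a minimal untested sketch might look like the following, again assuming the data set and variable names from the original question (have, DV, year, group, IV3). The normal distribution and identity link are written out explicitly so that the model matches an ordinary regression.

```sas
/* Sketch only: per-year group differences via SLICEDIFF in PROC GLIMMIX */
proc glimmix data=have;
  class year group IV3;
  model DV = year group year*group IV3
        / dist=normal link=identity solution;
  /* Differences between groups within each year, with confidence limits */
  lsmeans year*group / slicediff=year cl;
run;
```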