OLS regression with mean differences

Posted 05-15-2018 10:55 PM

I have a very large data set that I am trying to run an OLS regression on. I have two IVs and their interaction term, with one DV. I know how to do a basic PROC REG; the problem is that I also need a table of mean differences (I believe this may be the diff-in-diff equation).

I have one IV that is binary, identifying two different "groups" within my data set. I need to compare the mean coefficient differences between the groups for various years, using an interaction term year*group along with a control variable.

The structure of the regression equation looks something like: DV = Sum(Year + Year*Group) + Sum IV3(Control)
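One possible formal reading of this equation is sketched below. It is an interpretation, not necessarily the poster's exact specification. It assumes that Year_{ti} indicates whether observation i falls in year t, that Group_i is the 0/1 group indicator, and that IV3 is a categorical control entered through indicators IV3_{ki} for its non-reference levels k. The coefficient names α_t, γ_t, and δ_k are introduced here only for illustration.

```latex
\mathrm{DV}_i = \sum_{t} \alpha_t \, \mathrm{Year}_{ti}
              + \sum_{t} \gamma_t \, \mathrm{Year}_{ti}\,\mathrm{Group}_i
              + \sum_{k} \delta_k \, \mathrm{IV3}_{ki}
              + \varepsilon_i

% Group 1 minus Group 0 within year t, holding IV3 fixed:
E[\mathrm{DV} \mid t, \mathrm{Group}=1, \mathrm{IV3}]
  - E[\mathrm{DV} \mid t, \mathrm{Group}=0, \mathrm{IV3}] = \gamma_t
```

Under this reading, each γ_t is the within-year group difference. The SAS effect list `year year*group IV3` (group nested within year, with no separate group main effect) appears to span the same model. However, SAS prints coefficients under its own less-than-full-rank parameterization, so they may not be labeled or valued exactly like γ_t.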

I need to compare the coefficient mean differences for the two groups. I hope that makes sense.

I was unable to do the diff estimate in PROC REG, so I tried PROC GENMOD. This gave me difference estimates, but the coefficient mean differences don't seem right. I'm also confused about what exactly PROC GENMOD is doing, and whether it is an OLS regression or something else. Any help is greatly appreciated. My current code is:

```
proc genmod data=have;
   class year group IV3;
   model DV = year year*group IV3;   /* I'm not sure this set-up is accurate per the equation above */
   estimate "Diff in Diff" year*group 1 -1 -1 1;
   lsmeans year*group / diff;        /* This is the difference table I am in need of */
   lsmestimate year "Diff in Diff" 1 -1 -1 1;
run;
quit;
```


> I need to compare the coefficient mean differences

Does this mean you want to compare the coefficient (slope) in group 1 with the coefficient (slope) in group 2? Or does it mean something else? Using the words "coefficient" and "mean" together does not make sense to me.

--

Paige Miller


> I'm also confused what exactly proc genmod is doing, if it is an OLS regression or something else.

PROC GENMOD is not an OLS routine. Like other regression procedures for generalized linear models, it uses maximum likelihood estimation. Based on your questions and example, I suggest you use PROC GLM, which is an OLS procedure with many of the same statements. PROC GLM supports the ESTIMATE and LSMEANS statements, so your code would look like this:

```
proc glm data=have;
   class year group IV3;
   model DV = year year*group IV3;
   estimate "Diff in Diff" year*group 1 -1 -1 1;
   lsmeans year*group / pdiff;   /* PDIFF requests p-values for pairwise differences of the LS-means */
run;
quit;
```


I don't believe this is giving me what I need. I'm doing a replication without much instruction. I need to produce the interaction term coefficients for the two groups for each year. I then need the "mean difference", although I'm a little unclear on what that means. In any case, I need the mean difference and standard error of the interaction term for each year, something like this:

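| Year | Mean difference | Standard error |
|------|-----------------|----------------|
| 1990 | 3.2             | (.05)          |
| 1991 | 2.8             | (.031)         |
| 1992 | 1.19            | (.28)          |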

Even if it doesn't look quite like this layout, I just need the information. I hope this makes sense.


Like @PaigeMiller, I am not completely sure what you want. But perhaps the SLICE statement would work. Something like this (which I have *not tested*):

```
proc genmod data=have;
   class year group IV3;
   model DV = year group year*group IV3 / solution;
   lsmeans year*group;
   slice year*group / sliceby=year diff cl;
run;
```

Note that I have added "group" to the MODEL statement to maintain the interaction hierarchy principle, see

The "coefficients" would be produced by the SOLUTION option on the MODEL statement; but keep in mind that the coefficients depend on the parameterization of the model and are not unique. In my opinion, the LSMEANS are more intuitive (and do not depend on model parameterization).
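The following sketch illustrates why LS-means do not depend on the parameterization. For a classification effect such as year*group, an LS-mean averages the model's predicted means equally over the levels of the other CLASS effects. Here that is IV3, assumed for illustration to have K levels. Predicted means are the same under any parameterization of the same model, so LS-means and their differences are too. The notation ŷ(t, g, k) (predicted mean for year t, group g, IV3 level k) and the group labels A and B are introduced here only for illustration.

```latex
\widehat{\mathrm{LSM}}_{tg} = \frac{1}{K} \sum_{k=1}^{K} \hat{y}(t, g, k)

\widehat{\mathrm{LSM}}_{t,\mathrm{B}} - \widehat{\mathrm{LSM}}_{t,\mathrm{A}}
  = \frac{1}{K} \sum_{k=1}^{K} \big[\hat{y}(t, \mathrm{B}, k) - \hat{y}(t, \mathrm{A}, k)\big]
```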

The SLICE option is also available for the LSMEANS statement in PROC GLM. The documentation tells you everything you need to know 🙂


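If you are assuming a normal distribution, then the OLS estimators of the regression coefficients are the same as the ML estimators. Consequently, GLM, GENMOD, and GLIMMIX will produce the same parameter estimates. Standard errors can differ slightly, because by default GENMOD estimates the scale parameter by maximum likelihood, which divides the residual sum of squares by n rather than n − p. (GENMOD and GLIMMIX can accommodate non-normal distributions, but that is apparently not your situation.)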
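Here is a short derivation of why the coefficient estimates coincide. It assumes the usual linear model y = Xβ + ε with ε ~ N(0, σ²I), n observations, and p = rank(X). For any fixed σ², the log-likelihood is maximized over β exactly where the residual sum of squares is minimized.

```latex
\ell(\beta, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2)
                        - \frac{1}{2\sigma^2}\,\lVert y - X\beta \rVert^2

\hat\beta_{\mathrm{ML}} = \arg\min_{\beta} \lVert y - X\beta \rVert^2 = \hat\beta_{\mathrm{OLS}}

\hat\sigma^2_{\mathrm{ML}} = \frac{\lVert y - X\hat\beta \rVert^2}{n}
\qquad\text{vs.}\qquad
s^2 = \frac{\lVert y - X\hat\beta \rVert^2}{n - p}
```

The coefficient estimates agree. The two variance estimates differ by the factor (n − p)/n, which is close to 1 for a very large data set like the one described in the question.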

Some variant of the SLICE option or statement is available in all of these procedures, and the SLICEDIFF option is similar. You can learn a lot by trying the different Proc approaches. I encourage you to spend the time to learn more about the process; blindly following someone else's template is dangerous if you don't know what you are doing and why. Documentation is your friend in this activity, and you can always ask questions on this forum.


Also, see this note about estimating the difference in difference. Note that for an ordinary regression model (like from REG or GLM, or GENMOD or GLIMMIX with LINK=IDENTITY), the difference in difference is just the interaction parameter.
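Below is a worked version of that statement for the simplest 2 × 2 case. It assumes reference (0/1) coding, with t = 1 for the later year and g = 1 for the second group. The β symbols are illustrative and are not SAS's own less-than-full-rank parameterization. Any additive control term, such as IV3 held at a fixed level, cancels from every difference.

```latex
\mu_{tg} = \beta_0 + \beta_1 t + \beta_2 g + \beta_3\, t g, \qquad t, g \in \{0, 1\}

\mathrm{DiD} = (\mu_{11} - \mu_{10}) - (\mu_{01} - \mu_{00})
  = \big[(\beta_0 + \beta_1 + \beta_2 + \beta_3) - (\beta_0 + \beta_1)\big]
  - \big[(\beta_0 + \beta_2) - \beta_0\big]
  = \beta_3

\mu_{00} - \mu_{01} - \mu_{10} + \mu_{11} = \mathrm{DiD}
```

The last line is the contrast with coefficients 1 −1 −1 1 on the cells ordered (00, 01, 10, 11). This is the same ordering SAS uses for year*group cells (year varying slowest) in the ESTIMATE statement from the original question.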

As has been said, if you want to compare the levels of one variable in an interaction at each level of the other variable in the interaction, then the SLICE statement (in GLM or GENMOD) or the SLICEDIFF option (as you noted in GLIMMIX) is the thing to use.
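For the per-year table requested earlier in the thread, each slice comparison is a linear combination of the LS-means within one year. The sketch below shows what is being estimated. It assumes an OLS fit (as in PROC GLM) with design matrix X and residual variance estimate s². LSM_{t,g} denotes the LS-mean for year t and group g. L_t is a row vector (hypothetical notation) that picks out the group B minus group A LS-mean difference in year t.

```latex
\hat\Delta_t = \widehat{\mathrm{LSM}}_{t,\mathrm{B}} - \widehat{\mathrm{LSM}}_{t,\mathrm{A}} = L_t \hat\beta

\widehat{\mathrm{SE}}(\hat\Delta_t) = \sqrt{\, s^2 \, L_t \,(X^\top X)^{-}\, L_t^\top \,}
```

Here (X^⊤X)^- is a generalized inverse, which is needed because the GLM parameterization is less than full rank. Because L_t β is estimable, the result does not depend on which generalized inverse is used. With two groups there is one such difference per year, which gives the year-by-year estimate and standard error layout shown in the question.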
