OLS regression with mean differences

05-15-2018 10:55 PM

I have a very large data set on which I am trying to run an OLS regression. I have two IVs and their interaction term predicting one DV. I know how to do a basic PROC REG; the problem is that I also need a table of mean differences (I believe this may be the Diff in Diff equation).

I have one IV that is binary, identifying two different "groups" within my data set, and I need to compare the mean coefficient differences between the groups for various years using a yearXgroup interaction term along with a control variable.

The structure of the regression equation looks something like: DV = Sum(Year + Year*Group) + Sum IV3(Control)
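One way to write that equation out explicitly. This assumes Year enters as a full set of year indicators $D_{it}$, Group is coded $G_i \in \{0,1\}$, and IV3 is categorical (it appears in the CLASS statement) with indicators $C_{ik}$ for all but one level. The coefficient names $\alpha_t$, $\gamma_t$, $\delta_k$ are introduced here for illustration only:

$$
\mathrm{DV}_i = \sum_{t} \alpha_t D_{it} + \sum_{t} \gamma_t D_{it} G_i + \sum_{k} \delta_k C_{ik} + \varepsilon_i
$$

Under this reading, in year $t$ the expected DV for Group 1 minus that for Group 0, at the same IV3 level, is $\gamma_t$. The per-year group difference therefore corresponds to the year-specific interaction coefficient.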

I need to compare the coefficient mean differences for the two groups. I hope that makes sense.

I was unable to do the diff estimate in proc reg, so I was trying with proc genmod; this allowed me to get difference estimates, but the coefficient mean differences don't seem right. I'm also confused what exactly proc genmod is doing, if it is an OLS regression or something else. Any help is greatly appreciated. My current code is:

```
proc genmod data=have;
class year group IV3;
model DV = year year*group IV3; /* I'm not sure this set-up is accurate per the equation above */
estimate "Diff in Diff" year*group 1 -1 -1 1;
lsmeans year*group / diff; /* This is the difference table I am in need of */
lsmestimate year "Diff in Diff" 1 -1 -1 1;
run;
quit;
```
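Here is a sketch of what the ESTIMATE coefficients `1 -1 -1 1` pick out. It assumes SAS's usual ordering of `year*group` levels, in which the rightmost variable varies fastest (year 1/group 1, year 1/group 2, year 2/group 1, year 2/group 2). It also writes $\mu_{t,g}$ for the LS-mean of the $t$-th year level and $g$-th group level:

$$
(\mu_{1,1} - \mu_{1,2}) - (\mu_{2,1} - \mu_{2,2}) = \mu_{1,1} - \mu_{1,2} - \mu_{2,1} + \mu_{2,2}
$$

This is the change in the group gap between the first two years only. Coefficients for the remaining `year*group` levels default to 0, so any other pair of years would need its own contrast.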

Posted in reply to cbing

05-16-2018 08:24 AM

> I need to compare the coefficient mean differences

Does this mean you want to compare the coefficient (slope) in group 1 with the coefficient (slope) in group 2? Or does it mean something else? Using the words "coefficient" and "mean" together does not make sense to me.

--

Paige Miller

Posted in reply to cbing

05-16-2018 09:18 AM

> *I'm also confused what exactly proc genmod is doing, if it is an OLS regression or something else.*

PROC GENMOD is not an OLS routine. Like other regression procedures for generalized linear models, it uses maximum likelihood estimation. Based on your questions and example, I suggest you use PROC GLM, which is an OLS procedure with many of the same statements.

PROC GLM supports the ESTIMATE and LSMEANS statements, so your code would look like this:

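```
proc glm data=have; /* GLM fits the model by ordinary least squares */
class year group IV3;
model DV = year year*group IV3;
estimate "Diff in Diff" year*group 1 -1 -1 1;
lsmeans year*group / pdiff; /* PDIFF requests p-values for differences of the LS-means */
run;
quit;
```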

Posted in reply to Rick_SAS

05-16-2018 08:46 PM

I don't believe this is giving me what I need. I'm doing a replication with little instruction. I need to produce the interaction term coefficients for the two groups for each year. I then need the "mean difference." I'm a little unclear on what that means, but I need the mean difference and standard error of the interaction term for each year. Something like this:

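| Year | Mean difference | Standard error |
|------|-----------------|----------------|
| 1990 | 3.2 | (.05) |
| 1991 | 2.8 | (.031) |
| 1992 | 1.19 | (.28) |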

Even if it doesn't look quite like this layout, I just need the information. I hope this makes sense.
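For context, here is a sketch of the standard OLS formula behind standard errors like these. It applies to any estimable linear combination of the parameters, such as a per-year difference of LS-means. The notation is introduced here for illustration: design matrix $X$, estimates $\hat\beta$, residual mean square $s^2$ (the MSE), and a hypothetical coefficient row vector $L$ that defines the difference:

$$
\hat\theta = L\hat\beta, \qquad \widehat{\mathrm{SE}}(\hat\theta) = \sqrt{s^2 \, L (X^\top X)^{-} L^\top}
$$

Here $(X^\top X)^{-}$ is a generalized inverse, which is needed because coding CLASS variables with one indicator per level makes $X$ less than full rank.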

Posted in reply to cbing

05-16-2018 10:03 PM

Like @PaigeMiller, I am not completely sure what you want, but perhaps the SLICE statement would work.

Something like this (which I have *not tested*):

```
proc genmod data=have;
class year group IV3;
model DV = year group year*group IV3 / solution;
lsmeans year*group;
slice year*group / sliceby=year diff cl; /* within each year, compare the groups */
run;
```

Note that I have added "group" to the MODEL statement to maintain the interaction hierarchy principle, see

The "coefficients" would be produced by the SOLUTION option on the MODEL statement; but keep in mind that the coefficients depend on the parameterization of the model and are not unique. In my opinion, the LSMEANS are more intuitive (and do not depend on model parameterization).
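To see why the coefficients are not unique, take a toy case chosen for illustration: a single two-level factor with an intercept and cell means $\mu_1$ and $\mu_2$. Compare GLM-style coding, where the solution sets the last level's parameter to zero, with effect (sum-to-zero) coding, where $x=+1$ for level 1 and $x=-1$ for level 2:

$$
\text{GLM-style: } \mu_1 = \beta_0 + \beta_1,\ \mu_2 = \beta_0 \;\Rightarrow\; \beta_0 = \mu_2,\ \beta_1 = \mu_1 - \mu_2
$$

$$
\text{Effect: } \mu_1 = \beta_0 + \beta_1,\ \mu_2 = \beta_0 - \beta_1 \;\Rightarrow\; \beta_0 = \tfrac{\mu_1 + \mu_2}{2},\ \beta_1 = \tfrac{\mu_1 - \mu_2}{2}
$$

With the same data, the slope differs by a factor of 2 between the two codings, while the LS-means $\mu_1$ and $\mu_2$ are identical.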

The SLICE option is also available for the LSMEANS statement in PROC GLM. The documentation tells you everything you need to know.

05-16-2018 10:06 PM

I believe the attached image is exactly what I need. However, the replication I am doing used OLS regression, so I'm not sure whether GLIMMIX is acceptable. I will try the GENMOD code suggested, but I was told that GENMOD is not the same as OLS.

Posted in reply to cbing

05-16-2018 10:23 PM

If you are assuming a normal distribution, then the OLS estimators of the regression coefficients are the same as the ML estimators. Consequently, GLM, GENMOD, and GLIMMIX will produce the same parameter estimates. (GENMOD and GLIMMIX are able to accommodate non-normal distributions, but that is apparently not your situation.)
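Here is a short derivation of why the estimates coincide. It uses a linear model $y_i = x_i^\top\beta + \varepsilon_i$ with independent $\varepsilon_i \sim N(0,\sigma^2)$, for which the log-likelihood is

$$
\ell(\beta,\sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - x_i^\top\beta)^2
$$

For any fixed $\sigma^2$, maximizing over $\beta$ is the same as minimizing the residual sum of squares, so $\hat\beta_{\mathrm{ML}} = \hat\beta_{\mathrm{OLS}}$. The variance estimates differ: ML gives $\hat\sigma^2 = \mathrm{SSE}/n$, while OLS uses $s^2 = \mathrm{SSE}/(n-p)$, with $p$ the rank of $X$. Standard errors based on the ML variance estimate are therefore smaller by a factor of $\sqrt{(n-p)/n}$, which is negligible in a very large data set.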

Some variant of the SLICE option or statement is available in all of these procedures, and the SLICEDIFF option is similar. You can learn a lot by trying the different Proc approaches. I encourage you to spend the time to learn more about the process; blindly following someone else's template is dangerous if you don't know what you are doing and why. Documentation is your friend in this activity, and you can always ask questions on this forum.

Posted in reply to cbing

05-17-2018 08:38 AM

Also, see this note about estimating the difference in difference. Note that for an ordinary regression model (like from REG or GLM, or GENMOD or GLIMMIX with LINK=IDENTITY), the difference in difference is just the interaction parameter.
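For the two-period, two-group case the algebra is short. Take 0/1 indicators $T$ (period) and $G$ (group), symbols introduced here for illustration, and an identity-link mean model:

$$
\mu(T,G) = \beta_0 + \beta_1 T + \beta_2 G + \beta_3 TG
$$

$$
\bigl[\mu(1,1) - \mu(1,0)\bigr] - \bigl[\mu(0,1) - \mu(0,0)\bigr] = (\beta_2 + \beta_3) - \beta_2 = \beta_3
$$

With a non-identity link (for example, a log link), the same cancellation happens on the link scale. In that case $\beta_3$ is a difference in differences of the transformed means, not of the means themselves.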

As has been said, if you want to compare the levels of one variable in an interaction at each level of the other variable in the interaction, then the SLICE statement (in GLM or GENMOD) or the SLICEDIFF option (as you noted in GLIMMIX) is the thing to use.