Statistical Procedures

Programming the statistical procedures from SAS
cbing
Calcite | Level 5

I have a very large data set on which I am trying to run an OLS regression. I have two IVs and an interaction term, regressed on one DV. I know how to run a basic proc reg; the problem is that I also need a table of mean differences (I believe this may be the Diff in Diff equation).

 

I have one IV that is binary, identifying two different "groups" within my data set, and I need to compare the mean coefficient differences between the groups for various years using an interaction term yearXgroup with a control variable.

 

The structure of the regression equation looks something like: DV = Sum(Year + Year*Group) + Sum IV3(Control)

I need to compare the coefficient mean differences for the two groups. I hope that makes sense.

 

I was unable to do the diff estimate in proc reg, so I was trying with proc genmod; this allowed me to get difference estimates but the coefficient mean differences don't seem right. I'm also confused what exactly proc genmod is doing, if it is an OLS regression or something else. Any help is greatly appreciated. My current code is:

 

proc genmod data=have;
class year group IV3;
model DV = year year*group IV3;       /* I'm not sure this set-up is accurate per the equation above */
estimate "Diff in Diff" year*group 1 -1 -1 1;
lsmeans year*group / diff;       /* This is the difference table I am in need of */
lsmestimate year "Diff in Diff" 1 -1 -1 1;
run;
quit;

7 REPLIES
PaigeMiller
Diamond | Level 26

> I need to compare the coefficient mean differences

 

Does this mean you want to compare the coefficient (slope) in group 1 with the coefficient (slope) in group 2? Or does it mean something else? Using the words "coefficient" and "mean" together is not making sense to me.

--
Paige Miller
Rick_SAS
SAS Super FREQ

> I'm also confused what exactly proc genmod is doing, if it is an OLS regression or something else. 

PROC GENMOD is not an OLS routine. Like other regression procedures for generalized linear models, it uses maximum likelihood estimation. Based on your questions and example, I suggest you use PROC GLM, which is an OLS procedure with many of the same statements.

 

PROC GLM supports the ESTIMATE and LSMEANS statements, so your code would look like this:

 

proc glm data=have;
class year group IV3;
model DV = year year*group IV3;
estimate "Diff in Diff" year*group 1 -1 -1 1;
lsmeans year*group / pdiff;      /* PDIFF requests pairwise differences in PROC GLM */
run;
quit;

cbing
Calcite | Level 5

I don't believe this is giving me what I need. I'm doing a replication with not a lot of instruction. I need to produce the interaction term coefficients for the two groups for each year. I then need the "mean difference." I'm a little unclear of what that means. But I need the mean difference and standard error of the interaction term for each year. Something like this:

 

1990    3.2
        (.05)
1991    2.8
        (.031)
1992    1.19
        (.28)

 

Even if it doesn't look quite like this layout, I just need the information. I hope this makes sense. 

sld
Rhodochrosite | Level 12

Like @PaigeMiller I am not completely sure what you want. But perhaps the SLiCE statement would work:

 

https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_genmod_sect...

 

something like (which I have not tested):

 

 

proc genmod data=have;
class year group IV3;
model DV = year group year*group IV3 / solution;
lsmeans year*group;
slice year*group / sliceby=year diff cl;
run;

 

Note that I have added "group" to the MODEL statement to maintain the interaction hierarchy principle, see

 

https://www.quora.com/Why-do-we-have-the-hierarchical-principle-in-adding-interactions-to-a-model-Wh...

 

The "coefficients" would be produced by the SOLUTION option on the MODEL statement; but keep in mind that the coefficients depend on the parameterization of the model and are not unique. In my opinion, the LSMEANS are more intuitive (and do not depend on model parameterization).

 

The SLICE option is also available for the LSMEANS statement in PROC GLM. The documentation tells you everything you need to know 🙂
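
For an OLS fit, one possible route is sketched below, untested and offered only as a starting point. It assumes the data set and variable names used in this thread (have, DV, year, group, IV3); the item-store name glmfit is hypothetical. In PROC GLM the SLICE= option on LSMEANS gives an F test of the group effect within each year, so the sketch stores the fitted model and uses the SLICE statement in PROC PLM to get the within-year group differences with standard errors.

```sas
/* Sketch only: HAVE, DV, YEAR, GROUP, IV3 come from this thread;
   GLMFIT is a hypothetical item-store name. */
proc glm data=have;
  class year group IV3;
  model DV = year group year*group IV3 / solution;
  /* F test of the group effect within each year */
  lsmeans year*group / slice=year;
  /* save the fitted model so PROC PLM can post-process it */
  store glmfit;
run;
quit;

proc plm restore=glmfit;
  /* group differences within each year, with standard errors and confidence limits */
  slice year*group / sliceby=year diff cl;
run;
```

If this runs as intended, the PROC PLM output should contain one difference (with its standard error) per year, which is close to the year-by-year layout described earlier in the thread.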

 

cbing
Calcite | Level 5

I believe the attached image is exactly what I need, however, the replication I am doing had used OLS regression so I'm not sure if the GLIMMIX is acceptable. I will try the genmod code suggested, but I was told that is not the same as OLS.   

 

Screen Shot 2018-05-16 at 10.04.03 PM.png

sld
Rhodochrosite | Level 12

If you are assuming a normal distribution, then the OLS estimators are the same as the ML estimators. Consequently, GLM, GENMOD, and GLIMMIX will produce the same results. (GENMOD and GLIMMIX are able to accommodate non-normal distributions, but that is apparently not your situation.)
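
A short sketch of why this holds, using notation that is not in the thread and is assumed here: y_i is the response, x_i the row of the design matrix, β the coefficient vector, σ² the error variance, n the number of observations, p the number of estimated coefficients, and SSE the residual sum of squares.

```latex
\begin{align*}
\ell(\beta,\sigma^2) &= -\frac{n}{2}\log(2\pi\sigma^2)
  - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i - x_i^\top\beta\bigr)^2, \\
\hat\beta_{\mathrm{ML}} &= \arg\max_{\beta}\,\ell(\beta,\sigma^2)
  = \arg\min_{\beta}\sum_{i=1}^{n}\bigl(y_i - x_i^\top\beta\bigr)^2
  = \hat\beta_{\mathrm{OLS}}, \\
\hat\sigma^2_{\mathrm{ML}} &= \frac{\mathrm{SSE}}{n},
  \qquad
  \hat\sigma^2_{\mathrm{OLS}} = \frac{\mathrm{SSE}}{n-p}.
\end{align*}
```

The coefficient estimates therefore agree exactly. Standard errors can differ slightly between procedures when they rely on different variance estimates (the last line), but with a very large data set that difference should be negligible.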

 

Some variant of the SLICE option or statement is available in all of these procedures, and the SLICEDIFF option is similar. You can learn a lot by trying the different Proc approaches. I encourage you to spend the time to learn more about the process; blindly following someone else's template is dangerous if you don't know what you are doing and why. Documentation is your friend in this activity, and you can always ask questions on this forum.

StatDave
SAS Super FREQ

Also, see this note about estimating the difference in difference. Note that for an ordinary regression model (like from REG or GLM, or GENMOD or GLIMMIX with LINK=IDENTITY), the difference in difference is just the interaction parameter.
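
To illustrate that last point, here is the two-period, two-group case written out. The notation is assumed for illustration only: T and G are 0/1 indicators for period and group, and μ_tg is the expected response for period t and group g, holding any control variable fixed.

```latex
\begin{align*}
E[y \mid T, G] &= \beta_0 + \beta_1 T + \beta_2 G + \beta_3\, T G, \\
\mu_{11} - \mu_{10} &= (\beta_0 + \beta_1 + \beta_2 + \beta_3) - (\beta_0 + \beta_1)
  = \beta_2 + \beta_3, \\
\mu_{01} - \mu_{00} &= (\beta_0 + \beta_2) - \beta_0 = \beta_2, \\
\mathrm{DiD} &= (\mu_{11} - \mu_{10}) - (\mu_{01} - \mu_{00}) = \beta_3 .
\end{align*}
```

With CLASS variables, the sign and reference level of the reported interaction coefficient depend on how the procedure codes the levels, so it is worth checking which cells the printed parameter actually compares.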

 

As has been said, if you want to compare the levels of one variable in an interaction at each level of the other variable in the interaction, then the SLICE statement (in GENMOD; PROC GLM offers the SLICE= option on its LSMEANS statement instead) or the SLICEDIFF option (as you noted in GLIMMIX) is the thing to use.

