04-15-2015 02:40 PM
I am working on a project with over 7,000 employer groups covering 2007-2013, and I need to run a regression model with expenditure as the dependent variable and employer group, calendar year, and the interaction between employer group and year as independent variables, among others. In the model, I need to treat year as a categorical variable, but in year*group, year needs to be treated as a continuous variable. I know how to do this in PROC REG. The problem is that there are over 7,000 employer groups, and it is impractical to create 7,000 dummy variables for PROC REG. So I am considering PROC GLM instead. However, since I need to treat calendar year as categorical, I include it in the CLASS statement, and then I no longer know how to treat year as continuous in year*group. I have heard that in Stata this can be specified by adding a 'c.' prefix to year (c.year) to indicate that it should be treated as continuous. How can I do this in SAS? Thank you!
04-15-2015 02:47 PM
That doesn't make a lot of sense in my head. Assuming it is statistically valid, can you create a new YEAR variable identical to the old one and put one in the Class statement and leave the other as continuous?
proc glm data=reg;
class year group; /* year_cont is a numeric copy of year and is NOT listed here */
model expenditure = year year_cont*group;
run;
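A minimal sketch of creating the duplicate variable, assuming the dataset is named reg as in the PROC GLM call above and that year is stored as a numeric variable (both are assumptions about your data):

```sas
/* Assumption: dataset reg contains a numeric variable year */
data reg;
  set reg;
  year_cont = year;  /* numeric copy of year; leave it out of the CLASS statement */
run;
```

Because year_cont is not in the CLASS statement, year_cont*group is fitted as a separate linear slope for each level of group, while year itself is still treated as categorical.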
04-15-2015 03:08 PM
Thank you so much! That's very smart.
Another question: since each employer has multiple years of observations, there is a concern about correlation between the repeated measures. In PROC SURVEYREG, I can use the statement "cluster employergroup" to account for the correlation within each employer group. Is there a way to do this in PROC GLM?
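For reference, the PROC SURVEYREG approach mentioned here might look roughly like the sketch below. The dataset name reg and the variable names group, year, and expenditure are borrowed from the PROC GLM example earlier in the thread and are assumptions. Other covariates are omitted.

```sas
/* Sketch only: dataset and variable names are assumptions */
proc surveyreg data=reg;
  cluster group;   /* employer group defines the clusters */
  class year;      /* calendar year treated as categorical */
  model expenditure = year / solution;
run;
```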
04-15-2015 03:45 PM
No, but there is always PROC MIXED or GLIMMIX for repeated measures designs. Consider extending Reeza's GLM model to the following, based on Example 44.15, Comparing Multiple B-Splines, in the SAS/STAT 13.2 documentation:
proc glimmix data=yourdata;
class group year;
effect lin = polynomial(year_cont / degree=1); /* Rather than a spline, I suggest a linear polynomial so that you get a slope for each group */
model y = group lin*group / s noint;
random year / residual type=ar(1) subject=group; /* AR(1) correlation among the yearly residuals within each group */
run;
04-15-2015 04:39 PM
Thank you so much! I will learn more about PROC GLIMMIX and these statements.
My original thought was to use PROC MIXED to handle the repeated measures. However, the investigator would like to include employer group as a fixed effect rather than a random effect. To avoid over 7,000 dummy variables, we therefore used the mean-deviation method to control for the employer group effect, partialling it out of the estimating equations either with PROC REG or with the ABSORB statement in PROC GLM. I couldn't figure out a way to apply mean deviation in PROC MIXED. The only way I could think of to include employer group as a fixed effect in PROC MIXED was to enter it as a directly estimated fixed effect, which generates over 7,000 coefficient estimates.
But since the investigator would now like to add the interaction group*year, which will inevitably generate over 7,000 estimates anyway, I guess I can now consider using PROC MIXED or GLIMMIX.
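For anyone following along, the ABSORB approach described above might look roughly like this. It is a sketch only: the dataset name reg and the variable names group, year, and expenditure are taken from the PROC GLM example earlier in the thread and are assumptions, and other covariates are omitted.

```sas
/* ABSORB requires the data to be sorted by the absorbed variable */
proc sort data=reg;
  by group;
run;

proc glm data=reg;
  absorb group;   /* group effects are swept out rather than estimated one by one */
  class year;     /* calendar year treated as categorical */
  model expenditure = year / solution;
run;
quit;
```

With this setup the group effect is adjusted for, but no individual group coefficients are printed.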
By the way, does anyone know of a SAS procedure similar to Stata's xtreg?