Solved: Re: proc panel vs. proc glimmix

BK2 · Posted 07-17-2016 09:50 PM

Hello,

I am new to SAS and I am having trouble figuring out the difference between proc glimmix and proc panel in the context of panel data analysis. I have an unbalanced panel (CUSTOMER_ID + TIME_PERIOD) and I want to capture unobserved heterogeneity at the customer level, using a random effects model.

PROC GLIMMIX: I am specifying a VC type covariance matrix between different customers. That is, I expect observations from the same customer to be correlated across time periods but not across different customers. In addition, all customers have an idiosyncratic error component.

PROC GLIMMIX DATA=BK.DATA1;
CLASS VAR1 CUSTOMER_ID;
MODEL Y = VAR1 VAR2 VAR3 VAR1*VAR3
/LINK = IDENTITY DIST = NORMAL SOLUTION;
RANDOM INTERCEPT/ SUBJECT = CUSTOMER_ID TYPE= VC V;
RUN;

PROC PANEL: For the same covariates as in the above model, I am running into trouble. For example, I do not get the Hausman test result, and I run into multicollinearity problems.

PROC PANEL DATA=BK.DATA1;
CLASS VAR1 ;
MODEL Y = VAR1 VAR2 VAR3 VAR1*VAR3
/RANONE VCOMP = FB;
ID CUSTOMER_ID TIME_PERIOD;
RUN;

I have the following questions:

1. What are the differences in the modeling assumptions between using proc glimmix and proc panel? Aren't they the same (as GLS/FGLS) for a linear model with a single random effect? If so, shouldn't they give identical estimates? I understand that proc glimmix uses GEE, but specifying a normal distribution should be the same as GLS, right?

2. Is proc glimmix "better than" proc panel in some sense? Or do they do very different things (in the case of panel data), and maybe I am completely missing something here?

I would highly appreciate your comments.

BK

lvm · Posted 07-25-2016 11:03 AM

Actually, the two procedures are giving you the "same" results, with slight variations because of different estimation algorithms. The apparent difference is due to the way the two procedures parameterize the class variable. In GLIMMIX, the parameter for the last level of the class variable (1 here) is forced to be 0. In PANEL, the parameter for the first level of the class variable (0 here) is forced to be 0. This reverses the sign of the class parameter and changes the intercept.

Try running GLIMMIX by removing dep_var1 from the class statement.

PROC GLIMMIX DATA=test;
CLASS CUSTOMER_ID time_period;
MODEL dep_var2 = ind_VAR1 ind_VAR2 dep_var1
/LINK = IDENTITY DIST = NORMAL SOLUTION;
RANDOM INTERCEPT/ SUBJECT = CUSTOMER_ID TYPE= VC;
RUN;

You now get the same results as with PANEL. This can only be done when there are two levels to the class variable, coded as 0 and 1.

Note that the random effect variance is 0. That also is contributing to the equivalence of the two methods.
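
To make the reparameterization concrete, here is a small sketch in notation that does not appear in the posts: d is the 0/1 class variable (dep_var1), x'γ collects the remaining covariates, and the reference levels are assumed to be as described above (last level zeroed in GLIMMIX, first level zeroed in PANEL).

$$
\text{GLIMMIX: } E[y] = \beta_0 + \beta_1\,\mathbf{1}[d=0] + x'\gamma,
\qquad
\text{PANEL: } E[y] = \alpha_0 + \alpha_1\,\mathbf{1}[d=1] + x'\gamma
$$

Equating the two expressions at d = 1 and at d = 0 gives

$$
\alpha_1 = -\beta_1, \qquad \alpha_0 = \beta_0 + \beta_1
$$

so the fitted means agree, the class coefficient changes sign, and the intercepts differ by exactly the class coefficient.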


SteveDenham · Posted 07-18-2016 11:16 AM

I want to start by saying that I do over 90% of my work in PROC GLIMMIX, so I would say I have a prejudice.

In your PROC GLIMMIX code, I can't seem to identify which variable is your time variable. That is critical to modeling what is going on--you will likely need another RANDOM statement in order to capture the within subject correlation over time. When we have that in hand, I think I can come up with code to address your issue.

Steve Denham

BK2 · Posted 07-18-2016 11:53 AM

Hi Steve,

Thank you for your response. I did not have any random or fixed effects for the time variable, hence did not include them in GLIMMIX. My model specification is:

Y_ij = B_0 + B_1*VAR1_ij + ... + B_4*VAR1VAR3_ij + v_i + e_ij,

where v_i is the random effect for subject i and e_ij is the common error term. PROC PANEL requires me to specify both cross-section and time period variables regardless, hence I had to include them in that code.

Since I only have random effects for the subject, does it not automatically introduce correlation over time for the same subject? In the above specification, the error terms for a customer across time periods are correlated because of v_i terms.

Also, can you please comment on how GLIMMIX differs from PANEL in terms of the above analysis?

Thanks!

BK

SteveDenham · Posted 07-18-2016 01:40 PM

For GLIMMIX to model the within-subject effects, you must specify a time period variable, and if you expect it to have a differential effect on the levels of the other variables, you must also include an interaction effect. Given that, I think the following approximates what you are trying to do:

PROC GLIMMIX DATA=BK.DATA1;
CLASS VAR1 TIME CUSTOMER_ID;
MODEL Y = VAR1 VAR2 VAR3 VAR1*VAR3 TIME TIME*VAR1 TIME*VAR2 TIME*VAR3 TIME*VAR1*VAR3
/LINK = IDENTITY DIST = NORMAL SOLUTION;

RANDOM INTERCEPT/ SUBJECT = CUSTOMER_ID TYPE= VC V;
RANDOM TIME/SUBJECT = CUSTOMER_ID TYPE=AR(1);
RUN;

This models an autoregressive error structure over time within each customer. Other error structures may be more appropriate, depending on the data generating process and the spacing in time of the measurements.
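
As a rough sketch of what an AR(1) structure implies (the symbols σ_u² and ρ are notation introduced here, and equally spaced periods are assumed), the covariance between two periods j and k for the same customer i decays geometrically with the lag:

$$
\operatorname{Cov}(u_{ij}, u_{ik}) = \sigma_u^2\,\rho^{|j-k|}
$$

For four consecutive periods this gives the pattern

$$
\sigma_u^2
\begin{pmatrix}
1 & \rho & \rho^2 & \rho^3 \\
\rho & 1 & \rho & \rho^2 \\
\rho^2 & \rho & 1 & \rho \\
\rho^3 & \rho^2 & \rho & 1
\end{pmatrix}
$$

so adjacent periods are the most strongly correlated and the correlation shrinks as the periods move further apart.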

Thus, cross-sectional comparisons can be done at each time interval, while the other effects can be roughly translated as "intercepts" when TIME=0.

Steve Denham

lvm · Posted 07-18-2016 04:35 PM

In GLIMMIX or MIXED, you need a random or repeated statement to duplicate PROC PANEL. The subject effect is really there with PANEL (implicitly), whether you think it is there or not. This dated article may help you:

http://www2.sas.com/proceedings/forum2007/170-2007.pdf

BK2 · Posted 07-19-2016 12:01 PM

lvm, thanks, that is definitely helpful. The following code (taken from that paper) implements two random effects in PROC MIXED.

proc mixed data=two method=type3;
 class i t;
 model y = x1 x2 x3 /solution;
 random i t;
run;

In my case I am only trying to model subject-specific unobserved heterogeneity (i.e., one-way random effects). In that case, do you think I still need to state "class t" and "random t"? This relates to Steve's suggestion above. My understanding is that having a "t" term introduces covariance across different subjects in the same period, which I am trying to avoid.

Thanks,

BK

BK2 · Posted 07-19-2016 11:56 AM

Steve, the correlation I am trying to capture is this: the error terms across time periods for the same subject have identical covariances (which arises from having a subject-specific random effect). My understanding is that an autoregressive structure captures decaying covariance (decreasing as the time between two periods increases) in the idiosyncratic errors, not subject-specific random effects.

Given the above, do you think I would still need the CLASS TIME and RANDOM TIME statements?

Thanks,

BK

lvm · Posted 07-19-2016 01:16 PM

A random i; term would give you compound symmetry for the different times within each subject (i). That means equal correlation. I think that is what you want.

SteveDenham · Posted 07-20-2016 01:37 PM

@lvm, I think I want to disagree. I think that ignoring the repeated nature, implemented by t in this example, is the same as throwing all of the measurements for an individual into one large bucket, such that a "panel" inference would be impossible. It would just be a one-way analysis. Maybe I am missing something.

Steve Denham

lvm · Posted 07-20-2016 01:45 PM

For a normal distribution, the random i statement gives compound symmetry within subjects (i), since there is also a residual by default. All observation pairs within i have the same correlation, just like in an RCBD. Of course, to get a structure to the correlation, other statements would be needed.
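
A short derivation of why this holds, using the notation from BK's model earlier in the thread (v_i is the subject effect, e_ij the residual) and assuming v_i and e_ij are mutually independent with variances σ_v² and σ_e² (symbols introduced here):

$$
\operatorname{Var}(y_{ij}) = \sigma_v^2 + \sigma_e^2,
\qquad
\operatorname{Cov}(y_{ij}, y_{ik}) = \sigma_v^2 \;\; (j \neq k),
\qquad
\operatorname{Cov}(y_{ij}, y_{i'k}) = 0 \;\; (i \neq i')
$$

$$
\operatorname{Corr}(y_{ij}, y_{ik}) = \frac{\sigma_v^2}{\sigma_v^2 + \sigma_e^2} \quad (j \neq k)
$$

The correlation does not depend on how far apart periods j and k are, which is the compound symmetry pattern, and observations from different subjects are uncorrelated.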

SteveDenham · Posted 07-20-2016 01:58 PM

Thanks, @lvm, that makes sense. It just doesn't fit my preconceived notion of what panel data looks like, so I plead guilty to carrying my prejudices into the analysis.

Steve Denham

lvm · Posted 07-20-2016 02:13 PM

I am not recommending this model, per se, just showing a model that could be used.

BK2 · Posted 07-20-2016 04:50 PM

I am getting very different outputs using PROC GLIMMIX and PROC PANEL with the same panel data (although I thought the two specifications below were statistically equivalent). I agree they are using different estimation techniques (PANEL uses FGLS and GLIMMIX uses GEE), but I think that is not the reason for the discrepancy.

Data test;
Title 'sample_customer';
Input customer_id time_period dep_var1 dep_var2 ind_var1 ind_var2;
datalines;
1 1 1 12 10 .9
1 2 1 15 7 8.3
1 3 1 8.9 8 2.3
1 4 0 0 6 2
1 5 0 0 6 5
1 6 1 19 3 4
1 7 1 4 4 3

2 1 1 12 10 5
2 2 0 0 7 3
2 3 0 0 8 3
2 4 0 0 6 2

3 1 1 40 20 10
3 2 1 24 17 19
3 3 0 0 18 2.3
3 4 0 0 16 12
3 5 0 0 26 35
3 6 0 0 33 24
3 7 0 0 24 13
3 8 0 0 12 31
3 9 1 42 36 18
;

PROC GLIMMIX DATA=test;
CLASS CUSTOMER_ID dep_var1;
MODEL dep_var2 = ind_VAR1 ind_VAR2 dep_var1 
/LINK = IDENTITY DIST = NORMAL SOLUTION;
RANDOM INTERCEPT/ SUBJECT = CUSTOMER_ID TYPE= VC;
RUN;

PROC PANEL DATA=test;
CLASS dep_var1;
MODEL dep_var2 = ind_VAR1 ind_VAR2 dep_var1 
/RANONE VCOMP = FB;
ID CUSTOMER_ID TIME_PERIOD;
RUN;

Thanks,

BK

lvm · Posted 07-20-2016 04:56 PM

Actually, GLIMMIX uses REML (restricted ML) for normal data, such as yours, not GEE. It uses GEE-type estimation when the response is Poisson or binomial and one properly sets up a residual structure.

BK2 · Posted 07-20-2016 05:21 PM

lvm, I noticed your reply after I finished editing my post. Oops! Thanks for pointing out that the estimation technique is restricted ML.
