I recently used PROC PANEL for a panel data analysis. Here is my original code, with the key results summarized in the comment:
/*
R square = 0.3966
F test for no fixed effects p < 0.0001
Some cross-section fixed effects are significant while some are not.
beta_StructureBreak = 3996.575 p <0.0001
beta_event_level1 = 380.9467 p = 0.0185
beta_event_level2 = 563.2005 p = 0.0005
beta_GDP_per_capita = 344.8128 p < 0.0001
beta_HHI = 0.643129 p <0.0001
*/
proc panel data=rrd_come.city_ln_combo_7;
id City TimeID;
model LoanNum = StructureBreak event_level1 event_level2 GDP_per_capita HHI / fixone printfixed;
run;
My dataset covers 38 quarters, and event_level1 and event_level2 are two dummy variables marking 2020Q1 and 2020Q2, respectively. Since there are far fewer dummy variables than periods, they do not cause perfect multicollinearity. Up to this point, everything was fine.
Then I wanted to add interaction terms involving these two dummy variables. From what I read online, I first have to declare the two dummy variables as categorical variables in PROC PANEL. Here is my new code:
/*
R square = 0.4910
F test for no fixed effects p < 0.0001
Some cross-section fixed effects are significant while some are not.
beta_StructureBreak = 4056.795 p < 0.0001
beta_event_level1 = -5116.73 p < 0.0001
beta_event_level2 = -4629.75 p < 0.0001
beta_GDP_per_capita = 228.6001 p < 0.0001
beta_HHI = 0.392422 p <0.0001
beta_event_level1*GDP_per_capita = 1071.573 p < 0.0001
beta_event_level2*GDP_per_capita = 1039.692 p < 0.0001
beta_event_level1*HHI = -0.94809 p < 0.0001
beta_event_level2*HHI = -1.04389 p < 0.0001
*/
proc panel data=rrd_come.city_ln_combo_7;
id City TimeID;
class event_level1 event_level2;
model LoanNum = StructureBreak event_level1 event_level2 GDP_per_capita HHI event_level1*GDP_per_capita event_level2*GDP_per_capita event_level1*HHI event_level2*HHI / fixone printfixed;
run;
In the log, SAS warns of possible multicollinearity. At first I thought the interaction terms were the cause, so I removed them and tried the following:
proc panel data=rrd_come.city_ln_combo_7;
id City TimeID;
class event_level1 event_level2;
model LoanNum = StructureBreak event_level1 event_level2 GDP_per_capita HHI / fixone printfixed;
run;
SAS still warns of possible multicollinearity, which means that merely declaring these two dummy variables as categorical variables triggers the warning. Can anyone explain why? Thanks a lot!!
Adding: I don't think this statement is true (but I admit to having only limited experience with PROC PANEL)
The internet said I have to first declare these two dummy variables as categorical variables in the panel procedure. So here below is my new code:
Please quote the exact message, word-for-word, that you see in SAS.
NOTE: The transformed regression does not have full rank. Be aware of possible multicollinearity and/or identification problems before using the FixOne method results.
You can examine the transformation performed by PROC PANEL using the OUTTRANS= option. https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=etsug&docsetTarget=etsug_...
From there you can try to determine what is causing the multicollinearity.
Or, it may be caused by missing data, as @Reeza says.
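As a minimal sketch of the OUTTRANS= suggestion, the model that triggers the NOTE could be rerun with the transformed data saved for inspection. The data set name work.panel_trans is made up for illustration, and the names and layout of the columns it contains should be checked with PROC CONTENTS rather than assumed.

```sas
/* Rerun the model that triggers the NOTE and save the transformed data. */
/* work.panel_trans is a hypothetical output data set name.              */
proc panel data=rrd_come.city_ln_combo_7 outtrans=work.panel_trans;
   id City TimeID;
   class event_level1 event_level2;
   model LoanNum = StructureBreak event_level1 event_level2
                   GDP_per_capita HHI / fixone printfixed;
run;

/* List the columns PROC PANEL wrote, then look at the first few rows. */
proc contents data=work.panel_trans;
run;

proc print data=work.panel_trans(obs=20);
run;
```

Columns that are identical, all zero, or that sum to another column across every row are the ones to look for.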
The problem was solved in a surprising way. Indeed, declaring the dummy variables as categorical variables is not the only way to include interaction terms. This time I added the interaction terms by manually multiplying the dummy variables by the control variables, and SAS no longer warns of possible multicollinearity. Since I have not made any essential change to my model, I suspect the earlier multicollinearity was not severe, or was it perhaps just a bug in SAS?
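The exact code for this fix was not posted, so the following is only a sketch of what manually multiplying the dummies by the controls might look like. The output data set work.city_inter and the variable names e1_gdp, e2_gdp, e1_hhi and e2_hhi are hypothetical.

```sas
/* Build the interaction columns by hand. event_level1 and event_level2 */
/* stay numeric 0/1 variables, so no CLASS statement is needed.         */
/* work.city_inter and the e1_/e2_ names are hypothetical.              */
data work.city_inter;
   set rrd_come.city_ln_combo_7;
   e1_gdp = event_level1 * GDP_per_capita;
   e2_gdp = event_level2 * GDP_per_capita;
   e1_hhi = event_level1 * HHI;
   e2_hhi = event_level2 * HHI;
run;

proc panel data=work.city_inter;
   id City TimeID;
   model LoanNum = StructureBreak event_level1 event_level2
                   GDP_per_capita HHI
                   e1_gdp e2_gdp e1_hhi e2_hhi / fixone printfixed;
run;
```

Because each dummy enters as a single numeric column here, the model has one regressor per dummy and one per interaction.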
Not necessarily a bug, but a standard feature of the CLASS statement. It does exactly what I would expect it to do.
SteveDenham
SAS system still warns the possible multicollinearity, which means just declaring these two dummy variables as categorical variables cause possible multicollinearity. So anyone can explain the reason for that? Thanks a lot!!
Check whether the number of observations used in each model is the same. I'm guessing one of your categorical variables has missing values, and SAS will exclude any row in which a variable used in the procedure is missing.
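A quick way to check this guess, sketched under the assumption that all model variables are numeric (they are used as regressors in the first model), is to count missing values directly:

```sas
/* N and NMISS for every variable used in the model. */
proc means data=rrd_come.city_ln_combo_7 n nmiss;
   var LoanNum StructureBreak event_level1 event_level2 GDP_per_capita HHI;
run;

/* Level counts for the two dummies, with missing shown as its own level. */
proc freq data=rrd_come.city_ln_combo_7;
   tables event_level1 event_level2 / missing;
run;
```

If NMISS is zero everywhere and each dummy shows only the levels 0 and 1, missing data is unlikely to explain the NOTE.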
Following up on @Reeza's comment: if I understand correctly, you have two dummy variables, event1 and event2. Event1 is 1 if it is 2020Q1 and 0 for all other quarters, while event2 is 1 if it is 2020Q2 and 0 for all other quarters. Is that how the dummy variables are coded? This is probably more critical to working out the effect of a level shift due to something changing from Q1 to Q2 in 2020. I would code these somewhat differently: event1 is 1 for all quarters up to and including 2020Q1 and 0 for all quarters after that, and event2 is 0 for all quarters up to and including 2020Q1 and 1 for the single quarter after that. If it is coded any way besides these two, then there will be missing values for the other quarters and the design matrix will not be full rank. PANEL interprets this as multicollinearity, but it is actually misspecification of the model.
Now if the coding is as in the first scheme, and you treat event1 and event2 as continuous covariates (your first code), all will work out, as the design matrix is still full rank. However, once you move these two variables to class variables, the two columns in the design matrix will be a linear combination of one another, and you are no longer full rank. If you code in the second manner, I suspect the same will happen. Here is where @PaigeMiller's suggestion of the OUTTRANS= option will come in handy to check what is going on. Additionally, use the CORR or COV option in the MODEL statement to see if there are columns/rows that are all zero--these are the culprits in the non-full rank/multicollinearity error.
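One way to see what a CLASS statement does to a 0/1 variable is to let PROC GLMMOD build a design matrix. This is only an illustration: PROC GLMMOD uses the GLM parameterization (one column per level), and whether PROC PANEL builds exactly the same columns is an assumption to verify against the OUTTRANS= output. The data set names work.design and work.parms are hypothetical.

```sas
/* Build a GLM-coded design matrix for the two dummies declared as CLASS. */
/* work.design and work.parms are hypothetical output data set names.     */
proc glmmod data=rrd_come.city_ln_combo_7
            outdesign=work.design outparm=work.parms noprint;
   class event_level1 event_level2;
   model LoanNum = event_level1 event_level2;
run;

/* Which design column corresponds to which effect and level. */
proc print data=work.parms;
run;

/* First rows of the design matrix itself. */
proc print data=work.design(obs=10);
run;
```

Under the GLM parameterization each dummy produces one column for level 0 and one for level 1, and those two columns add up to the intercept column in every row, so the resulting design matrix cannot be full rank.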
So what to do with non-full rank designs? Use methods that do not require full rank matrices and actually use reduced rank matrices. I offer again PROC GLIMMIX. Or reconsider how to code the event. It is possible to use a single variable to code what you have, rather than two. That would solve the non-full rank issue. I suggest coding all of the quarters up to and including 2020Q1 as 0, and 2020Q2 as 1. Call this variable level_event, and use it as a CLASS variable.
SteveDenham
Additionally, use the CORR or COV option in the MODEL statement to see if there are columns/rows that are all zero--these are the culprits in the non-full rank/multicollinearity error.
Multicollinearity can also be caused by linear combinations of some variables that always equal a constant, and this may not show up in CORR or COV. If this is the case, then something like PROC PRINCOMP on the OUTTRANS= data set would show one or more zero eigenvalues, and the associated eigenvectors indicate the linear combination that has zero variability.
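A rough sketch of that check follows. The data set name work.panel_trans is hypothetical, and the VAR list assumes the transformed regressors keep their original names. With a CLASS statement the generated columns may be named differently, so the list should be replaced with whatever PROC CONTENTS reports for the OUTTRANS= data set.

```sas
/* Save the transformed data from the model that triggers the NOTE. */
/* work.panel_trans is a hypothetical output data set name.         */
proc panel data=rrd_come.city_ln_combo_7 outtrans=work.panel_trans;
   id City TimeID;
   class event_level1 event_level2;
   model LoanNum = StructureBreak event_level1 event_level2
                   GDP_per_capita HHI / fixone printfixed;
run;

/* Eigen-decomposition of the covariance matrix of the regressors.  */
/* The VAR list is an assumption; adjust it to the actual columns.  */
proc princomp data=work.panel_trans cov;
   var StructureBreak event_level1 event_level2 GDP_per_capita HHI;
run;
```

The COV option is used so that eigenvalues are computed from the covariance matrix, where an eigenvalue at or near zero corresponds directly to a linear combination with no variability.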