omer2020
Obsidian | Level 7

I am trying to run a fixed-effects regression. In the first step I need to run a regression on every firm-year (each year of each firm individually), and then use the intercept from that first regression in a second regression.
* COUNT is a unique identifier for each firm-year;
* MACC is a unique identifier for each time observation;
* PERMNO is a unique identifier for each firm;
* EXRET = excess return, MKTRF = market return;

proc reg data=dtab2 outest=xreg noprint;
model exret = Mktrf / adjrsq;
quit;

proc reg data=dtab2 outest=xreg noprint;
by permno macc;
model exret = Mktrf / adjrsq;
quit;

proc reg data=dtab2 outest=xreg noprint;
by count;
model exret = Mktrf / adjrsq;
quit;

Without a BY statement, the model is obviously fit to the whole data set, and the output contains a single observation of regression estimates.
With 'BY PERMNO MACC' or 'BY COUNT', the output file has no values for beta and RMSE, and the reported intercepts are exactly equal to EXRET (the dependent variable) itself.

 

I also tried PROC GLM and PROC PANEL:

proc glm data=dtab2;
by count;
model exret = Mktrf | count / solution noint;
output out=xglm p=predicted; quit;

proc panel data=dtab2lag outest=xpanel_pm;
id permno macc;  *cross_section-id time_series-id;
model exret = Mktrf;
run;


The results give this error: "The Hausman statistic cannot be computed because the difference in covariance matrices is not of sufficient rank"
From what I found when searching for this error, it is mostly caused by multicollinearity, but since I have only one independent variable in my model, multicollinearity does not make sense as the cause.
Kindly suggest any solution to this problem.

3 REPLIES 3
Rick_SAS
SAS Super FREQ

Other possible issues are missing values, constant values, or an insufficient number of observations.

 

When you get the message, the SAS log should identify the BY group for which the error occurred. There will be a note that says something like "The error is for the following BY group: PERMNO=12345 MACC=XYZ"

You can use that information to form a WHERE clause that displays the data only for that BY group:

 

proc print data=dtab2lag;
WHERE PERMNO=12345 and MACC=XYZ;
run;

 

If the WHERE clause returns too many observations to inspect visually, you can use PROC MEANS, PROC GLM, etc. on that subset to try to analyze what is going on. Ultimately, you might have to omit BY groups that do not have enough data to analyze:

WHERE NOT (PERMNO=12345 and MACC=XYZ);
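
One way to find such BY groups before running the regression is to count the observations in each group. The following is a minimal sketch, assuming the data set and variable names from the question (dtab2lag, permno, macc); the names grpsize and n_obs are made up for illustration:

```sas
/* Count observations per BY group. A group with only one or two rows
   cannot support a meaningful fit of an intercept plus one slope. */
proc sql;
   create table grpsize as
   select permno, macc, count(*) as n_obs
   from dtab2lag
   group by permno, macc
   order by n_obs;
quit;

/* Show the smallest groups first */
proc print data=grpsize(obs=20);
run;
```

If every group shows n_obs = 1, the problem lies in how the BY variables are defined rather than in a few sparse groups.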

 

omer2020
Obsidian | Level 7

Thanks a lot @Rick_SAS for replying.

 

I removed all missing values before trying the regression, and Mktrf is the market index return, which is not constant. This is the data step I ran before trying to run the regression:

data dtab2;
set dtab2lag;
size = log(size);
bm = log(bm);
count = _n_;
if Mktrf ne .;	if size ne .;	if bm ne .;
run;

 

Secondly, when I run PROC REG, no error or warning shows up. It is just that the output seems ambiguous and makes no sense: including one or several independent variables has no effect on the intercept, and all other values remain missing or zero.

 

This is what the dtab2 data looks like:

[Image: Screenshot(26).png]

proc reg data=dtab2 outest=xreg_pm noprint;
by permno macc;
model exret = Mktrf / adjrsq;
quit;

proc reg data=dtab2 outest=xreg_pm2 noprint;
by permno macc;
model exret = Mktrf size bm/ adjrsq;
quit;

Both produce the same results, which made me believe that this is not working properly.

[Image: Screenshot(28).png]

 


Also, when running PROC PANEL, no specific error is generated for a particular observation. The complete log generated after running PROC PANEL is this:

397  proc panel data=dtab2lag outest=xpanel_pm;
398  id permno macc;     *cross_section-id time_series-id;
399  model exret = Mktrf ;
400  run;

NOTE: None of the options FIXONE, FIXTWO, RANONE, RANTWO, FULLER, PARKS, DASILVA, GMM1, GMM2, ITGMM, FDONE,
      FDTWO, BTWNT, BTWNG, POOLED, HTAYLOR or AMACURDY were specified. The RANTWO option is assumed.
NOTE: A negative cross-section variance estimate of -0.0145 was obtained during RanTwo method computations
      for MODEL statement Model 1. The estimate was set to zero.
WARNING: The Hausman statistic cannot be computed because the difference in covariance matrices is not of sufficient rank.
ERROR: Java virtual machine exception. java.lang.OutOfMemoryError: GC overhead limit exceeded.
ERROR: Java virtual machine exception. java.lang.OutOfMemoryError: GC overhead limit exceeded.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.XPANEL_PM may be incomplete.  When this step was stopped there were 1 observations and 13 variables.
NOTE: PROCEDURE PANEL used (Total process time):
      real time           49.94 seconds
      cpu time            18.90 seconds

I would be really grateful if you could suggest any way to run a simple regression on each firm-year (each year of every firm, i.e. an individual regression for every single row in my data).

Rick_SAS
SAS Super FREQ

Each of your BY groups has only one observation because the value of MACC changes for each observation.

What you are doing is equivalent to the following simple example:

 

proc reg data=sashelp.class outest=xreg plots=none;
by name;
model weight = height / adjrsq;
quit;

With one observation, PROC REG can only fit an intercept. Other regression coefficients are set to 0. Statistics such as RMSE are missing.
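
For contrast, here is a sketch of the same example with a BY variable that takes the same value on several rows. It uses SEX in sashelp.class; the names class_sorted and xreg_sex are made up for illustration. Each BY group now has several observations, so the OUTEST= data set should contain a slope for HEIGHT and a nonmissing _RMSE_ for each group:

```sas
/* sashelp.class is ordered by name, so sort by the new BY variable first */
proc sort data=sashelp.class out=class_sorted;
   by sex;
run;

/* One regression per value of SEX; each group has multiple observations */
proc reg data=class_sorted outest=xreg_sex noprint;
   by sex;
   model weight = height / adjrsq;
run;
quit;

proc print data=xreg_sex;
run;
```

For the data in the question, the analogous step would be to use BY variables that identify a firm-year with several rows each, for example PERMNO together with a year variable derived from MACC. How to derive that year depends on how MACC is encoded, which this thread does not state.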
