@UniversitySas wrote:
So let's say I have a data-set with firms of all different sizes.
Now for example, suppose I want to run 2 separate regressions based on the size of the firms.
One regression for firms below the median size, and one for firms above the median size.
In my case, I created two indicator variables for every firm: one equal to "1" if the firm is <median size and "." (missing) otherwise, and one equal to "1" if the firm is >median size and "." otherwise.
Then I multiplied these variables by firm returns. My logic here was that I will have a set of returns for firms <median size and a set of returns for firms >median size.
Call these two return variables Less_median_returns and greater_med_returns.
Then, if a firm is <median size, its return value in "greater_med_returns" will just be ".", missing, right?
So my question is, would running this PROC REG with two MODEL statements give me what I'm after?
PROC REG DATA = have;
small: MODEL Less_median_returns = variable1;
large: MODEL greater_med_returns = variable1;
QUIT;
Would this regression give me the same results as if I just created two separate tables based on my dummy variable? E.g.,
DATA small_firms;
SET HAVE;
if firm_Size le median;
RUN;
DATA large_firms;
SET HAVE;
if firm_Size > median;
RUN;
PROC REG DATA = small_firms;
model returns = var1;
RUN;
PROC REG DATA = large_firms;
model returns = var1;
RUN;
Would the outputs be the same in both approaches?
You can just run the code and find out if the outputs are the same.
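If you do run it, one thing to watch for: as far as I know from the PROC REG documentation on missing values, the procedure builds a single crossproducts matrix for all variables named in the MODEL statements, and an observation with a missing value in any of them is excluded from every model. Since each firm is missing one of the two return variables, the single PROC REG version may end up with no usable observations. A minimal sketch of an alternative that should match the two-table approach uses BY-group processing. It assumes HAVE contains firm_Size, median, returns and var1 as in the question; have_grouped and size_group are hypothetical names.

```sas
/* Sketch only: assumes HAVE holds firm_Size, median, returns and var1. */
/* have_grouped and size_group are hypothetical names for illustration. */
DATA have_grouped;
  SET have;
  /* Same split rule as the two-table version: le median vs. > median */
  IF firm_Size LE median THEN size_group = 'small';
  ELSE size_group = 'large';
RUN;

/* BY-group processing needs the data sorted by the grouping variable */
PROC SORT DATA = have_grouped;
  BY size_group;
RUN;

/* One regression per size group, each fitted on its own observations */
PROC REG DATA = have_grouped;
  BY size_group;
  MODEL returns = var1;
RUN;
QUIT;
```

Each BY group is fitted separately, so the estimates should agree with running PROC REG on small_firms and large_firms, without having to create the two extra tables.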