Say I have a data set with firms of all different sizes.
Suppose I want to run two separate regressions based on firm size:
one for firms below the median size and one for firms above the median size.
In my case, I created two indicator variables for every firm: one equal to 1 if the firm is below the median size and "." (missing) otherwise, and one equal to 1 if the firm is above the median size and "." otherwise.
Then I multiplied each indicator by firm returns. My logic was that this gives me one set of returns for firms below the median size and another for firms above it.
Call these two return variables Less_median_returns and greater_med_returns.
So if a firm is below the median size, its value in greater_med_returns will just be "." (missing), right?
So my question is, would running these two models in a single PROC REG give me what I'm after?
PROC REG DATA = have;
    small: MODEL Less_median_returns = var1;
    large: MODEL greater_med_returns = var1;
RUN;
QUIT;
Would this regression give me the same results as if I just created two separate tables based on my dummy variable? E.g.,
DATA small_firms;
    SET have;
    IF firm_Size LE median;
RUN;

DATA large_firms;
    SET have;
    IF firm_Size > median;
RUN;

PROC REG DATA = small_firms;
    MODEL returns = var1;
RUN;

PROC REG DATA = large_firms;
    MODEL returns = var1;
RUN;
Would the outputs be the same in both approaches?
You can just run the code and find out if the outputs are the same.
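One way to make that comparison concrete is to save the parameter estimates from each approach and print them side by side. The sketch below is only an illustration: it assumes the data set and variable names from the question (have, firm_Size, median, returns, var1, Less_median_returns, greater_med_returns) and that the regressor is var1 in both approaches, while est_one_step, est_small, and est_large are hypothetical names introduced here. One thing to watch for, as far as the PROC REG documentation describes it: the procedure builds a single crossproducts matrix for all MODEL statements in a step, so an observation with a missing value in any modeled variable is dropped from every model. Because each firm is missing in one of the two return variables, the single-step version may end up with no usable observations at all.

```sas
/* Approach 1: both models in one PROC REG step, as in the question.    */
/* If the log reports no valid observations, est_one_step will not be   */
/* created, and that alone shows the two approaches differ.             */
ods output ParameterEstimates = est_one_step;   /* hypothetical name */
proc reg data = have;
    small: model Less_median_returns = var1;
    large: model greater_med_returns = var1;
run;
quit;

/* Approach 2: one PROC REG step per subset. The WHERE= data set option */
/* applies the same conditions as the subsetting IF statements above,   */
/* without creating the intermediate tables.                            */
ods output ParameterEstimates = est_small;      /* hypothetical name */
proc reg data = have(where = (firm_Size le median));
    model returns = var1;
run;
quit;

ods output ParameterEstimates = est_large;      /* hypothetical name */
proc reg data = have(where = (firm_Size > median));
    model returns = var1;
run;
quit;

/* Print the saved estimates and compare them by eye. */
title "Approach 1: two models in one step";
proc print data = est_one_step; run;

title "Approach 2: small firms";
proc print data = est_small; run;

title "Approach 2: large firms";
proc print data = est_large; run;
title;
```

Checking the "Number of Observations Used" line in each PROC REG output is a quick sanity check as well: if the counts differ between the two approaches, the estimates will generally differ too.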