Say I have a data set with firms of many different sizes.
Suppose I want to run two separate regressions based on firm size:
one for firms at or below the median size, and one for firms above the median size.
To do this, I created two indicator variables for every firm: one equal to 1 if the firm is at or below the median size and "." (missing) otherwise, and one equal to 1 if the firm is above the median size and "." otherwise.
Then I multiplied each of these variables by firm returns. My logic was that this gives me one set of returns for the smaller firms and another set of returns for the larger firms.
Call these two return variables Less_median_returns and greater_med_returns.
Then, if a firm is below the median size, its value in greater_med_returns will just be "." (missing), right?
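For concreteness, the setup described above might look roughly like this. It is only a sketch: it assumes HAVE contains firm_Size, returns, and a variable named median that holds the median firm size, and small_flag and large_flag are illustrative names that do not come from the original data.

```sas
/* Sketch only: small_flag and large_flag are hypothetical names. */
DATA have;
  SET have;
  /* Each flag is 1 for firms in the group and missing otherwise.      */
  /* Note: SAS treats a missing firm_Size as smaller than any number,  */
  /* so such a firm would fall into the "small" group here.            */
  IF firm_Size LE median THEN small_flag = 1;
  ELSE small_flag = .;
  IF firm_Size > median THEN large_flag = 1;
  ELSE large_flag = .;
  /* Arithmetic with a missing operand returns missing, so each        */
  /* return variable is populated only for firms in its own group.     */
  Less_median_returns = small_flag * returns;
  greater_med_returns = large_flag * returns;
RUN;
```

With this construction every firm has exactly one of the two return variables missing, and SAS writes a log note that missing values were generated by the multiplication.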
So my question is: would running these two models in a single PROC REG give me what I'm after?
PROC REG DATA = have;
small: MODEL Less_median_returns = variable1;
large: MODEL greater_med_returns = variable1;
QUIT;
Would this regression give me the same results as if I just created two separate tables based on my dummy variable? E.g.,
DATA small_firms;
SET HAVE;
if firm_Size le median;
RUN;
DATA large_firms;
SET HAVE;
if firm_Size > median;
RUN;
PROC REG DATA = small_firms;
model returns = variable1;
RUN;
PROC REG DATA = large_firms;
model returns = variable1;
RUN;
Would the outputs be the same in both approaches?
You can just run the code and find out if the outputs are the same.
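One way to do that check, as a rough sketch: write each fit's estimates to a data set with OUTEST= and line them up with PROC COMPARE. This assumes the regressor has the same name, variable1, in every data set (the post uses both variable1 and var1), and the est_* data set names are made up for illustration. One thing to watch for: as far as I know, PROC REG builds a single crossproducts matrix for all MODEL statements in a step and excludes any observation that is missing on any variable used in any of them. Since every firm is missing on one of the two masked return variables, the sketch fits each of them in its own PROC REG step instead of putting both MODEL statements in one step.

```sas
/* Approach 1: masked return variables, one PROC REG step per model */
PROC REG DATA = have OUTEST = est_masked_small NOPRINT;
  small: MODEL Less_median_returns = variable1;
RUN;
QUIT;

PROC REG DATA = have OUTEST = est_masked_large NOPRINT;
  large: MODEL greater_med_returns = variable1;
RUN;
QUIT;

/* Approach 2: separate tables created with the subsetting IF */
PROC REG DATA = small_firms OUTEST = est_subset_small NOPRINT;
  small: MODEL returns = variable1;
RUN;
QUIT;

PROC REG DATA = large_firms OUTEST = est_subset_large NOPRINT;
  large: MODEL returns = variable1;
RUN;
QUIT;

/* Compare only the columns that should agree: the coefficients and */
/* the root MSE. The dependent-variable columns differ by design.   */
PROC COMPARE BASE = est_subset_small COMPARE = est_masked_small;
  VAR Intercept variable1 _RMSE_;
RUN;

PROC COMPARE BASE = est_subset_large COMPARE = est_masked_large;
  VAR Intercept variable1 _RMSE_;
RUN;
```

If PROC COMPARE reports no unequal values for these variables, the two approaches agree for that size group. Running the original single-step version with both MODEL statements and checking the number of observations used is a quick way to see whether the missing-value handling matters for your data.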