UniversitySas

Say I have a data set with firms of many different sizes.

Suppose I want to run two separate regressions based on firm size: one for firms below the median size and one for firms above it.

To do this, I created two indicator variables for every firm: one equal to 1 if the firm is below the median size and missing (.) otherwise, and one equal to 1 if the firm is above the median size and missing otherwise.

Then I multiplied each indicator by the firm's returns. My logic was that this would give me one set of returns for firms below the median size and another set for firms above it.

Call these two return variables Less_median_returns and greater_med_returns.

Then, if a firm is below the median size, its value of greater_med_returns will just be missing (.), right?
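The construction described above can be sketched as a PROC MEANS step for the median plus one DATA step. HAVE, firm_size and returns come from the question; MED, HAVE_SPLIT, size_median, small_ind and large_ind are hypothetical names chosen for illustration. In SAS the `*` operator returns missing when either operand is missing, so a firm below the median does end up with a missing greater_med_returns (the log shows a note that missing values were generated). Note that firms exactly at the median get missing in both variables with this `<` / `>` split.

```sas
/* Hypothetical names: MED, HAVE_SPLIT, size_median, small_ind, large_ind */
proc means data=have noprint;
  var firm_size;
  output out=med median=size_median;
run;

data have_split;
  /* Attach the one-row median to every observation (SET variables are retained) */
  if _n_ = 1 then set med(keep=size_median);
  set have;
  /* ". <" guards against a missing firm_size, which SAS treats as smaller than any number */
  if . < firm_size < size_median then small_ind = 1;
  else small_ind = .;
  if firm_size > size_median then large_ind = 1;
  else large_ind = .;
  /* The * operator returns missing when either operand is missing */
  Less_median_returns = small_ind * returns;
  greater_med_returns = large_ind * returns;
run;
```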

 

So my question is: would running these two regressions in one PROC REG step give me what I'm after?

PROC REG DATA = have;
   small: MODEL Less_median_returns = var1;
   large: MODEL greater_med_returns = var1;
RUN;
QUIT;
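One thing worth checking here: going by the "Missing Values" section of the PROC REG documentation, PROC REG builds a single crossproducts matrix for all variables in all MODEL statements of a step, and an observation with a missing value in any of them is excluded from every model in that step. Since every firm is missing either Less_median_returns or greater_med_returns, the two-model step above would likely be left with no usable observations; the "Number of Observations Used" line in the output is the place to confirm this. A minimal sketch that avoids it runs each model in its own step. It assumes the regressor is called var1 (the question uses both variable1 and var1), and EST_SMALL and EST_LARGE are hypothetical OUTEST= names.

```sas
/* Separate steps: each model drops only the observations missing
   its own response or var1, not those missing the other response */
proc reg data=have outest=est_small;
  small: model Less_median_returns = var1;
run;
quit;

proc reg data=have outest=est_large;
  large: model greater_med_returns = var1;
run;
quit;
```

Run this way, each model should use the same observations as regressing returns on var1 within the matching subset, as long as both approaches define the subsets identically.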

Would this give me the same results as creating two separate data sets based on my dummy variable and regressing each one separately? For example:

DATA small_firms;
   SET have;
   IF firm_size LE median;
RUN;

DATA large_firms;
   SET have;
   IF firm_size > median;
RUN;

PROC REG DATA = small_firms;
   MODEL returns = var1;
RUN;
QUIT;

PROC REG DATA = large_firms;
   MODEL returns = var1;
RUN;
QUIT;
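A common alternative to building two data sets is BY-group processing: create one grouping variable, sort by it, and let a single PROC REG step fit one model per group. This sketch assumes HAVE carries firm_size, median, returns and var1, as the DATA steps above imply; HAVE_GRP, size_group and EST_BY are hypothetical names. It uses the same `LE` / `>` split as SMALL_FIRMS and LARGE_FIRMS, so a missing firm_size also lands in the "small" group, exactly as with the subsetting IF.

```sas
/* Hypothetical names: HAVE_GRP, size_group, EST_BY */
data have_grp;
  set have;
  length size_group $5;
  /* Same split as SMALL_FIRMS / LARGE_FIRMS: LE median counts as small */
  if firm_size le median then size_group = 'small';
  else size_group = 'large';
run;

/* BY-group processing needs the data sorted by the BY variable */
proc sort data=have_grp;
  by size_group;
run;

/* One regression per size group; EST_BY gets one row of estimates per group */
proc reg data=have_grp outest=est_by;
  by size_group;
  model returns = var1;
run;
quit;
```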

Would the outputs be the same in both approaches?

PaigeMiller


You can just run the code and find out if the outputs are the same.
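One way to do that without comparing two listings by eye is to write both sets of estimates to OUTEST= data sets and let PROC COMPARE report any differences. This sketch assumes Less_median_returns and SMALL_FIRMS exist as described in the question and that the regressor is named var1; EST_SPLIT and EST_SUBSET are hypothetical names. The EDF option adds the error degrees of freedom (_EDF_) and R-square (_RSQ_), so a different number of observations shows up directly. Also note a boundary difference in the question itself: the indicator uses `< median` while SMALL_FIRMS uses `LE median`, so firms exactly at the median are in the subset but missing in Less_median_returns, and the two fits can differ for that reason alone.

```sas
/* Approach 1: split response variable on the full data set */
proc reg data=have outest=est_split edf noprint;
  model Less_median_returns = var1;
run;
quit;

/* Approach 2: plain response on the subset data set */
proc reg data=small_firms outest=est_subset edf noprint;
  model returns = var1;
run;
quit;

/* Compare only the columns both OUTEST= data sets share */
proc compare base=est_split(keep=Intercept var1 _RMSE_ _EDF_ _RSQ_)
             compare=est_subset(keep=Intercept var1 _RMSE_ _EDF_ _RSQ_);
run;
```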

--
Paige Miller
