UniversitySas
Quartz | Level 8

Let's say I have a data set with firms of many different sizes.

Suppose I want to run two separate regressions based on firm size: one for firms below the median size and one for firms above it.

In my case, I created two indicator variables for every firm: one equal to 1 if the firm is below the median size and "." (missing) otherwise, and one equal to 1 if the firm is above the median size and "." otherwise.

Then I multiplied these variables by firm returns. My logic was that this gives me one set of returns for firms below the median size and another for firms above it.

Call these two return variables Less_median_returns and greater_med_returns.

So if a firm is below the median size, its value in greater_med_returns will just be "." (missing), right?

So my question is: would running these two models in PROC REG give me what I'm after?

PROC REG DATA = have;
   small: MODEL Less_median_returns = variable1;
   large: MODEL greater_med_returns = variable1;
RUN;
QUIT;

Would this regression give me the same results as if I just created two separate tables based on my dummy variable? E.g.,

/* Keep firms at or below the median size */
DATA small_firms;
     SET HAVE;
     if firm_Size le median;
RUN;

/* Keep firms above the median size */
DATA large_firms;
     SET HAVE;
     if firm_Size > median;
RUN;

PROC REG DATA = small_firms;
     model returns = var1;
RUN;
QUIT;

PROC REG DATA = large_firms;
     model returns = var1;
RUN;
QUIT;

Would the outputs be the same in both approaches?

PaigeMiller
Diamond | Level 26

You can just run the code and find out if the outputs are the same.
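
If you do run it, two details are worth checking. First, as far as I recall from the PROC REG documentation on missing values, the procedure builds one crossproducts matrix for all MODEL statements submitted before the first RUN, and an observation with a missing value in any of those variables is excluded from every model. Since each firm is missing one of the two return variables, putting both MODEL statements in one step may leave no usable observations, so the sketch below runs the masked model in its own step. Second, your first approach uses <median while the second uses le median, so firms exactly at the median are handled differently. Here is a minimal sketch of the comparison for the small firms, assuming HAVE contains firm_size, median, returns and var1 as in your second example; the data set names masked, est_masked_small and est_subset_small are made up for illustration.

```sas
/* Assumption: HAVE has numeric variables firm_size, median, returns, var1. */
/* Build the masked return variables (missing outside the firm's own group). */
data masked;
    set have;
    if firm_size le median then less_median_returns = returns;
    else greater_med_returns = returns;
run;

/* Approach 1: masked dependent variable, one model per PROC REG step. */
proc reg data=masked outest=est_masked_small noprint;
    small: model less_median_returns = var1;
run;
quit;

/* Approach 2: regression on the subset of small firms only. */
proc reg data=have(where=(firm_size le median)) outest=est_subset_small noprint;
    model returns = var1;
run;
quit;

/* Compare the intercept, slope and root MSE from the two fits. */
proc compare base=est_masked_small compare=est_subset_small
             method=absolute criterion=1e-8;
    var Intercept var1 _RMSE_;
run;
```

Repeat with greater_med_returns and firm_size > median for the large firms. To see what happens with both MODEL statements in one step, submit your original PROC REG and check the log and the number of observations used.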

--
Paige Miller
