UniversitySas
Quartz | Level 8

Let's say I have a data set with firms of all different sizes.

Suppose I want to run two separate regressions based on the size of the firms: one for firms below the median size, and one for firms above the median size.

In my case, I created two indicator variables for every firm: one that equals 1 if the firm is below the median size and missing (".") otherwise, and one that equals 1 if the firm is above the median size and missing otherwise.

Then I multiplied these variables by firm returns. My logic was that this gives me one set of returns for firms below the median size and another set for firms above it.

 

Call these two return variables Less_median_returns and greater_med_returns.

Then, if a firm is below the median size, its value in greater_med_returns will just be missing ("."), right?
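
For concreteness, here is a minimal sketch of how the two masked return variables could be built. It assumes HAVE already contains firm_size, median, and returns (the names used in the later code in this post) with no missing values in firm_size or median; the flag names and the output table HAVE_MASKED are hypothetical. Ties at the median go to the small group, matching the `le` comparison used further down.

```sas
/* Sketch only: small_flag, large_flag and have_masked are made-up names. */
/* Assumes firm_size and median are never missing.                        */
data have_masked;
    set have;

    /* Indicator is 1 inside the group and missing outside it */
    if firm_size le median then small_flag = 1;
    else small_flag = .;

    if firm_size > median then large_flag = 1;
    else large_flag = .;

    /* Missing * number is missing in SAS, so each product keeps the   */
    /* return only for firms in that group (the log will note that     */
    /* missing values were generated).                                 */
    Less_median_returns = small_flag * returns;
    greater_med_returns = large_flag * returns;
run;
```

With this construction every firm has exactly one of the two return variables missing.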

 

So my question is: would running these two models in a single PROC REG give me what I'm after?

 

PROC REG DATA = have;
   small: MODEL Less_median_returns = variable1;
   large: MODEL greater_med_returns = variable1;
RUN;
QUIT;

Would this regression give me the same results as if I just created two separate tables based on my dummy variable? E.g.,

/* Firms at or below the median size */
DATA small_firms;
     SET have;
     IF firm_size LE median;
RUN;

/* Firms above the median size */
DATA large_firms;
     SET have;
     IF firm_size > median;
RUN;

PROC REG DATA = small_firms;
     MODEL returns = variable1;
RUN;
QUIT;

PROC REG DATA = large_firms;
     MODEL returns = variable1;
RUN;
QUIT;

Would the outputs be the same in both approaches?
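
A possible alternative to creating two tables is BY-group processing, which fits the same model once per group in a single PROC REG call. This is only a sketch: it assumes HAVE contains firm_size, median, returns, and the regressor variable1, and the names have_grouped, size_group, and est_by are hypothetical.

```sas
/* Sketch only: have_grouped, size_group and est_by are made-up names. */
data have_grouped;
    set have;
    length size_group $5;
    /* Same split as the two-table version: ties go to the small group */
    if firm_size le median then size_group = 'small';
    else size_group = 'large';
run;

/* BY processing needs the data sorted by the BY variable */
proc sort data=have_grouped;
    by size_group;
run;

/* One regression per size group; OUTEST= saves the coefficients */
proc reg data=have_grouped outest=est_by;
    by size_group;
    model returns = variable1;
run;
quit;
```

Each BY group is fitted only on its own rows, so this should reproduce the two-table results without the masked return variables.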

1 REPLY
PaigeMiller
Diamond | Level 26

You can just run the code and find out if the outputs are the same.

--
Paige Miller
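
Running both versions is straightforward; a minimal sketch that reuses the names from the question is below (est_small and est_large are hypothetical output tables, and WHERE= stands in for the separate subset tables). One thing to look for in the single-call version: as far as I know, PROC REG builds one crossproducts matrix for every variable named in any MODEL statement and drops an observation from all models if any of those variables is missing. Because each firm is missing one of the two masked return variables, that version may end up with no usable observations at all, so check the "Number of Observations Used" table or the log before comparing coefficients.

```sas
/* Approach 1: both masked responses in one PROC REG call.        */
/* Check the log and the "Number of Observations Used" table.     */
proc reg data=have;
    small: model Less_median_returns = variable1;
    large: model greater_med_returns = variable1;
run;
quit;

/* Approach 2: same split as the two-table version, using WHERE=  */
/* instead of new tables. est_small and est_large are made-up     */
/* names for the saved coefficient tables.                        */
proc reg data=have(where=(firm_size le median)) outest=est_small;
    small: model returns = variable1;
run;
quit;

proc reg data=have(where=(firm_size > median)) outest=est_large;
    large: model returns = variable1;
run;
quit;

/* Print the saved estimates for a side-by-side look */
proc print data=est_small; run;
proc print data=est_large; run;
```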
