UniversitySas
Quartz | Level 8

Let's say I have a data set with firms of all different sizes.

Suppose I want to run two separate regressions based on the size of the firms.

 

One regression for firms below the median size, and one for firms above the median size.

 

In my case, I created two indicator variables for every firm: one equal to "1" if the firm is below the median size and "." (missing) otherwise, and one equal to "1" if the firm is above the median size and "." otherwise.

 

Then I multiplied these variables by firm returns. My logic here was that I would have one set of returns for firms <median size and another set of returns for firms >median size.

 

Call these two return variables Less_median_returns and greater_med_returns.

Then, if a firm is <median size, its return value in "greater_med_returns" will just be "." (missing), right?

 

So my question is, would running these two models in a single PROC REG give me what I'm after?

 

PROC REG DATA = have;
     small: MODEL Less_median_returns = variable1;
     large: MODEL greater_med_returns = variable1;
RUN;
QUIT;

Would this regression give me the same results as if I just created two separate tables based on my dummy variable? E.g.,

DATA small_firms;
     SET have;
     IF firm_Size LE median;
RUN;

DATA large_firms;
     SET have;
     IF firm_Size > median;
RUN;

PROC REG DATA = small_firms;
     MODEL returns = var1;
RUN;

PROC REG DATA = large_firms;
     MODEL returns = var1;
RUN;
QUIT;

Would the outputs be the same in both approaches?

1 REPLY
PaigeMiller
Diamond | Level 26

You can just run the code and find out if the outputs are the same.
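
A minimal sketch of that check, assuming HAVE contains the variables firm_size, median, returns and var1 as in the question (the data set name have2 is made up for illustration, and var1 stands in for variable1 in both approaches so the two are comparable). When you compare the outputs, look at "Number of Observations Used" first: as far as I understand the PROC REG documentation on missing values, an observation that is missing on any variable named in any MODEL statement is dropped from all models in that step, and here every row is missing one of the two masked return variables. Treat that as something to confirm on your own data rather than as a given.

/* Sketch only: assumes HAVE holds firm_size, median, returns and var1. */
data have2;
     set have;
     /* Masked copies of returns: each stays missing outside its own size group */
     if firm_size le median then less_median_returns = returns;
     else greater_med_returns = returns;
run;

/* Approach 1: one PROC REG step with two labelled MODEL statements */
proc reg data=have2;
     small: model less_median_returns = var1;
     large: model greater_med_returns = var1;
run;
quit;

/* Approach 2: one PROC REG step per size group, subset with a WHERE= option */
proc reg data=have(where=(firm_size le median));
     model returns = var1;
run;
quit;

proc reg data=have(where=(firm_size > median));
     model returns = var1;
run;
quit;

The WHERE= data set option is used here only to avoid creating two extra tables; it should select the same rows as the subsetting IF statements in the question. Note that SAS sorts missing values below any number, so a firm with a missing firm_size would land in the "le median" group in either approach.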

--
Paige Miller
