What is SAS's treatment of a missing value when we run regressions?

UniversitySas · Posted 04-15-2020 12:51 AM

So let's say I have a data-set with firms of all different sizes.

Now for example, suppose I want to run 2 separate regressions based on the size of the firms.

One regression for firms below the median size, and one for firms above the median size.

In my case, I created two indicator variables for every firm: one that equals 1 if the firm is below the median size and "." (missing) otherwise, and one that equals 1 if the firm is above the median size and "." otherwise.

Then I multiplied these variables by firm returns. My logic was that this gives me one set of returns for firms below the median size and another set of returns for firms above the median size.

Call these two return variables Less_median_returns and greater_med_returns.

Then, if a firm is below the median size, its value in "greater_med_returns" will just be "." (missing), right?

So my question is, would running these two models in one PROC REG give me what I'm after?

PROC REG DATA = have;
   small: MODEL Less_median_returns = variable1;
   large: MODEL greater_med_returns = variable1;
RUN;
QUIT;

Would this regression give me the same results as if I just created two separate tables based on my dummy variable? E.g.,

DATA small_firms;
     SET have;
     IF firm_Size LE median;
RUN;

DATA large_firms;
     SET have;
     IF firm_Size > median;
RUN;

PROC REG DATA = small_firms;
     MODEL returns = var1;
RUN;
QUIT;

PROC REG DATA = large_firms;
     MODEL returns = var1;
RUN;
QUIT;

Would the outputs be the same in both approaches?

PaigeMiller · Posted 04-15-2020 07:13 AM


You can just run the code and find out if the outputs are the same.
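
One way to run that check is sketched below. It is only a sketch: it assumes HAVE already contains firm_Size, median, returns, and the masked variable Less_median_returns, and it uses var1 for the regressor (the question writes both variable1 and var1). The output data set names est_small_a and est_small_b are made up for illustration. One caveat worth keeping in mind: as far as I recall from the PROC REG documentation on missing values, the procedure builds a single crossproducts matrix for all MODEL statements in a step, so an observation with a missing value in any variable used by any model is dropped from every model. With two masked return variables in one step, every firm would be missing one of them, which is why the sketch fits the masked variable in its own PROC REG step.

```sas
/* Sketch only: assumes HAVE contains firm_Size, median, returns, var1,
   and Less_median_returns as described in the question. */

/* Approach A: the masked dependent variable in its own PROC REG step,
   so missing values in another return variable cannot drop rows here. */
proc reg data=have outest=est_small_a noprint;
    small: model Less_median_returns = var1;
run;
quit;

/* Approach B: subset the rows with WHERE= instead of building new tables.
   Missing firm_Size is excluded explicitly, because SAS treats a missing
   numeric value as smaller than any number in comparisons. */
proc reg data=have(where=(firm_Size < median and not missing(firm_Size)))
         outest=est_small_b noprint;
    small: model returns = var1;
run;
quit;

/* Compare the estimates. The dependent-variable columns have different
   names in the two OUTEST= data sets, so only shared columns are compared. */
proc compare base=est_small_a compare=est_small_b criterion=1e-8;
    var Intercept var1 _RMSE_;
run;
```

If PROC COMPARE reports no unequal values, the two approaches agree for the small-firm model; the large-firm model can be checked the same way. Note that the subsetting code in the question uses LE, which also keeps firms exactly at the median, while the indicator is described as applying to firms below the median, so the WHERE= clause here uses < to match.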

--
Paige Miller
