Solved: PROC REG: several regressions with missing values

va003 · Posted 07-05-2019 07:40 PM

Hello,

For running several different regressions in SAS, I know I can do this:

proc reg data=have outest=want noprint edf tableout;		
	model a = x y z;	
	model b = x y z;	
run;

However, I have now come across a situation where my dependent variables (e.g., a and b) contain missing values. Whenever a is missing, SAS omits that observation from all models, even if b is not missing in that observation. This gives different coefficients than I would get by running each model separately, like this:

proc reg data=have outest=want noprint edf tableout;		
	model a = x y z;		
run;
proc reg data=have outest=want noprint edf tableout;		
	model b = x y z;	
run;

Is there a way to avoid that? That is, is there some syntax I can add to this PROC REG so that SAS treats each of my models separately?

Thank you very much.

PGStats · Posted 07-05-2019 08:06 PM

The doc says:

Missing Values

PROC REG constructs only one crossproducts matrix for the variables in all regressions. If any variable needed for any regression is missing, the observation is excluded from all estimates. If you include variables with missing values in the VAR statement, the corresponding observations are excluded from all analyses, even if you never include the variables in a model. PROC REG assumes that you might want to include these variables after the first RUN statement and deletes observations with missing values.

So...

data have2;
    set have;
    /* write one copy of each observation per dependent variable */
    v = a;
    var = "a";
    output;
    v = b;
    var = "b";
    output;
run;

proc sort data=have2; by var; run;

proc reg data=have2 outest=want noprint edf tableout; 
by var;
model v = x y z;
run;

(untested)

PG
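With more than two dependent variables, the same reshaping can be written once with an ARRAY instead of repeating the assignment/OUTPUT pair. This is a minimal sketch, not PG's code: it assumes HAVE contains a, b, x, y, z as in the question, and the names HAVE_LONG and RESPONSE are hypothetical. Because each BY group is a separate analysis, a missing v only removes that row from its own group's regression.

```sas
/* Hypothetical generalization: list every dependent variable in the ARRAY */
data have_long;
    set have;
    length response $32;
    array resp{*} a b;              /* dependent variables to regress on x y z */
    do i = 1 to dim(resp);
        response = vname(resp{i});  /* name of the dependent variable, e.g. "a" */
        v = resp{i};                /* its value in this observation */
        output;                     /* one row per observation per dependent variable */
    end;
    drop i a b;
run;

/* BY-group processing requires the data to be sorted by the BY variable */
proc sort data=have_long;
    by response;
run;

/* Missing v only drops a row from its own BY group's regression */
proc reg data=have_long outest=want noprint edf tableout;
    by response;
    model v = x y z;
run;
quit;
```

The OUTEST= data set carries the BY variable RESPONSE, so each block of rows in WANT can be traced back to its dependent variable.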

va003 · Posted 07-05-2019 08:25 PM

Hello PG,

Thank you for your response. I don't quite understand what you are doing there. In my example there is only one variable a and one variable b, and I'm afraid I have confused you. Are you combining the a and b values into one variable, calling it v, and then running PROC REG on it? That's not quite what I'm trying to do. The variables a and b are different. In some observations a is missing, while in others b is missing (there is certainly some overlap, but that should be a factor to consider).

PaigeMiller · Posted 07-05-2019 09:10 PM

The code by @PGStats is what you want: it produces one regression for A and one for B, and if A is missing, the observation is still used for regression B, and vice versa. He is not combining A and B into a single variable mathematically; he is using a "trick" that gives you separate regressions with separate handling of missing values, which is exactly what you asked for when you said "Is there some syntax I can add to this proc reg so SAS would treat each of my models separately?"

But you have also written code, with two separate PROC REG steps, that does the same thing.

You state:

This causes SAS to provide different coefficients than it would if I were to run each model separately.

Different coefficients are what you get. The two versions of the code you provided cannot (it is impossible in the presence of missing values) produce the same coefficients.

--
Paige Miller
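One way to see why the coefficients differ is to count how many observations each model can actually use. A minimal sketch, assuming HAVE contains a, b, x, y, z as in the question; COUNTS, N_A, N_B, and N_JOINT are hypothetical names. N_JOINT is the number of rows that a single PROC REG with both MODEL statements keeps, per the documentation quoted earlier in the thread.

```sas
/* Count complete cases per model (assumes HAVE holds a, b, x, y, z) */
data counts(keep=n_a n_b n_joint);
    set have end=last;
    /* sum statements: retained across rows, start at 0 */
    n_a     + (nmiss(a, x, y, z) = 0);     /* rows usable by MODEL a = x y z alone */
    n_b     + (nmiss(b, x, y, z) = 0);     /* rows usable by MODEL b = x y z alone */
    n_joint + (nmiss(a, b, x, y, z) = 0);  /* rows one PROC REG with both models keeps */
    if last then do;
        put n_a= n_b= n_joint=;            /* report the counts in the log */
        output;
    end;
run;
```

If N_JOINT is smaller than N_A or N_B, the combined PROC REG is fitting that model on fewer rows than the separate run, which is why the estimates differ.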

va003 · Posted 07-06-2019 10:19 AM

Thank you very much for your explanation! It makes sense now! Sorry I'm still learning.

arthurcavila · Posted 07-05-2019 09:17 PM

He is assigning the values of both a and b to v and storing a label, "a" or "b", in the variable var, then running the regression on v separately for each value of var (a BY group). This multiplies the number of rows in the data by the number of regressions you want.

Is there a particular reason why you need everything in a single PROC REG? You can write a macro to avoid repeating the text.

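%macro myreg(var);
	/* one OUTEST= data set per variable; a fixed name like WANT would be overwritten by each call */
	proc reg data=have outest=want_&var noprint edf tableout;
		model &var = x y z;
	run;
	quit;
%mend;

%myreg(a)
%myreg(b)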
...

PaigeMiller · Posted 07-05-2019 09:18 PM

@arthurcavila wrote:

Is there a particular reason why you need everything in a single PROC REG?

A very good question.

--
Paige Miller

va003 · Posted 07-06-2019 10:21 AM

No, I was just trying to avoid running identical code for each regression. Thank you.

PGStats · Posted 07-05-2019 10:31 PM

You will best understand my suggestion by running the code and checking the printed output and the output data set. Of course, you can also call PROC REG for each variable, but then you will get two output data sets that you have to combine.

PG
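If you prefer one PROC REG call per variable, the combining step can live inside the macro. This is a minimal sketch, not code from the thread: the macro name MYREGS and the intermediate EST_ data sets are hypothetical, and it assumes HAVE contains the listed variables plus x, y, z. It relies on the _DEPVAR_ column that PROC REG writes to the OUTEST= data set to tell the stacked models apart.

```sas
%macro myregs(vars);
    %local i var;
    /* one regression per dependent variable, each on its own complete cases */
    %do i = 1 %to %sysfunc(countw(&vars));
        %let var = %scan(&vars, &i);
        proc reg data=have outest=est_&var noprint edf tableout;
            model &var = x y z;
        run;
        quit;
    %end;

    /* stack the per-variable estimate tables; _DEPVAR_ identifies each model */
    data want;
        set
        %do i = 1 %to %sysfunc(countw(&vars));
            est_%scan(&vars, &i)
        %end;
        ;
    run;
%mend myregs;

%myregs(a b)
```

Stacking with a DATA step SET keeps every column from every input, so the column for a dependent variable is simply missing in the rows that belong to the other models.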

va003 · Posted 07-06-2019 10:51 AM

Thank you. This is a great help. You're the hero I need, not the hero I deserve.