Hello,
For running several different regressions in SAS, I know I can do this:
proc reg data=have outest=want noprint edf tableout;
model a = x y z;
model b = x y z;
run;
However, I have now run into a situation where my dependent variables (e.g. "a" and "b") contain missing values. For each observation where a is missing, SAS omits that observation from all models, even if b is not missing in that observation. This causes SAS to produce different coefficients than it would if I ran each model separately, like this:
proc reg data=have outest=want noprint edf tableout;
model a = x y z;
run;
proc reg data=have outest=want noprint edf tableout;
model b = x y z;
run;
Is there a way for me to not have to do that? i.e. Is there some syntax I can add to this proc reg so SAS would treat each of my models separately?
Thank you very much.
The doc says:
Missing Values
PROC REG constructs only one crossproducts matrix for the variables in all regressions. If any variable needed for any regression is missing, the observation is excluded from all estimates. If you include variables with missing values in the VAR statement, the corresponding observations are excluded from all analyses, even if you never include the variables in a model. PROC REG assumes that you might want to include these variables after the first RUN statement and deletes observations with missing values.
So...
data have2;
set have;
v = a;
var = "a";
output;
v = b;
var = "b";
output;
run;
proc sort data=have2; by var; run;
proc reg data=have2 outest=want noprint edf tableout;
by var;
model v = x y z;
run;
(untested)
Hello PG,
Thank you for your response. I don't quite understand what you are doing there. In my example, there is only one a (aka "a") and one b (aka "b"), and I'm afraid I have confused you. Are you combining the a and b values into one variable, calling it v, then running proc reg against it? That's not quite what I'm trying to do. The variables a and b are different. In some observations a is missing, while in other observations b is missing (there are certainly overlaps but that should be a factor to consider).
The code by @PGStats is what you want. It produces a regression for A and a regression for B, and if A is missing, the observation is still used for regression B, and vice versa. He is not combining A and B into a single variable mathematically. He is performing a "trick" that lets you run separate regressions with separate handling of missing values, which is exactly what you asked for when you said "Is there some syntax I can add to this proc reg so SAS would treat each of my models separately?"
But, you have also created code, with the two different PROC REGs, which should do the same thing.
You state:
This causes SAS to provide different coefficients than it would if I were to run each model separately.
Different coefficients is what you get. The two different codes you provide for the regressions cannot (it is impossible in the presence of missing values) result in the same coefficients from both.
Thank you very much for your explanation! It makes sense now! Sorry I'm still learning.
He is assigning the values of both a and b to v and storing a label, "a" or "b", in a variable var, then asking proc reg to run the regression on v, subsetting the data by the value of var. This multiplies the number of rows in the data by the number of regressions you want.
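To see the stacking described above on something concrete, here is a minimal sketch. The toy values in have are made up purely for illustration (they are not from the original poster's data); a is missing in the second row and b in the fifth. Like the suggestion above, this has not been run against real data.

```sas
/* Hypothetical toy data: a is missing in row 2, b is missing in row 5 */
data have;
  input a b x y z;
  datalines;
1.0 2.1 1 2 5
. 3.9 2 1 1
3.2 6.1 3 4 4
4.1 8.3 4 3 2
5.3 . 5 6 7
6.0 11.8 6 5 3
7.2 14.1 7 8 6
;
run;

/* Stack the data: each input row is written twice,
   once with v = a and once with v = b */
data have2;
  set have;
  length var $ 1;
  v = a; var = "a"; output;
  v = b; var = "b"; output;
run;

proc sort data=have2;
  by var;
run;

/* One regression per BY group; a missing v only removes
   the row from its own BY group */
proc reg data=have2 outest=want noprint edf tableout;
  by var;
  model v = x y z;
run;
quit;

proc print data=want;
run;
```

Within each BY group PROC REG builds its own crossproducts matrix, so the row with missing a should still contribute to the var="b" regression, and vice versa. With the EDF option, the sum of _P_ and _EDF_ in want gives the number of observations used in each group, which is one way to check this.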
Is there a particular reason why you need everything in a single PROC REG? You can write a macro to avoid repeating the text.
%macro myreg(var);
/* a separate OUTEST= dataset per call, so later calls do not overwrite earlier results */
proc reg data=have outest=want_&var noprint edf tableout;
model &var = x y z;
run;
%mend;
%myreg(a)
%myreg(b)
...
@arthurcavila wrote:
Is there a particular reason why you need everything in a single PROC REG?
A very good question.
No, I was just trying to avoid running identical code for each regression, that's all. Thank you.
You will best understand my suggestion by trying out the code and checking the printed output and the output dataset. Of course, you can also call proc reg once for each variable, but you will get two output datasets that you will then have to combine.
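For the separate-call route, combining the two output datasets could look roughly like this. It assumes the dataset have from the question exists; the names est_a and est_b are hypothetical, chosen here only so the second PROC REG does not overwrite the first one's results.

```sas
/* One PROC REG per dependent variable, each writing its own OUTEST= dataset.
   est_a and est_b are hypothetical names. */
proc reg data=have outest=est_a noprint edf tableout;
  model a = x y z;
run;
quit;

proc reg data=have outest=est_b noprint edf tableout;
  model b = x y z;
run;
quit;

/* Concatenate the estimates; _DEPVAR_ in each OUTEST= dataset
   identifies which dependent variable a row belongs to */
data want;
  set est_a est_b;
run;
```

After the concatenation, the column a is missing on the rows that came from est_b and the column b is missing on the rows from est_a, so _DEPVAR_ is the safer variable to filter on.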
Thank you. This is great help. You're the hero I need not the hero I deserve.