BookmarkSubscribeRSS Feed
ilikesas
Barite | Level 11

Hi,

 

I am doing many regressions on the the data set sashelp.cars.

 

I have created a table with all the models for which I want to do the regressions, and this table has a combination and count variables that look like this:

 

Combination count
Horsepower MPG_City Weight Length 4
EngineSize Length 2
Cylinders Horsepower Length 3

 

then I created macro variables &comb and &num_iv for each of the combinations

 

Then I did the following macro which should do all the regressions for all combinations that have >= 3 variables:

 

%macro all_regressions;

ods exclude all; 
ods output parameterestimates = parameter_estimates;
ods output anova = anova;
ods output FitStatistics = fit_statistics;
proc reg data=sashelp.cars  ;
%do k = 1 %to &cnt;
%if &&num_iv&k >=3 %then %do;
m&k: model invoice msrp = &&comb&k; 
%end;
%end;
run;

%mend;

%all_regressions;

quit;

At first everything seemd to work well but then I notices something:

 

In my ANOVA table, the Total DF for all the models = 425. This means that 426 observations were used. But the data sashelp.cars has 428 observarions. I did a proc means and it showed me that the var "Cylinders" is the only one that has 426 observations instead of 428, and upon observation I saw that these are obs= 248 and obs=249 which it is missing.

 

But the problem is that because the models which include the var Cylinders have 426 observations, SAS assigns 426 observatins to ALL the other models, even though they have 428 observations. In this case 2 observations won't make a big difference, bt it does make a difference when it is hundreds and thousands of observations.

 

Is there a way to solve this problem?

 

 

Thank you! 

5 REPLIES 5
Reeza
Super User

Unfortunately I think that means running a distinct PROC REG for each model. 

You'll need to change your loop structure and append the tables in each loop, but the general approach is the same. 

 

I thought there was a LISTWISE option but can't seem to find it. 

ballardw
Super User

From the documentation:

PROC REG constructs only one crossproducts matrix for the variables in all regressions. If any variable needed for any regression is missing, the observation is excluded from all estimates. If you include variables with missing values in the VAR statement, the corresponding observations are excluded from all analyses, even if you never include the variables in a model. PROC REG assumes that you might want to include these variables after the first RUN statement and deletes observations with missing values.

 

So your code is behaving as documented.

 

Your solution might be to create two groups of models, one that groups all the models that include the offending variable and another that doesn't. Then create your macro variables cnt num_iv and comb separately and call the macro twice.

ilikesas
Barite | Level 11

This is the macro that I did:

 

ods exclude all;

%macro all_regressions;
%do k = 1 %to &cnt;


ods output parameterestimates = p&k;
ods output anova = a&k;
ods output FitStatistics = f&k;
proc reg data=cars  ;
m&k: model invoice msrp = &&comb&k; 
run;

proc append base = parameter_estimates data=p&k;
run;

%if &k = 1 %then %do;
data anova;
set a1;
run;
data fitstatistics;
set f1;
run;
%end;

%if &k > 1 %then %do;
proc append base=anova data=a&k;
run;
proc append base = fitstatistics data=f&k;
run;
%end;

PROC SQL;
DROP TABLE p&k , a&k , f&k;
QUIT;

%end;

%mend;

%all_regressions;

quit;

 Please note that for the table parameter_estimates, the variables "variable" and "label" change with each regressions becasue they depend on the longest name of the of the IVs, so for this case just did one regression, exported the parameter_estimates into Excel, manually enterred an alphanumerical value of length 32, then imported back to SAS and to it appended all the parameter estimetes tables that the macro outputs.

For anova and fitstatistics, the columns are of constant length so just appended all the tables to the first that the macro outputs, since there are no issues here concerning appending variables of different lengths.

 

I will appreciate any comments and advice! 

 

Reeza
Super User

PROC APPEND doesn't require the base data to exist so you don't need the conditional logic for that section.

ballardw
Super User

Personally I am not fond of macros with non-visible parameters such as your &&comb&k. I would likely have the comb variable lists in a data set and use call execute to run the regressions.

 

Style choices abound. I would modify the macro to more like:

%macro all_regressions;
%do k = 1 %to &cnt;

   ods output parameterestimates = p&k;
   ods output anova = a&k;
   ods output FitStatistics = f&k;
   proc reg data=cars  ;
   m&k: model invoice msrp = &&comb&k; 
   run;
%end;

data parameter_estimates;
  set p1 - p&cnt;
run;
data anova;
   set a1 - a&cnt;
run;
data fitstatistics;
   set f1- f&cnt;
run;

proc datasets nodetails;
   delete   p1 - p&cnt a1 - a&cnt f1- f&cnt;
run;
quit;

%end;

Generate all the output sets in the shorter K loop. Then use the data step set with list of names, which would pretty much cover the cases if you did have differences that Proc Append has issues with, then Proc Datasets to delete the variables.

 

I would only have that delete after getting the rest working as you may find other issues where you have to examine the individual data sets a, f and p to see what goes on.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 1427 views
  • 5 likes
  • 3 in conversation