pshankar
Calcite | Level 5

I am trying to build a linear regression model for every combination of variables. For example, if y = c + x1 + x2 + x3, I want to build a model for each of these sets of independent variables:

X1

X2

X3

X1 X2

X1 X3

.

.

X1 X2 X3.

I have written code that generates all possible combinations of the variables, and I am now trying to pass each combination to PROC REG, but I have not been able to do so. My aim is to automate the whole process so that if I change the model input data, the code automatically generates all the possible models.

The code I have so far:

DATA model_data;
input y1 x1 x2 x3;
cards;
2 2 4 5
2 3 4 6
4 5 6 7
3 4 7 8
6 8 9 3
run;

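options mprint;
%macro reg(comb_no=,ar_pos=);
/* Write every combination of &comb_no variable names to vars&comb_no, one per row */
data vars&comb_no (keep=v1-v&comb_no);
  array v(5) $4 ('X_t' 'X_t1' 'X_t2' 'Y_t1' 'Y_t2');
  ncomb=comb(5,&comb_no);
  do j=1 to ncomb;
    /* After each call, the first &comb_no elements hold the j-th combination */
    call lexcomb(j, &comb_no, of v(*));
    put (v1-v&ar_pos) ($5.);
    output;
  end;
  /* Store the number of combinations in the macro variable regs */
  call symputx('regs',ncomb);
run;
%mend reg;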

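/* Call the macro once per subset size. A %do loop in a wrapper macro could replace these five calls. */
%reg(comb_no=1,ar_pos=1);
%reg(comb_no=2,ar_pos=2);
%reg(comb_no=3,ar_pos=3);
%reg(comb_no=4,ar_pos=4);
%reg(comb_no=5,ar_pos=5);

/* conc is never created in the original post. Assumption: it is the five vars data sets stacked. */
data conc;
set vars1-vars5;
run;

proc print data=work.conc;
run;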

data all (keep=full);
set conc;
full=catx(' ',v1,v2,v3,v4,v5);
run;

proc print data=work.all;
run;

data v1;
set all (firstobs=1 obs=1);
run;

data m_data;
set work.ts_data (keep=Y_t X_t);
run;

proc reg data=m_data plots=none;
model Y_t = X_t;
run;

Accepted Solution
PaigeMiller
Diamond | Level 26

A macro is not needed; all-possible-subsets regression is built into PROC REG, as stated in the documentation (highlighting is mine).

R² Selection (RSQUARE)

The RSQUARE method finds subsets of independent variables that best predict a dependent variable by linear regression in the given sample. You can specify the largest and smallest number of independent variables to appear in a subset and the number of subsets of each size to be selected. The RSQUARE method can efficiently perform all possible subset regressions and display the models in decreasing order of R square magnitude within each subset size. Other statistics are available for comparing subsets of different sizes. These statistics, as well as estimated regression coefficients, can be displayed or output to a SAS data set.

Having said that, I agree with @PeterClemmensen: this is generally not a good thing to do, and few people use this approach to find regression models these days. Just because your computer can do the calculations does not mean you should do them.

Other possible methods for selecting a regression model are PROC PLS and PROC GLMSELECT.
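
As a minimal sketch of the RSQUARE approach described above, the step below uses the model_data set from the question. The output data set name all_models is an assumption for illustration, and the sample has only five observations, so the statistics themselves mean little.

```sas
/* Sample data from the question */
data model_data;
  input y1 x1 x2 x3;
  cards;
2 2 4 5
2 3 4 6
4 5 6 7
3 4 7 8
6 8 9 3
;
run;

/* Fit every subset of x1 x2 x3 in one MODEL statement.            */
/* ADJRSQ, CP and AIC add statistics for comparing subset sizes.   */
/* START= and STOP= could limit the smallest and largest subsets.  */
/* all_models is a hypothetical name for the OUTEST= data set.     */
proc reg data=model_data outest=all_models plots=none;
  model y1 = x1 x2 x3 / selection=rsquare adjrsq cp aic;
run;
quit;

proc print data=all_models;
run;
```

Changing the input data then only means editing the DATA= option and the variable list on the MODEL statement; no combination-generating code is required.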

--
Paige Miller

4 REPLIES 4
PeterClemmensen
Tourmaline | Level 20

Hi @pshankar 🙂

What are you trying to accomplish here? Do you want to run all linear models for some variables and choose the best? Or something else? In either case, a macro is probably not the easiest way to go.

pshankar
Calcite | Level 5

I want to run a linear model for every possible combination of variables and choose the best one. For example, if the data has 3 independent variables, there will be 7 models in total. So I want to design the code in such a way that if I change the input data to one with 5 variables, the SAS code automatically generates all 31 models. After generating the models, I will pick the one with the best result. This is easier to do in R, but I am finding it a bit difficult in SAS. Is it possible to design a SAS macro in such a fashion?
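
The counts above follow from the 2^k - 1 non-empty subsets of k variables: 7 for k = 3 and 31 for k = 5. As a rough sketch of how such a list could be generated in a single DATA step, the code below treats each integer from 1 to 2^k - 1 as a bit mask over the variable list. The macro variable vars, the data set subsets, and the columns size and model_vars are hypothetical names chosen for illustration.

```sas
/* Hypothetical predictor list; edit this line when the input data changes */
%let vars = x1 x2 x3;

data subsets (keep=size model_vars);
  length model_vars $200;
  k = countw("&vars");
  /* Each mask from 1 to 2**k - 1 encodes one non-empty subset */
  do mask = 1 to 2**k - 1;
    model_vars = '';
    size = 0;
    do i = 1 to k;
      /* Bit i-1 set means the i-th variable is in this subset */
      if band(mask, 2**(i-1)) then do;
        model_vars = catx(' ', model_vars, scan("&vars", i));
        size = size + 1;
      end;
    end;
    output;
  end;
run;

/* With three names in vars, this prints 7 rows */
proc print data=subsets;
run;
```

Each row of subsets holds one candidate right-hand side for a MODEL statement, so the list grows automatically when names are added to vars.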

Rick_SAS
SAS Super FREQ

Paige mentioned PROC GLMSELECT, but I will make a stronger statement: If you want to perform model selection over all candidate models, use PROC GLMSELECT, not PROC REG.
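
For comparison, a minimal PROC GLMSELECT sketch is shown below. It assumes the sashelp.class sample data set that ships with SAS, and the choice of stepwise selection with the SBC criterion is only an illustration, not a recommendation from this thread.

```sas
/* Assumption: sashelp.class is available, as in a standard SAS install. */
/* Stepwise selection adds or drops effects according to SBC.            */
proc glmselect data=sashelp.class;
  model weight = height age / selection=stepwise(select=sbc);
run;
```

Other values of SELECTION= (for example forward, backward, or lasso) can be swapped in on the same MODEL statement.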
