Satyakshma
Fluorite | Level 6

Hi,

 

I have a dataset with three independent variables (var1 to var3) and one dependent variable (var0). I need to run a regression for every possible combination of the three independent variables.

data have;
input var0 var1 var2 var3;
Datalines;
0.17 3.84 15.60 17.15
0.13 3.72 1.90 17.46
0.18 8.44 22.80 12.37
0.14 6.29 5.60 8.73
;
run;

 

The regression should be run for each of these combinations:
var1
var2
var3
var1 var2
var2 var3
and so on..

 

 

Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

OK. Is this what you want?

 

data have;
input var0 var1 var2 var3;
Datalines;
0.17 3.84 15.60 17.15
0.13 3.72 1.90 17.46
0.18 8.44 22.80 12.37
0.14 6.29 5.60 8.73
;
run;


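/* Fit Y= on every non-empty subset of the predictors listed in X= */
%macro reg_all_comb(dsn=, y= , x= );
%let n=%sysfunc(countw(&x.,%str( ))); %put &=n ;   /* number of predictors */
/* Build one row per subset; COMB holds the space-separated variable list */
data all_comb;
length comb $ 200;
 array x{&n.};
 k=-1;                      /* negative k makes GRAYCODE start from the empty subset */
 do i=1 to 2**&n.;
   rc=graycode(k,of x{*});  /* x{j}=1 marks predictor j as included */
   do j=1 to &n.;
    if x{j}=1 then comb=catx(' ',comb,scan("&x.",j,' '));
   end;
   output;call missing(comb);
 end;
run;
/* Generate a single PROC REG with one labeled MODEL statement per subset */
data _null_;
 set all_comb(where=(comb is not missing)) end=last;   /* skip the empty subset */
 if _n_=1 then call execute(catt("ods select none;
        ods output  FitStatistics=FitStatistics  ParameterEstimates= ParameterEstimates;
        proc reg data=&dsn. ;"));
 /* the label (for example var1var2) identifies the model in the ODS output data sets */
 call execute(catt(compress(comb),": model &y.=",comb,";"));
 if last then call execute('quit; ods select all;');
run;
%mend;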

%reg_all_comb(dsn=have, y=var0 , x=var1 var2 var3 )


15 REPLIES
acordes
Rhodochrosite | Level 12
Assuming all variables are numeric.
Not tested. Try this as a starting point.

Proc glmselect data=have;
Model var0= var1 var2 var3 var1|var2|var3 @2 / selection=backward;
Run;
Satyakshma
Fluorite | Level 6
Thank you for your response. I have a question, though: will the selection=backward option perform variable selection, or will it fit all combinations?
acordes
Rhodochrosite | Level 12
I suppose you can force it to run all steps, perhaps by setting a very extreme stopping criterion.
But I'm not sure, and I'm writing from my smartphone, so I cannot test or investigate further.
whymath
Lapis Lazuli | Level 10

The selection= option performs variable selection.

var1|var2|var3 is shorthand for a full factorial model:

var1 var2 var3 var1*var2 var1*var3 var2*var3 var1*var2*var3

So don't use @2 there: it caps the number of variables in any one effect at 2, so the three-way term "var1*var2*var3" would be removed.
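
To see the expansion concretely, here is a minimal, untested sketch. The `demo` data set and its random values are hypothetical (they are not from this thread), and PROC GLM is used only because it accepts the same bar notation. The two MODEL statements should request the same set of effects, listed in a different order.

```sas
/* Hypothetical demo data: 50 rows of random values, not the poster's data */
data demo;
   call streaminit(1);
   do i = 1 to 50;
      var1 = rand("normal");
      var2 = rand("normal");
      var3 = rand("normal");
      var0 = rand("normal");
      output;
   end;
   drop i;
run;

/* Bar notation, capped at two-way interactions by @2 */
proc glm data=demo;
   model var0 = var1|var2|var3 @2;
run;
quit;

/* The same effects written out by hand */
proc glm data=demo;
   model var0 = var1 var2 var3 var1*var2 var1*var3 var2*var3;
run;
quit;
```

Dropping the @2 from the first MODEL statement adds the three-way term var1*var2*var3.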

PaigeMiller
Diamond | Level 26

All possible regressions in SAS: https://support.sas.com/kb/24/986.html

 

 

--
Paige Miller
Ksharp
Super User

Is this what you want?

 

data have;
input var0 var1 var2 var3;
Datalines;
0.17 3.84 15.60 17.15
0.13 3.72 1.90 17.46
0.18 8.44 22.80 12.37
0.14 6.29 5.60 8.73
;
run;


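/* Fit Y= on every non-empty subset of the predictors listed in X= */
%macro reg_all_comb(dsn=, y= , x= );
%let n=%sysfunc(countw(&x.,%str( ))); %put &=n ;   /* number of predictors */
/* Build one row per subset; COMB holds the space-separated variable list */
data all_comb;
length comb $ 200;
 array x{&n.};
 k=-1;                      /* negative k makes GRAYCODE start from the empty subset */
 do i=1 to 2**&n.;
   rc=graycode(k,of x{*});  /* x{j}=1 marks predictor j as included */
   do j=1 to &n.;
    if x{j}=1 then comb=catx(' ',comb,scan("&x.",j,' '));
   end;
   output;call missing(comb);
 end;
run;
/* Generate a single PROC REG with one MODEL statement per subset */
data _null_;
 set all_comb(where=(comb is not missing)) end=last;   /* skip the empty subset */
 if _n_=1 then call execute(catt("proc reg data=&dsn. ;"));
 call execute(catt(" model &y.=",comb,";"));
 if last then call execute('quit;');
run;
%mend;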

%reg_all_comb(dsn=have, y=var0 , x=var1 var2 var3 )
Satyakshma
Fluorite | Level 6

Hi, thank you for your response, but I have a few questions about the code. You used the GRAYCODE function. Is that different from using the ALLCOMB function to get the combinations?

Ksharp
Super User
They are different. ALLCOMB() only generates the combinations of n variables taken k at a time (that is, C(n,k) of them).
But GRAYCODE() generates every subset of 1, 2, 3, ..., n variables (plus the empty set), that is, (1+1)^n = 2^n subsets in total.
Check the documentation and you will see the difference.
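
As a rough illustration of that difference, here is a small sketch with three names. The array names `v` and `x` are made up for the example. The first step should write the C(3,2) = 3 pairs to the log, and the second should walk through all 2**3 = 8 subsets, starting with the empty one.

```sas
/* ALLCOMB: every combination of exactly k of the n values */
data _null_;
   array v[3] $4 ('var1' 'var2' 'var3');
   k = 2;
   ncomb = comb(dim(v), k);           /* number of pairs */
   do j = 1 to ncomb;
      rc = allcomb(j, k, of v[*]);    /* the first k elements hold the j-th combination */
      put j= v1= v2=;
   end;
run;

/* GRAYCODE: every subset of the n items, as 0/1 indicator flags */
data _null_;
   array x[3];
   k = -1;                            /* negative k asks GRAYCODE to start from the empty subset */
   do i = 1 to 2**dim(x);
      rc = graycode(k, of x[*]);      /* after the call, k is the number of items in the subset */
      put i= k= x1= x2= x3=;
   end;
run;
```

That is why the macro uses GRAYCODE: one loop covers the models of every size, where ALLCOMB would need an outer loop over k.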
Satyakshma
Fluorite | Level 6
Thanks. I am trying to add this line of code: call execute(catt("output out =","reg_op",";"));
I need the results of these regressions (slope, intercept, R-square, adjusted R-square, etc.) in a dataset, but I get the error 'Data work reg_op is already open for output'.

Is there some other way to get the output into a dataset?
Ksharp
Super User

OK. Is this what you want?

 

data have;
input var0 var1 var2 var3;
Datalines;
0.17 3.84 15.60 17.15
0.13 3.72 1.90 17.46
0.18 8.44 22.80 12.37
0.14 6.29 5.60 8.73
;
run;


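/* Fit Y= on every non-empty subset of the predictors listed in X= */
%macro reg_all_comb(dsn=, y= , x= );
%let n=%sysfunc(countw(&x.,%str( ))); %put &=n ;   /* number of predictors */
/* Build one row per subset; COMB holds the space-separated variable list */
data all_comb;
length comb $ 200;
 array x{&n.};
 k=-1;                      /* negative k makes GRAYCODE start from the empty subset */
 do i=1 to 2**&n.;
   rc=graycode(k,of x{*});  /* x{j}=1 marks predictor j as included */
   do j=1 to &n.;
    if x{j}=1 then comb=catx(' ',comb,scan("&x.",j,' '));
   end;
   output;call missing(comb);
 end;
run;
/* Generate a single PROC REG with one labeled MODEL statement per subset */
data _null_;
 set all_comb(where=(comb is not missing)) end=last;   /* skip the empty subset */
 if _n_=1 then call execute(catt("ods select none;
        ods output  FitStatistics=FitStatistics  ParameterEstimates= ParameterEstimates;
        proc reg data=&dsn. ;"));
 /* the label (for example var1var2) identifies the model in the ODS output data sets */
 call execute(catt(compress(comb),": model &y.=",comb,";"));
 if last then call execute('quit; ods select all;');
run;
%mend;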

%reg_all_comb(dsn=have, y=var0 , x=var1 var2 var3 )
StatsMan
SAS Super FREQ

If you use SELECTION=RSQUARE and STOP=3 on the MODEL statement in PROC REG, that should give you all possible regressions with up to 3 variables in the model. Try the code below. This approach can be extremely resource intensive when many variables are listed on the MODEL statement. If you have 10 variables, then the code will yield C(10,1) + C(10,2) + C(10,3) = 10 + 45 + 120 = 175 regressions. If you run with 100 variables, then that is 100 + 4,950 + 161,700 = 166,750 models. If you run with 1000 variables, then that's about 166.7 million models.

 

data test;
   call streaminit(51436);
   array x{10} x1-x10;
   do i=1 to 100;
      y=rand("normal");
      do j=1 to 10;
         x{j}=rand("normal");
      end;
      output;
   end;
run;

proc reg data=test outest=stats;
   model y=x1-x10 / selection=rsquare stop=3;
run;

proc print data=stats;
run;
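
The model counts quoted above are just sums of binomial coefficients, so they are easy to check before committing to a long run. A minimal sketch (the variable names `p` and `n_models` are made up for the example):

```sas
/* Number of models with 1, 2 or 3 predictors chosen from p candidates */
data _null_;
   do p = 10, 100, 1000;
      n_models = comb(p, 1) + comb(p, 2) + comb(p, 3);
      put p= n_models= comma15.;
   end;
run;
```

The log should show 175, 166,750 and 166,667,500 models for p = 10, 100 and 1000.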
Ksharp
Super User
Good.
But the OP needs more than that:
"slope, intercept, r-square, adj-rsquare etc "
StatsMan
SAS Super FREQ

The OUTEST= data set will contain the _RMSE_ and _RSQ_ variables along with the intercept and slopes for each model. If you change to SELECTION=ADJRSQ and add the ADJRSQ option to the MODEL statement, then the OUTEST= data set will also contain the adjusted R-square. If other statistics are needed, like Mallows' Cp, then there are options on the MODEL statement to include those statistics in the OUTEST= data set.
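
As a sketch of what that could look like, the step below keeps SELECTION=RSQUARE and adds the ADJRSQ and CP options to the MODEL statement. It is untested here and rebuilds the same simulated `test` data so that it runs on its own. I would expect the extra statistics to show up as additional columns in `stats`, but check the PROC PRINT output for the exact variable names.

```sas
/* Same simulated data as in the earlier post */
data test;
   call streaminit(51436);
   array x{10} x1-x10;
   do i = 1 to 100;
      y = rand("normal");
      do j = 1 to 10;
         x{j} = rand("normal");
      end;
      output;
   end;
run;

/* All models with up to 3 predictors; NOPRINT suppresses the listing */
proc reg data=test outest=stats noprint;
   model y = x1-x10 / selection=rsquare stop=3 adjrsq cp;
run;
quit;

/* One row per model: estimates plus the requested fit statistics */
proc print data=stats(obs=10);
run;
```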

acordes
Rhodochrosite | Level 12

Greetings from @Rick_SAS, who wrote a blog post about the SWEEP operator.

 

Do it with the sweep operator

 

data test;
   call streaminit(51436);
   array x{10} x1-x10;
   do i=1 to 100;
      y=rand("normal");
      do j=1 to 10;
         x{j}=rand("normal");
      end;
      output;
   end;
run;

ods graphics on;
%let numsim=10;

proc iml;
xVarNames = "X1":"X&numSim";     /* names of explanatory variables */
varNames = xVarNames || "y" ;     /* name of all data variables */
use test;  read all var varNames into M [colname=varyplus];  
close;
M = j(nrow(M), 1, 1) || M;       /* add intercept column */
varyplus="intercept" || varnames;
mattrib m c=varyplus;

tss=(m[, {"y"}]-(m[, {"y"}] [:])) [##];

model_vars="x1":"x10";

vars=10;
max_cross=3;

ncomb=0;

do t=1 to max_cross;
ncomb=ncomb + comb(vars, t);
end;

results=t(1:ncomb) || j(ncomb, max_cross + 2, .);
model_info=j(ncomb, max_cross, "                               ");

cnt=0;

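do i=1 to max_cross;                  /* i = number of predictors in the model */
   idx=allcomb(vars, i)+1;            /* columns of M for each combination (+1 skips the intercept column) */
   idx=j(nrow(idx),1,1)||idx;         /* always include the intercept column */
   do u=1 to nrow(idx);
      S1 = sweep(M`*M, idx[u,]);      /* sweep the cross-product matrix on the chosen columns */
      /* the last (y) column of S1 now holds the estimates for the swept columns */
      rss=((t(S1[idx[u,], ncol(m)])#m[,idx[u,]]) [,+] - m[, {"y"}]) [##] ;   /* residual sum of squares */
      rsq=1-rss/tss;
      cnt=cnt+1;
      results[cnt,2:i+2]=S1[idx[u,],nrow(s1)]`;   /* intercept and slopes */
      results[cnt, ncol(results)]=rsq;
      model_info[cnt,1:ncol(idx[u,])-1]=model_vars[idx[u,2:ncol(idx)]-1]`;   /* predictor names */
   end;
end;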

call symputx("cross", max_cross);

call sortndx(rr, results, ncol(results));
results=results[rr,];
model_info=model_info[rr,];


names={"obs" "intercept"} || ("est1":"est&cross.") || {"_rsq_" "rank"} ;
names2="model_var1":"model_var&cross.";

create parameter_estimate from results [colname=names];
append from results;
close;

create model_var from model_info [colname=names2];
append from model_info;
close;

quit;

data final_result;
merge parameter_estimate model_var;
run;

 

