Hi,
I have a dataset with 3 independent variables (var1 - var3) and one dependent variable (var0). I need to run a regression for every possible combination of the 3 independent variables.
data have;
input var0 var1 var2 var3;
Datalines;
0.17 3.84 15.60 17.15
0.13 3.72 1.90 17.46
0.18 8.44 22.80 12.37
0.14 6.29 5.60 8.73
;
run;
The regression should be run for each of these combinations:
var1
var2
var3
var1 var2
var2 var3
and so on.
Thanks.
Accepted Solutions
OK. Is this what you want?
data have;
input var0 var1 var2 var3;
Datalines;
0.17 3.84 15.60 17.15
0.13 3.72 1.90 17.46
0.18 8.44 22.80 12.37
0.14 6.29 5.60 8.73
;
run;
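%macro reg_all_comb(dsn=, y= , x= );
/* Count the regressors in the space-delimited list X */
%let n=%sysfunc(countw(&x.,%str( ))); %put &=n ;
/* Enumerate all 2**n subsets of the regressors with GRAYCODE */
data all_comb;
  length comb $ 200;
  array x{&n.};
  k=-1;   /* a negative K makes GRAYCODE start from the empty subset */
  do i=1 to 2**&n.;
    rc=graycode(k,of x{*});
    do j=1 to &n.;
      if x{j}=1 then comb=catx(' ',comb,scan("&x.",j,' '));
    end;
    output; call missing(comb);
  end;
run;
/* Generate one PROC REG with a labeled MODEL statement per non-empty subset */
data _null_;
  set all_comb(where=(comb is not missing)) end=last;
  if _n_=1 then do;
    /* suppress printed output and capture the ODS tables as datasets */
    call execute('ods select none;');
    call execute('ods output FitStatistics=FitStatistics ParameterEstimates=ParameterEstimates;');
    call execute("proc reg data=&dsn. ;");
  end;
  call execute(catt(compress(comb),": model &y.=",comb,";"));
  if last then call execute('quit; ods select all;');
run;
%mend;
%reg_all_comb(dsn=have, y=var0 , x=var1 var2 var3 )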
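Once the macro has run, the two ODS OUTPUT datasets can be inspected. The following is only a sketch: it assumes the usual PROC REG column names (Model, Variable and Estimate in ParameterEstimates; Model, Label2 and nValue2 in FitStatistics) and the usual labels 'R-Square' and 'Adj R-Sq'. Check with PROC CONTENTS if your release differs.

```sas
/* Assumption: %reg_all_comb has already created these two datasets */
proc print data=ParameterEstimates noobs;
  var Model Variable Estimate;     /* intercept and slopes, one row per term */
run;

proc print data=FitStatistics noobs;
  where Label2 in ('R-Square', 'Adj R-Sq');   /* assumed label text */
  var Model Label2 nValue2;
run;
```

The Model column carries the label written in front of each MODEL statement (for example var1var2), so it identifies which combination a row belongs to.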
Not tested. Try this as a starting point.
Proc glmselect data=have;
Model var0= var1 var2 var3 var1|var2|var3 @2 / solution selection=backward;
Run;
Perhaps by setting a very extreme stopping criterion.
But I'm not sure, and I'm writing from my smartphone, so I cannot test or investigate further.
The SELECTION= option performs variable selection.
var1|var2|var3 is shorthand for a full factorial model:
var1 var2 var3 var1*var2 var1*var3 var2*var3 var1*var2*var3
So don't use @2 there: it limits each effect to at most 2 variables, so the term "var1*var2*var3" would be removed.
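To see the effect of the bar operator and of @2 side by side, here is a small sketch using the `have` dataset from the question. SELECTION=NONE is used here only so that every listed effect stays in the model; with just four observations the full factorial model is overparameterized, so this illustrates the effect list rather than producing a usable fit.

```sas
/* Full factorial: var1 var2 var3 var1*var2 var1*var3 var2*var3 var1*var2*var3 */
proc glmselect data=have;
  model var0 = var1|var2|var3 / selection=none;
run;

/* With @2 the three-way term is dropped:
   var1 var2 var3 var1*var2 var1*var3 var2*var3 */
proc glmselect data=have;
  model var0 = var1|var2|var3 @2 / selection=none;
run;
```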
Is this what you want?
data have;
input var0 var1 var2 var3;
Datalines;
0.17 3.84 15.60 17.15
0.13 3.72 1.90 17.46
0.18 8.44 22.80 12.37
0.14 6.29 5.60 8.73
;
run;
%macro reg_all_comb(dsn=, y= , x= );
%let n=%sysfunc(countw(&x.,%str( ))); %put &=n ;
/* Build every subset of the regressors listed in X */
data all_comb;
  length comb $ 200;
  array x{&n.};
  k=-1;
  do i=1 to 2**&n.;
    rc=graycode(k,of x{*});
    do j=1 to &n.;
      if x{j}=1 then comb=catx(' ',comb,scan("&x.",j,' '));
    end;
    output; call missing(comb);
  end;
run;
/* Submit a single PROC REG with one MODEL statement per non-empty subset */
data _null_;
  set all_comb(where=(comb is not missing)) end=last;
  if _n_=1 then call execute(catt("proc reg data=&dsn. ;"));
  call execute(catt(" model &y.=",comb,";"));
  if last then call execute('quit;');
run;
%mend;
%reg_all_comb(dsn=have, y=var0 , x=var1 var2 var3 )
Hi, thank you for your response, but I have a few questions about the code. You used the GRAYCODE function. Is it different from using the ALLCOMB function to get the combinations?
ALLCOMB generates the combinations of one fixed size k, but GRAYCODE() can generate the combinations of 1, 2, 3, 4, ..., N variables (that is, all (1+1)^n = 2^n subsets, including the empty one).
Check the documentation and you will see the difference.
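A small sketch of the difference, closely following the pattern of the examples in the SAS documentation for the two routines. The array names and the three variable names are only illustrative.

```sas
/* CALL ALLCOMB: all combinations of exactly k=2 of the 3 names.
   After each call the first k array elements hold the combination. */
data _null_;
  array x[3] $4 ('var1' 'var2' 'var3');
  k = 2;
  do j = 1 to comb(dim(x), k);
    call allcomb(j, k, of x[*]);
    put 'ALLCOMB:  ' x1-x2;
  end;
run;

/* GRAYCODE: all 2**3 subsets of any size, as 0/1 indicators.
   k starts negative to initialize and then holds the subset size. */
data _null_;
  array b[3];
  k = -1;
  do i = 1 to 2**dim(b);
    rc = graycode(k, of b[*]);
    put 'GRAYCODE: ' b[*] k=;
  end;
run;
```

The first step writes 3 lines to the log (one per pair) and the second writes 8 (one per subset, starting with the empty subset), which is why the macro filters out the row where comb is missing.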
I need to save the results of these regressions (slope, intercept, R-square, adjusted R-square, etc.) in a dataset, but I am getting the error 'Data work reg_op is already open for output'.
Is there some other way to get the output into a dataset?
If you use SELECTION=RSQUARE and STOP=3 on the MODEL statement in PROC REG, that should give you all possible regressions with up to 3 variables in the model. Try the code below. This code can be extremely resource intensive when many variables are listed on the MODEL statement. If you have 10 variables, the code will yield C(10,1) + C(10,2) + C(10,3) = 10 + 45 + 120 = 175 regressions. With 100 variables, that is 100 + 4,950 + 161,700 = 166,750 models. With 1000 variables, that's about 167 million models.
data test;
  call streaminit(51436);
  array x{10} x1-x10;
  do i=1 to 100;
    y=rand("normal");
    do j=1 to 10;
      x{j}=rand("normal");
    end;
    output;
  end;
run;
proc reg data=test outest=stats;
  model y=x1-x10 / selection=rsquare stop=3;
run;
proc print data=stats;
run;
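The model counts for STOP=3 can be checked with the COMB function before committing to a long run. This is a quick sketch; the variable names `n`, `k` and `total` are only illustrative.

```sas
/* Number of models with 1, 2 or 3 regressors out of n candidates */
data _null_;
  do n = 10, 100, 1000;
    total = 0;
    do k = 1 to 3;
      total + comb(n, k);
    end;
    put n= total= comma15.;
  end;
run;
```

The log shows 175 models for n=10, 166,750 for n=100 and 166,667,500 for n=1000.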
But the OP needs more than that:
"slope, intercept, r-square, adj-rsquare etc."
The OUTEST= data set will contain the _RMSE_ and _RSQ_ variables along with the intercept and slopes for each model. If you change to SELECTION=ADJRSQ and add the ADJRSQ option to the MODEL statement, then the OUTEST= data set will also contain the adjusted R-square. If other statistics are needed, like Mallows' Cp, there are options on the MODEL statement to include them in the OUTEST= data set.
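As a sketch of that suggestion, the step below keeps SELECTION=RSQUARE from the earlier post and adds the ADJRSQ and CP options on the MODEL statement (a variant, not the exact code described above). The simulated `test` data is rebuilt here only so the example is self-contained. The extra columns are expected to appear as _ADJRSQ_ and _CP_, but that is an assumption worth confirming with PROC CONTENTS.

```sas
/* Simulated data, same layout as in the earlier post */
data test;
  call streaminit(51436);
  array x{10} x1-x10;
  do i = 1 to 100;
    y = rand("normal");
    do j = 1 to 10;
      x{j} = rand("normal");
    end;
    output;
  end;
run;

/* All models with up to 3 regressors; request extra fit statistics */
proc reg data=test outest=stats;
  model y = x1-x10 / selection=rsquare stop=3 adjrsq cp;
run;
quit;

/* Inspect which statistic columns were written */
proc contents data=stats short;
run;
proc print data=stats(obs=10);
run;
```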
Greetings from @Rick_SAS, who wrote a blog post about the SWEEP operator.
You can also do it with the SWEEP operator:
data test;
call streaminit(51436);
array x{10} x1-x10;
do i=1 to 100;
y=rand("normal");
do j=1 to 10;
x{j}=rand("normal");
end;
output;
end;
run;
ods graphics on;
%let numsim=10;
proc iml;
xVarNames = "X1":"X&numSim"; /* names of explanatory variables */
varNames = xVarNames || "y" ; /* names of all data variables */
use test; read all var varNames into M [colname=varyplus];
close;
M = j(nrow(M), 1, 1) || M; /* add intercept column */
varyplus="intercept" || varnames;
mattrib m c=varyplus;
/* total sum of squares of y */
tss=(m[, {"y"}]-(m[, {"y"}] [:])) [##];
model_vars="x1":"x10";
vars=10;
max_cross=3; /* largest number of regressors per model */
/* number of models with 1 to max_cross regressors */
ncomb=0;
do t=1 to max_cross;
  ncomb=ncomb + comb(vars, t);
end;
results=t(1:ncomb) || j(ncomb, max_cross + 2, .);
/* pad the fill value so names such as "x10" are not truncated to one character */
model_info=j(ncomb, max_cross, "        ");
cnt=0;
do i=1 to max_cross;
  idx=allcomb(vars, i)+1; /* shift by 1 for the intercept column */
  idx=j(nrow(idx),1,1)||idx; /* always include the intercept */
  do u=1 to nrow(idx);
    /* sweep the cross-product matrix on the selected columns;
       the last column then holds the parameter estimates */
    S1 = sweep(M`*M, idx[u,]);
    rss=((t(S1[idx[u,], ncol(m)])#m[,idx[u,]]) [,+] - m[, {"y"}]) [##] ;
    rsq=1-rss/tss;
    cnt=cnt+1;
    results[cnt,2:i+2]=S1[idx[u,],nrow(s1)]`;
    results[cnt, ncol(results)]=rsq;
    model_info[cnt,1:ncol(idx[u,])-1]=model_vars[idx[u,2:ncol(idx)]-1]`;
  end;
end;
call symputx("cross", max_cross);
/* sort the models by R-square (last column) */
call sortndx(rr, results, ncol(results));
results=results[rr,];
model_info=model_info[rr,];
names={"obs" "intercept"} || ("est1":"est&cross.") || {"_rsq_" "rank"} ;
names2="model_var1":"model_var&cross.";
create parameter_estimate from results [colname=names];
append from results;
close;
create model_var from model_info [colname=names2];
append from model_info;
close;
quit;
data final_result;
merge parameter_estimate model_var;
run;