Hello,
I am trying to write a macro that lets me run linear regressions while changing the independent and dependent variables, starting from this single model:
proc reg data=mydata;
model y1=x1;
run; 
Thank you!
You'd only need a couple of %do loops. You'll have to set how high you want the values of y and x to go; not knowing, I used 9 for both. Also, I haven't tested the following, so there are no guarantees that it's correct.
%macro doit;
  %do y=1 %to 9;
    %do x=1 %to 9;
      proc reg data=mydata;
        model y&y.=x&x.;
      run;
    %end;
  %end;
%mend;
%doit
Art, CEO, AnalystFinder.com
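If the variables aren't conveniently numbered y1, y2, ... and x1, x2, ..., the same double loop can walk over lists of names instead. This is a minimal sketch, assuming the names are passed as blank-separated lists. The macro name RegLoop and its parameters are hypothetical, and SASHELP.CLASS is used only so the example runs anywhere.

```sas
/* Hypothetical macro: RegLoop and its parameters are illustrative names. */
/* Fits MODEL y = x for every (y, x) pair drawn from two blank-separated  */
/* lists of variable names.                                                */
%macro RegLoop(data=, ylist=, xlist=);
  %local i j ny nx yvar xvar;
  %let ny = %sysfunc(countw(&ylist, %str( )));  /* number of dependents */
  %let nx = %sysfunc(countw(&xlist, %str( )));  /* number of regressors */
  %do i = 1 %to &ny;
    %let yvar = %scan(&ylist, &i, %str( ));
    %do j = 1 %to &nx;
      %let xvar = %scan(&xlist, &j, %str( ));
      proc reg data=&data;
        model &yvar = &xvar;
      run;
      quit;
    %end;
  %end;
%mend RegLoop;

/* Example call on a data set that ships with SAS */
%RegLoop(data=sashelp.class, ylist=Weight Height, xlist=Age)
```

Passing the lists as parameters means changing the dependent or independent variables only requires a different macro call, not an edit to the macro body.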
https://blogs.sas.com/content/iml/2017/02/13/run-1000-regressions.html
An easy way to run thousands of regressions in SAS without macros. It also makes the output easier to parse, because all the results, especially the output tables, end up together.
@Reeza: I don't often get the opportunity to disagree with @Rick_SAS, so I had to take it this time. Not about getting output that's easier to parse, but about the processing times of the macro and BY-variable approaches.
Yes, BY-variable processing can sometimes vastly reduce run times, but not for a simple regression like the one @akf06 asked about.
I ran the following code to compare the two approaches. The run times were very similar for both 100 and 1000 cases (just under a minute for N=100). And since the BY-variable approach required more code than the macro approach, I still vote for the macro approach (in this case).
/* Create wide data with variables Y1, Y2, Y3, ..., X1, X2, X3, ... */
%let xCont = 9;    /* <== Specify the number of continuous ind variables */
%let yCont = 9;    /* <== Specify the number of continuous dep variables */
%let N = 100;         /* Specify sample size */
data Wide(keep= y: x:);
  call streaminit(54321);              /* set the random number seed */
  array x[&xCont];         /* explanatory vars are named x1-x&xCont  */
  array y[&yCont];         /* dependent vars are named y1-y&yCont  */
  do i = 1 to &N;              /* for each observation in the sample  */
    do j = 1 to dim(x);
      x[j] = rand("Normal"); /* 2. Simulate explanatory variables   */
    end;
    do j = 1 to dim(y);
      y[j] = rand("Normal"); /* 2. Simulate dependent variables   */
    end;
    output;
  end;
run;
/****** macro approach ******/
%macro RunReg;
 
  %do i = 1 %to &xCont;    /* repeat for each x&i */
    %do j = 1 %to &yCont;  /* repeat for each y&j */
      proc reg data=wide;
        model Y&j. = x&i;               /* model Y_j = x_i */
      quit;
    %end;
  %end;
%mend;
%RunReg
/* byvar approach */
/* 1. make file long */
data Long (keep=ind dep byvar);
  set Wide;
  array x [*] x1-x&xCont;
  array y [*] y1-y&yCont;
  do xNum = 1 to dim(x);
    ind = x[xNum];
    do yNum = 1 to dim(y);
      dep = y[yNum];
      byvar=catx('_', xNum, yNum);  /* separator keeps pairs such as (1,11) and (11,1) distinct */
      output;
    end;
  end;
run;
/* 2. Sort by BY-group variable */
proc sort data=long;
  by byvar;
run;
/* 3. Call PROC REG and use BY statement to compute all regressions */
proc reg data=Long;
  by byvar;
  model dep = ind;
quit;
Art, CEO, AnalystFinder.com
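Two things may matter when comparing the approaches: with hundreds of models, writing every table (and, with ODS Graphics on, every diagnostic plot) can take a noticeable share of the run time, and printed output is the hard part to parse. PROC REG's NOPRINT and OUTEST= options suppress the display and collect the estimates from all BY groups in one data set. A minimal sketch on SASHELP.CLASS so it runs as-is; the data set names CLASS and EST are arbitrary, and its effect on the timings reported above has not been measured here.

```sas
/* BY-group processing requires the data sorted by the BY variable */
proc sort data=sashelp.class out=class;
  by Sex;
run;

/* NOPRINT: no displayed tables; OUTEST=: one row of estimates per BY group */
proc reg data=class noprint outest=est;
  by Sex;
  model Weight = Height;
quit;

/* The slope is stored in a column named after the regressor (Height) */
proc print data=est noobs;
  var Sex Intercept Height _RMSE_;
run;
```

Applied to the Long data set above, the same idea would be `proc reg data=Long noprint outest=est; by byvar; model dep = ind; quit;`, with the slope for each pair in the column IND.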
I haven't done any testing, but it seems to me that there is an even faster approach: for each X, you invert the matrix only once, by putting all the Y variables on the left-hand side of the equal sign in the MODEL statement. With a data set of 20 observations this may save only a trivial amount of time, but with 2 million observations I'm sure it will save time.
%macro doit;
    %do x=1 %to 9;
      proc reg data=mydata;
        model %do y=1 %to 9; y&y. %end;=x&x.;
      run;
    %end;
%mend;
%doit
Naturally, I add my usual disclaimer: I am not recommending this approach of running multiple regressions. There are better approaches to modeling that don't involve running many, many, many PROC REGs when you have many X variables and many Y variables, specifically (but not limited to) Partial Least Squares regression. Just because you CAN do it this way doesn't mean you SHOULD do it this way.
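PROC REG also accepts a variable list on the left side of the MODEL statement, so the inner %do loop in the macro above can be replaced by a numbered range such as y1-y9. A minimal sketch, assuming the variables are numbered y1-yN and x1-xM; the macro name RegByX, its parameters, and the small simulated DEMO data set are hypothetical stand-ins for MYDATA.

```sas
/* Hypothetical stand-in for MYDATA: y1-y3 and x1-x2, 50 rows */
data demo(drop=i j);
  call streaminit(12345);
  array y[3];                      /* creates y1-y3 */
  array x[2];                      /* creates x1-x2 */
  do i = 1 to 50;
    do j = 1 to dim(x);
      x[j] = rand("Normal");
    end;
    do j = 1 to dim(y);
      y[j] = rand("Normal");
    end;
    output;
  end;
run;

/* One PROC REG call per x; all of y1-y&ny are fitted against it in that call */
%macro RegByX(data=, ny=, nx=);
  %local x;
  %do x = 1 %to &nx;
    proc reg data=&data;
      model y1-y&ny = x&x;
    run;
    quit;
  %end;
%mend RegByX;

%RegByX(data=demo, ny=3, nx=2)
```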
