I am working with a dataset that contains 24 dependent variables and 27 independent variables. The constraint is that I have to trial each dependent variable with 4 independent variables. I've been trying to figure out how to do this iteratively with Proc Transreg, but I'm not having much luck so far.
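The dataset looks like this (the 24 dependent variables are named Ind1-Ind24 and the 27 independent variables Dep1-Dep27):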
/* Columns: Year Ind1 Ind2 Ind3 Ind4 ... Dep1 Dep2 Dep3 ... */
data have;
input Year $ Ind1-Ind24 Dep1-Dep27;
datalines;
2014Q1 105 210 305 405 ... 10 20 40 30
2014Q2 10 20 30 40 ... 40 5 2 1
2014Q3 15 210 305 405 ... 10 20 40 30
2014Q4 10 20 05 405 ... 10 20 40 30
2015Q1 105 10 35 45 ... 30 20 40 50
2015Q2 105 21 3 405 ... 10 20 40 30
2015Q3 15 21 5 40 ... 50 20 70 30
2015Q4 5 210 35 5 ... 10 90 40 30
;
run;
The regressions I'm looking to trial would look like this:
Ind1 = Dep1 + Dep2 + Dep3 + Dep4
Ind1 = Dep1 + Dep3 + Dep4 + Dep5
...
Ind1 = Dep23 + Dep24 + Dep25 + Dep26
...
Ind5 = Dep4 + Dep5 + Dep10 + Dep17
...
Essentially, all possible combinations (the order of the predictors does not matter).
TIA.
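For reference, PROC REG can run an exhaustive search over 4-predictor subsets without writing each model out by hand: SELECTION=RSQUARE with START=4 and STOP=4 evaluates every 4-predictor subset for a response, and BEST= limits how many of those subsets are reported. This is a minimal sketch, assuming the data set is HAVE with responses Ind1-Ind24 and predictors Dep1-Dep27 as in the post. The macro name %best4subsets, the labels r1, r2, ..., and the output data set BEST4 are hypothetical.

```sas
/* Hypothetical macro: for each response Ind1-Ind&nresp, search every    */
/* 4-predictor subset of Dep1-Dep&npred and keep the &keep best by R-sq. */
%macro best4subsets(nresp=24, npred=27, keep=10);
   %local i;
   proc reg data=have outest=best4 noprint;
      %do i = 1 %to &nresp;
         /* The label (r1, r2, ...) identifies each response's models */
         /* in the _MODEL_ column of the OUTEST= data set             */
         r&i: model Ind&i = Dep1-Dep&npred
                    / selection=rsquare start=4 stop=4 best=&keep;
      %end;
   run;
   quit;
%mend best4subsets;

%best4subsets()

/* One row per reported subset model; best R-square first per response */
proc sort data=best4;
   by _DEPVAR_ descending _RSQ_;
run;
```

Letting one procedure step do the subset search keeps all results in a single data set instead of hundreds of thousands of separate runs.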
I've managed to create a solution using the following macro:
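%macro Regression;
   %local index Ind;
   %let index = 1;
   /* Loop over each dependent variable in the space-delimited list &Var2 */
   %do %while (%length(%scan(&Var2, &index, %str( ))) > 0);
      %let Ind = %scan(&Var2, &index, %str( ));
      /* Stepwise selection of predictors from &Var for this response */
      proc reg data = quarterly;
         model &Ind = &Var / selection = stepwise;
      quit;
      %let index = %eval(&index + 1);
   %end;
%mend Regression;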
where Var1 and Var2 are the lists of 24 and 27 variables respectively
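The macro expects the lists &Var and &Var2 to exist before it is called. One way to build them without typing 51 names is to query DICTIONARY.COLUMNS. This is a sketch that assumes the data set is WORK.QUARTERLY (the name used in the macro) and that the predictors and responses are named Dep... and Ind... as in the sample data.

```sas
/* Build the two space-delimited lists from the variable names in */
/* WORK.QUARTERLY. Names are assumed to start with DEP and IND.   */
proc sql noprint;
   select name into :Var separated by ' '
      from dictionary.columns
      where libname = 'WORK' and memname = 'QUARTERLY'
        and upcase(name) like 'DEP%'
      order by varnum;
   select name into :Var2 separated by ' '
      from dictionary.columns
      where libname = 'WORK' and memname = 'QUARTERLY'
        and upcase(name) like 'IND%'
      order by varnum;
quit;

/* Check the lists in the log, then run the macro */
%put NOTE: Var  = &Var;
%put NOTE: Var2 = &Var2;

%Regression
```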
Can you show the transreg code for ONE of your models that works as desired?
There are likely several approaches, but I think it would help to show a single case before attempting to generate 24 * (27 choose 4) cases (that looks like 421,200 model runs), which is likely to take a moderate amount of time and require some decisions about what output to keep and how to identify it.
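For reference, the 421,200 figure is the number of 4-predictor subsets of 27 candidates, multiplied by the 24 responses:

```latex
\binom{27}{4} = \frac{27 \cdot 26 \cdot 25 \cdot 24}{4!} = \frac{421\,200}{24} = 17\,550,
\qquad 24 \times \binom{27}{4} = 24 \times 17\,550 = 421\,200.
```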
How do you want to keep the output? Which specific output do you want to keep for each model? Show this in your transreg example.
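As an illustration of "one model plus the output you keep", here is a sketch for the first requested model. PROC REG stands in for PROC TRANSREG here (a TRANSREG model with identity transformations is an ordinary linear regression), and ODS OUTPUT saves the tables you would want to keep as data sets. The data set names PE1, FIT1, ANOVA1 and the label IND1_A are hypothetical.

```sas
/* One of the requested models: Ind1 = Dep1 + Dep2 + Dep3 + Dep4 */
ods output ParameterEstimates = pe1     /* coefficients, SEs, p-values */
           FitStatistics      = fit1    /* R-square, adj R-sq, RMSE    */
           ANOVA              = anova1; /* overall F-test              */
proc reg data=have;
   ind1_a: model Ind1 = Dep1 Dep2 Dep3 Dep4;
run;
quit;

proc print data=pe1;
run;
```

The Model column in PE1 carries the label, which is what lets you key saved output back to a model once many models are run.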
Please explain how that code gets the groups of 4 independent variables described in your original post, i.e., every possible combination for each dependent variable (Ind1 = Dep1 + Dep2 + Dep3 + Dep4, Ind1 = Dep1 + Dep3 + Dep4 + Dep5, ..., Ind1 = Dep23 + Dep24 + Dep25 + Dep26, ..., Ind5 = Dep4 + Dep5 + Dep10 + Dep17, ...).
Your posted code uses the macro variables &var2 and &var; there is no &var1.
If you are using Proc Reg, you can actually have multiple MODEL statements in one procedure step, though it is a good idea to give each model a label so the output can be keyed to the correct model.
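A sketch of several labeled MODEL statements in one PROC REG step, using three of the models listed in the original post. With OUTEST= and the RSQUARE option, each model writes one row to the output data set, and its label appears in the _MODEL_ column. The labels and the EST data set name are hypothetical.

```sas
/* Each label (ind1_a, ind1_b, ind5_a) keys the output to its model */
proc reg data=have outest=est rsquare noprint;
   ind1_a: model Ind1 = Dep1 Dep2 Dep3 Dep4;
   ind1_b: model Ind1 = Dep1 Dep3 Dep4 Dep5;
   ind5_a: model Ind5 = Dep4 Dep5 Dep10 Dep17;
run;
quit;

/* One row per model: label, response, R-square, RMSE, coefficients */
/* (coefficients for predictors not in a given model are missing)   */
proc print data=est;
   var _MODEL_ _DEPVAR_ _RSQ_ _RMSE_ Intercept Dep1-Dep5 Dep10 Dep17;
run;
```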
This seems like it would be terribly time-consuming to run, since it has to compute 24 * comb(27,4) = 421,200 regressions, and furthermore it seems like a very poor idea to begin with. Even if you wrote the code (a rather time-consuming task itself) and completed this series of regressions, what would you do with the results?
Better would be some method that evaluates models using all 27 independent variables and their predictive ability on the 24 dependent variables, and that is designed to handle collinearity among the independent variables and among the dependent variables (which ordinary least squares regression does not do). What method is that? Drumroll please! That method is Partial Least Squares regression, or PROC PLS in SAS. In the ideal case, you fit ONE model (that's right, one) and then interpret and use the results. Even if you have to iterate, removing outliers or variables and running the model again, I'm sure the number of fits will be far smaller than 421,200; in fact, I would be willing to guess that fewer than 10 iterations would get you to the final result. In addition, PLS will find 5-, 6-, or 7-variable models that predict better (if they exist) than any of your 4-variable models. Seems like a no-brainer to me.
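A minimal PROC PLS sketch of that suggestion, assuming the same HAVE data set: all 24 responses and all 27 predictors go into one MODEL statement, and CV=ONE (leave-one-out cross validation) with CVTEST lets the procedure choose the number of extracted factors. This is an illustration of the idea, not a tuned analysis.

```sas
/* One PLS model for all responses at once; responses and predictors */
/* are centered and scaled by default                                */
proc pls data=have method=pls cv=one cvtest;
   model Ind1-Ind24 = Dep1-Dep27 / solution;   /* SOLUTION prints coefficients */
run;
```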
Explanation for the code above:
Var is a list of all the independent variables
Var2 is a list of all the dependent variables
For each dependent variable, the macro runs a stepwise regression: it first selects the independent variable with the most explanatory power; then, if the next most explanatory independent variable is statistically significant (using an F-test), it adds that variable and continues, otherwise it stops.
This means I'm not fitting all 27 independent variables at once; the procedure selects the best ones, and I'm looking to add a cap of at most 4 independent variables.
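If the stepwise approach is kept, one way to add the cap is PROC GLMSELECT (a different procedure from the PROC REG call in the macro), whose SELECTION= option accepts a MAXEFFECTS= suboption that limits the number of effects in any model considered. This is a sketch assuming the HAVE data set; the macro name %step4 is hypothetical, and SELECT=SL with SLENTRY=0.15 and SLSTAY=0.15 is chosen to mimic the significance-level entry and removal of PROC REG's stepwise defaults.

```sas
/* Hypothetical macro: stepwise selection capped at &maxvars predictors */
%macro step4(nresp=24, npred=27, maxvars=4);
   %local i j preds;
   /* Explicit predictor list: Dep1 Dep2 ... Dep&npred */
   %do j = 1 %to &npred;
      %let preds = &preds Dep&j;
   %end;

   %do i = 1 %to &nresp;
      proc glmselect data=have;
         model Ind&i = &preds
               / selection=stepwise(select=sl slentry=0.15 slstay=0.15
                                    maxeffects=&maxvars);
      run;
   %end;
%mend step4;

%step4()
```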
As I said, I would not advise this. I think better solutions exist.