89974114
Quartz | Level 8

I am working with a dataset that contains 24 dependent variables and 27 independent variables. The constraint is that I need to try each dependent variable against 4 independent variables at a time. I've been trying to figure out how to do this iteratively with PROC TRANSREG, but I'm not having much luck so far.

The dataset looks like this:

data have;
input Year $ Ind1-Ind24 Dep1-Dep27;
datalines;

Year   Ind1 ind2 ind3 ind4 ... dep1 dep2 dep3 ...

2014Q1 105 210 305 405 ... 10 20 40 30
2014Q2 10 20 30 40 ... 40 5 2 1
2014Q3 15 210 305 405 ... 10 20 40 30
2014Q4 10 20 05 405 ... 10 20 40 30
2015Q1 105 10 35 45 ... 30 20 40 50
2015Q2 105 21 3 405 ... 10 20 40 30
2015Q3 15 21 5 40 ... 50 20 70 30
2015Q4 5 210 35 5 ... 10 90 40 30
;
run;

the regressions I'm looking to trial would look like this:

Ind1 = Dep1 + Dep2 + Dep3 + Dep4
Ind1 = Dep1 + Dep3 + Dep4 + Dep5
...
Ind1 = Dep23 + Dep24 + Dep25 + Dep26
...
Ind5 = Dep4 + Dep5 + Dep10 + Dep17
...

essentially all the possible combinations

TIA.

1 ACCEPTED SOLUTION

89974114
Quartz | Level 8

I've managed to create a solution using the following macro:

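%macro Regression;
   /* Loop over each response variable named in &Var2 */
   %local index Ind;
   %let index = 1;
   %do %while (%length(%scan(&Var2, &index, %str( ))) > 0);
      %let Ind = %scan(&Var2, &index, %str( ));
      /* Stepwise selection among the candidate regressors in &var */
      proc reg data = quarterly;
         model &Ind = &var / selection = stepwise;
      run;
      quit;
      %let index = %eval(&index + 1);
   %end;
%mend Regression;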

where Var1 and Var2 are the lists of 24 and 27 variables respectively

 


10 REPLIES
ballardw
Super User

Can you show the transreg code for ONE of your models that works as desired?

There are likely several approaches, but I think it would help to show a single case before attempting to generate 24 * (27 choose 4) cases (looks like 421,200 model runs). That is likely to take a moderate amount of time and will require some decisions about which output to keep and how to identify it.
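The count quoted above can be checked with a short DATA step. This is only a sketch; the variable name n_models is illustrative and not from the thread. COMB(27, 4) is 17,550, so the product is 421,200.

```sas
data _null_;
   /* 24 responses, each paired with every 4-subset of the 27 candidate regressors */
   n_models = 24 * comb(27, 4);
   put n_models= comma12.;
run;
```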

How do you want to keep the output? Which specific output to keep for each model? Show this in your transreg example.

ballardw
Super User

Please explain how that code is getting the groups of 4 independent variables from your original post:

the regressions I'm looking to trial would look like this:

Ind1 = Dep1 + Dep2 + Dep3 + Dep4
Ind1 = Dep1 + Dep3 + Dep4 + Dep5
...
Ind1 = Dep23 + Dep24 + Dep25 + Dep26
...
Ind5 = Dep4 + Dep5 + Dep10 + Dep17
...

essentially all the possible combinations

 

 

Your posted code uses the macro variables &var2 and &var, but no &var1.

If you are using PROC REG, you can include several MODEL statements in one step, though it is a good idea to give each model a label so that the output can be keyed to the correct model.
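A minimal sketch of labeled MODEL statements, using two of the four-regressor models from the original post. The labels m_ind1_a and m_ind1_b and the output data set work.est are hypothetical names, and the data set quarterly is assumed to hold the Ind and Dep variables as described.

```sas
/* Sketch only: labels and work.est are illustrative, not from the thread */
proc reg data=quarterly outest=work.est rsquare noprint;
   m_ind1_a: model Ind1 = Dep1 Dep2 Dep3 Dep4;
   m_ind1_b: model Ind1 = Dep1 Dep3 Dep4 Dep5;
run;
quit;

/* _MODEL_ carries the label and _RSQ_ the R-square of each fit */
proc print data=work.est;
   var _MODEL_ _DEPVAR_ _RSQ_;
run;
```

Writing the estimates to one data set keyed by label is one way to decide afterwards which models to keep.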

PaigeMiller
Diamond | Level 26

This seems like it would be terribly time consuming to run as it will have to compute 24 * comb(27,4) = 421,200 regressions, and furthermore it seems like it is a very poor idea to begin with. And even if you wrote the code (a rather time consuming task itself) and completed this series of regressions, what do you do with the results?

 

Better would be some method that evaluates models using all 27 independent variables and their predictive ability on the 24 dependent variables, and is designed to handle any possible collinearity between the independent variables (which ordinary least squares regression does not do), and is designed to handle any collinearity between the dependent variables (which ordinary least squares regression does not do). What method is that? Drumroll please! That method is Partial Least Squares regression, or PROC PLS in SAS. In the ideal case, you fit ONE model (that's right, one) and then interpret and use the results. Even if you have to iterate and remove outliers or remove variables and run the model again, I'm sure the number of regressions will be less than 421,200, in fact I would be willing to guess fewer than 10 iterations ( << 421,200) would get you to the final result. In addition, PLS will find 5 or 6 or 7 variable models that predict better (if they exist) than any of your 4 variable models. Seems like a no-brainer to me.
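As a rough sketch of what such a single PLS fit might look like, assuming &Var2 lists the responses and &var lists the candidate regressors as in the macro posted in this thread, and that the data set is quarterly. The choice of leave-one-out cross validation (CV=ONE) is an assumption for illustration, not something stated in the thread.

```sas
/* Assumes &Var2 = responses and &var = candidate regressors, as in the posted macro */
proc pls data=quarterly method=pls cv=one;
   model &Var2 = &var;
run;
```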

 

 

--
Paige Miller
89974114
Quartz | Level 8

Explanation for the code above:

Var is a list of all the independent variables.

Var2 is a list of all the dependent variables.

For each dependent variable, the macro runs a stepwise regression. It first selects the independent variable with the most explanatory power; then, if the next most explanatory independent variable is statistically significant by an F-test, it adds that variable and continues; otherwise it stops.

This means that I'm not fitting all 27 independent variables; the procedure selects the best ones, and I'm looking to cap each model at 4 independent variables.
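One possible way to enforce such a cap in PROC REG is sketched below. Note that it swaps in a different technique: an all-subsets search ranked by R-square (SELECTION=RSQUARE) rather than the stepwise method used in the macro. Ind1 stands in for any single response, and BEST=5 is an arbitrary illustrative choice.

```sas
/* Swapped-in technique: all-subsets search by R-square, not stepwise.
   Reports the 5 best models of each size from 1 to 4 regressors. */
proc reg data=quarterly;
   model Ind1 = &var / selection=rsquare start=1 stop=4 best=5;
run;
quit;
```

Setting START=4 as well would restrict the report to models with exactly four regressors.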

PaigeMiller
Diamond | Level 26

As I said, I would not advise this. I think better solutions exist.

--
Paige Miller
89974114
Quartz | Level 8
The code runs in <5 seconds
89974114
Quartz | Level 8
The resulting models with three variables are then discussed to decide which to carry forward for analysis, based on R^2 and on whether the variables make economic sense (human input).
89974114
Quartz | Level 8
Only 27 regressions are run in my code so far, each with 3 independent variables.

But I will try out PROC PLS.
89974114
Quartz | Level 8
We have capped the number of independent variables at 3, possibly 4.
