Solved: Re: SAS: Proc TransReg

Posted 04-10-2018 09:01 AM

I am working with a dataset that contains 24 dependent variables and 27 independent variables. The constraint is that I am to trial each dependent variable against 4 independent variables at a time. I've been trying to figure out how to do this iteratively with PROC TRANSREG, but I'm not having much luck so far.

The dataset looks like this (column layout first; "..." marks omitted columns):

Year   Ind1 Ind2 Ind3 Ind4 ... Dep1 Dep2 Dep3 ...

data have;
input Year $ Ind1-Ind24 Dep1-Dep27;
datalines;
2014Q1 105 210 305 405 ... 10 20 40 30
2014Q2 10 20 30 40 ... 40 5 2 1
2014Q3 15 210 305 405 ... 10 20 40 30
2014Q4 10 20 05 405 ... 10 20 40 30
2015Q1 105 10 35 45 ... 30 20 40 50
2015Q2 105 21 3 405 ... 10 20 40 30
2015Q3 15 21 5 40 ... 50 20 70 30
2015Q4 5 210 35 5 ... 10 90 40 30
;
run;

the regressions I'm looking to trial would look like this:

Ind1 = Dep1 + Dep2 + Dep3 + Dep4
Ind1 = Dep1 + Dep3 + Dep4 + Dep5
...
Ind1 = Dep23 + Dep24 + Dep25 + Dep26
...
Ind5 = Dep4 + Dep5 + Dep10 + Dep17
...

essentially all the permutations possible

TIA.

ballardw · Posted 04-10-2018 12:23 PM

Can you show the transreg code for ONE of your models that works as desired?

There are likely several approaches, but I think it would help to show a single case before attempting to generate 24 * (27 choose 4) cases (looks like 421,200 model runs).

That is likely to take a moderate amount of time, and it requires some decisions about which output to keep and how to identify it.
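
The model count quoted above can be checked with a short DATA step. This is a minimal sketch that assumes every dependent variable is paired with every unordered set of 4 of the 27 independent variables; n_models is a name chosen here for illustration.

```sas
data _null_;
  /* 24 responses, each paired with every unordered 4-subset of 27 predictors */
  n_models = 24 * comb(27, 4);
  /* Write the count to the log; it should agree with the 421,200 figure */
  put n_models= comma12.;
run;
```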

How do you want to keep the output? Which specific output to keep for each model? Show this in your transreg example.

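Posted 04-10-2018 12:25 PM

I've managed to create a solution using the following macro:

%macro Regression;
  /* &Var and &Var2 are space-separated variable lists defined outside the macro. */
  /* One stepwise PROC REG is run for each name in &Var2.                         */
  %local index Ind;
  %let index = 1;
  /* %str( ) makes a blank the only delimiter; stop when no word is left. */
  %do %while (%length(%scan(&Var2, &index, %str( ))) > 0);
    %let Ind = %scan(&Var2, &index, %str( ));
    proc reg data = quarterly;
      model &Ind = &Var / selection = stepwise;
    run;
    quit;
    %let index = %eval(&index + 1);
  %end;
%mend Regression;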

where Var1 and Var2 are the lists of 24 and 27 variables respectively

ballardw · Posted 04-10-2018 02:40 PM

Please explain how that code is getting the groups of 4 independent variables from your original post:

the regressions I'm looking to trial would look like this:
Ind1 = Dep1 + Dep2 + Dep3 + Dep4
Ind1 = Dep1 + Dep3 + Dep4 + Dep5
...
Ind1 = Dep23 + Dep24 + Dep25 + Dep26
...
Ind5 = Dep4 + Dep5 + Dep10 + Dep17
...
essentially all the permutations possible

Your posted code uses the macro variables &var2 and &var, but no &var1.

If you are using PROC REG, you can have several MODEL statements in one step, though it is a good idea to give each model a label so the output can be keyed to the correct model.
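
A minimal sketch of that suggestion, using three of the models listed in the original post. The dataset name have and the variable names come from the first post; the labels (m_ind1_a and so on) and the output dataset name est are made up for illustration.

```sas
/* One PROC REG step with several labeled MODEL statements.          */
/* OUTEST= writes one row of estimates per model; the label is kept  */
/* in the _MODEL_ column, and RSQUARE adds _RSQ_ to each row.        */
proc reg data=have outest=est rsquare noprint;
  m_ind1_a: model Ind1 = Dep1 Dep2 Dep3 Dep4;
  m_ind1_b: model Ind1 = Dep1 Dep3 Dep4 Dep5;
  m_ind5_a: model Ind5 = Dep4 Dep5 Dep10 Dep17;
run;
quit;

/* List each model's label, response and R-square */
proc print data=est;
  var _MODEL_ _DEPVAR_ _RSQ_;
run;
```

Keeping the label in the output dataset is what lets a large batch of models be sorted or filtered afterwards without reading printed output.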

PaigeMiller · Posted 04-10-2018 02:51 PM

This seems like it would be terribly time consuming to run as it will have to compute 24 * comb(27,4) = 421,200 regressions, and furthermore it seems like it is a very poor idea to begin with. And even if you wrote the code (a rather time consuming task itself) and completed this series of regressions, what do you do with the results?

Better would be some method that evaluates models using all 27 independent variables and their predictive ability on the 24 dependent variables, and is designed to handle any possible collinearity between the independent variables (which ordinary least squares regression does not do), and is designed to handle any collinearity between the dependent variables (which ordinary least squares regression does not do). What method is that? Drumroll please! That method is Partial Least Squares regression, or PROC PLS in SAS. In the ideal case, you fit ONE model (that's right, one) and then interpret and use the results. Even if you have to iterate and remove outliers or remove variables and run the model again, I'm sure the number of regressions will be less than 421,200, in fact I would be willing to guess fewer than 10 iterations ( << 421,200) would get you to the final result. In addition, PLS will find 5 or 6 or 7 variable models that predict better (if they exist) than any of your 4 variable models. Seems like a no-brainer to me.
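
For reference, a PROC PLS call along those lines might look like the sketch below. It assumes the dataset have from the first post, with Ind1-Ind24 as the responses and Dep1-Dep27 as the predictors (following the model list in that post). The cross-validation settings and the seed are illustrative choices, not something recommended in this thread.

```sas
/* One PLS model for all responses at once.                          */
/* CV=ONE requests leave-one-out cross validation and CVTEST runs a  */
/* test to help choose the number of factors; both are optional.     */
proc pls data=have method=pls cv=one cvtest(seed=12345);
  /* SOLUTION prints the final coefficients for each response */
  model Ind1-Ind24 = Dep1-Dep27 / solution;
run;
```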

--
Paige Miller

Posted 04-11-2018 02:21 AM

Explanation for the code above:

Var is a list of all the independent variables.

Var2 is a list of all the dependent variables.

For each dependent variable, the macro runs a stepwise regression. It first selects the independent variable with the most explanatory power; then, if the next most explanatory independent variable is statistically significant by an F-test, it adds that variable and continues. Otherwise it stops.

This means that I'm not fitting all 27 independent variables at once; the procedure selects the best ones, and I'm looking to add a cap of at most 4 independent variables.
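
One way to impose such a cap, sketched here for a single response rather than inside the macro, is to ask PROC REG for subsets of exactly 4 regressors. Note that this swaps SELECTION=STEPWISE for SELECTION=RSQUARE, whose START=, STOP= and BEST= options limit the subset size and the number of models reported. It is a different selection method from the one in the macro, so the chosen variables can differ. The response Ind1 and the predictor list Dep1-Dep27 are assumed from the first post, and best=5 is an arbitrary choice.

```sas
proc reg data=quarterly;
  /* Report only the 5 highest R-square models that use exactly 4 regressors */
  model Ind1 = Dep1-Dep27 / selection=rsquare start=4 stop=4 best=5;
run;
quit;
```

Setting start=3 instead would also report the best 3-variable subsets, which matches a cap of "3, possibly 4".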

PaigeMiller · Posted 04-11-2018 08:18 AM

@89974114 wrote:

Explanation for the code above:

Var is a list of all the independent variables

Var2 is a list of all the dependent variables

For each dependent variable, the macro runs a stepwise regression. It first selects the independent variable with the most explanatory power; then, if the next most explanatory independent variable is statistically significant by an F-test, it adds that variable and continues. Otherwise it stops.

This means that I'm not fitting all 27 independent variables at once; the procedure selects the best ones, and I'm looking to add a cap of at most 4 independent variables.

As I said, I would not advise this. I think better solutions exist.

--
Paige Miller

Posted 04-11-2018 08:21 AM

The code runs in under 5 seconds.

Posted 04-11-2018 08:22 AM

The resulting models, each with three variables, are then discussed to decide which to carry forward for analysis, based on R^2 and on whether the variables make economic sense (human input).

Posted 04-11-2018 08:23 AM

Only 27 regressions are run in my code so far, each with 3 independent variables.

But I will try out PROC PLS.

Posted 04-11-2018 08:24 AM

We have put a cap of 3, possibly 4, on the number of independent variables.