Creating multiple variables by multiplying 3 at a time for all possible combinations of 70 variables

New Contributor
Posts: 4

Hi,

I have a dataset of 70 binary variables. Below is an illustration.

V1  V2  V3  V4  V5  ...  V70
1   1   1   1   1   ...  0
1   0   1   0   1   ...  0
0   1   1   0   1   ...  0
1   1   0   0   1   ...  0
0   0   0   1   1   ...  1
0   1   0   0   0   ...  1

I have to create new variables by multiplying three variables at a time, for example W1 = V1*V2*V3, W2 = V2*V3*V4, and so on.

I have to repeat this exercise for all possible combinations of these 70 variables.

Is there code to do this quickly, instead of writing out all the combinations and assigning variables to them manually?

Thanks,

Kabir

Respected Advisor
Posts: 2,655

Re: Creating multiple variables by multiplying 3 at a time for all possible combinations of 70 variables

I'm sure others will come up with algorithms that will accomplish this, but I'm sort of curious as to why you would want to expand 70 variables to 54,740 variables.  What is the advantage of doing this?

Steve Denham

Super User
Posts: 5,082

Re: Creating multiple variables by multiplying 3 at a time for all possible combinations of 70 variables

I agree with Steve that it is hard to picture a program that will use so many variables, but creating them is not so difficult.  This code could be part of a DATA step:

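array v {70};          /* existing inputs v1-v70 */
array new {54740};     /* products new1-new54740; 54,740 = C(70,3) */

newvar = 0;            /* running index into NEW, reset for each observation */

do _i_ = 1 to 68;
    do _j_ = _i_ + 1 to 69;
        do _k_ = _j_ + 1 to 70;
            newvar + 1;
            new{newvar} = v{_i_} * v{_j_} * v{_k_};
        end;
    end;
end;

drop _i_ _j_ _k_ newvar;   /* keep the loop helpers out of the output data set */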
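
With numbered products, a name such as new137 says nothing about which three inputs it came from. As a minimal sketch, the same loop order can also write a lookup table. The data set name triple_key and the columns name and label are illustrative assumptions, not part of the code above.

```sas
/* Lookup table: one row per product, built in the same loop order */
/* as the array code, so row number n describes new{n}.            */
data triple_key;
   length name $12 label $16;
   newvar = 0;
   do i = 1 to 68;
      do j = i + 1 to 69;
         do k = j + 1 to 70;
            newvar + 1;
            name  = cats('new', newvar);                                  /* e.g. new1    */
            label = catx('*', cats('v', i), cats('v', j), cats('v', k));  /* e.g. v1*v2*v3 */
            output;
         end;
      end;
   end;
run;
```

Because there is no SET statement, this step runs once and writes 54,740 rows, which can be merged by name onto any later results.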

If you post more about why you want these variables,  you might get suggestions about alternative approaches.

Good luck.

New Contributor
Posts: 4

Re: Creating multiple variables by multiplying 3 at a time for all possible combinations of 70 variables

The reason for creating these variables is to understand the one-to-one relationship (correlation) of each with another variable, "Y", which is also binary. The individual variables v1, v2, v3, ... are risk factors. I am trying to combine them in groups of two and three to understand which combinations of risk factors correlate with my contracts moving onto the HIGH RISK ENGAGEMENT list.

Trusted Advisor
Posts: 1,615

Re: Creating multiple variables by multiplying 3 at a time for all possible combinations of 70 variables

If you want to fit a model, I agree with the others that having all possible three-way combinations of 70 variables is not a very sensible thing to do.

With this many variables, you will, strictly by random chance, find some that are correlated with Y.
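
To put a rough number on that, here is a small sketch. It assumes each product is tested against Y at a 0.05 significance level and that none of the products is truly related to Y; both are illustrative assumptions, not details from this thread.

```sas
/* Expected number of chance "hits" when screening every three-way product */
data _null_;
   n_tests            = comb(70, 3);      /* number of three-way products        */
   alpha              = 0.05;             /* assumed per-test significance level */
   expected_false_pos = n_tests * alpha;  /* expected false positives under null */
   bonferroni_alpha   = alpha / n_tests;  /* per-test level after Bonferroni     */
   put n_tests= expected_false_pos= bonferroni_alpha=;
run;
```

Under those assumptions, roughly 2,700 of the 54,740 products would look "significant" purely by chance.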

Respected Advisor
Posts: 2,655

Re: Creating multiple variables by multiplying 3 at a time for all possible combinations of 70 variables

Think of it this way.  To get stable estimates in a predictive model (say, a logistic model), you need roughly 10 occurrences for each independent variable in the model.  So with 70 variables, you need 700 occurrences, out of however many observations you have, to get a good model, whereas you would need 700 plus 24,150 plus 547,400 occurrences to get a good model with all first-, second-, and third-order terms.  So unless you have nearly 600K occurrences, your model fit will be worthless.
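
The arithmetic behind those figures can be checked with a short DATA step. This is only a sketch of the rule of thumb above; the variable names are arbitrary.

```sas
/* Occurrences needed at roughly 10 per model term */
data _null_;
   per_term = 10;
   main     = comb(70, 1);   /* first-order terms:  70     */
   two_way  = comb(70, 2);   /* second-order terms: 2,415  */
   thr_way  = comb(70, 3);   /* third-order terms:  54,740 */
   needed   = per_term * (main + two_way + thr_way);
   put main= two_way= thr_way= needed=;
run;
```

The total comes to 572,250, which is where the "nearly 600K" figure comes from.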

In any case, if you are modeling, you don't need to create the variables in a data step.  Let the MODEL statement of whatever procedure you are using do it for you.  See the syntax for the MODEL statement in PROC GLM for an example of how to include all interactions up to a certain level using the @ operator.
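
As a hedged illustration of that syntax, the sketch below uses PROC LOGISTIC rather than PROC GLM because Y is binary; its MODEL statement accepts the same bar and @ notation. The data set name have and the choice of only five variables are assumptions made to keep the example small.

```sas
/* v1|v2|v3|v4|v5 expands to all main effects and interactions;   */
/* @3 limits the expansion to terms of order three or lower,      */
/* so no product variables have to be created beforehand.         */
proc logistic data=have;
   model y(event='1') = v1|v2|v3|v4|v5 @3;
run;
```

For all 70 variables, the bar-separated list would have to name every variable, and the sample-size concern above still applies.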

Steve Denham
