11-05-2013 10:41 AM

Hi,

I have a dataset of 70 binary variables. Below is an illustration.

| V1 | V2 | V3 | V4 | V5 ... | V70 |
|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 | 0 |
| 1 | 0 | 1 | 0 | 1 | 0 |
| 0 | 1 | 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 0 | 1 | 0 |
| 0 | 0 | 0 | 1 | 1 | 1 |
| 0 | 1 | 0 | 0 | 0 | 1 |

I have to create new variables by multiplying 3 variables at a time, for example W1 = V1*V2*V3, W2 = V2*V3*V4, and so on.

I have to repeat this exercise for all possible combinations of these 70 variables.
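Because every V is 0/1, the product of three of them is 1 only when all three risk factors are present, so each W is effectively a logical AND. A minimal Python sketch on the sample rows above (only columns V1-V4 are used; the list names are illustrative):

```python
# Columns V1-V4 from the six sample rows in the table above.
v1 = [1, 1, 0, 1, 0, 0]
v2 = [1, 0, 1, 1, 0, 1]
v3 = [1, 1, 1, 0, 0, 0]
v4 = [1, 0, 0, 0, 1, 0]

# W1 = V1*V2*V3 and W2 = V2*V3*V4, computed row by row.
w1 = [a * b * c for a, b, c in zip(v1, v2, v3)]
w2 = [a * b * c for a, b, c in zip(v2, v3, v4)]

# For 0/1 data the product equals the logical AND of the three factors.
assert w1 == [int(a and b and c) for a, b, c in zip(v1, v2, v3)]

print(w1)  # [1, 0, 0, 0, 0, 0]
print(w2)  # [1, 0, 0, 0, 0, 0]
```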

Is there code to do this quickly, instead of writing out every combination and assigning new variables to them manually?

Thanks,

Kabir


11-05-2013 10:58 AM

I'm sure others will come up with algorithms that will accomplish this, but I'm sort of curious as to why you would want to expand 70 variables to 54,740 variables. What is the advantage of doing this?
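The 54,740 figure is the number of ways to choose 3 of the 70 variables. A quick Python check (it assumes Python 3.8+ for `math.comb`):

```python
from math import comb

n = 70  # number of binary risk factors

# Number of distinct three-variable combinations: n! / (3! (n-3)!)
triples = comb(n, 3)
# Same value from the closed form n(n-1)(n-2)/6
assert triples == n * (n - 1) * (n - 2) // 6

print(triples)  # 54740
```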

Steve Denham


11-05-2013 11:12 AM

While I agree with Steve (it is hard to picture a program that would use so many variables), creating them is not so difficult. This code could be part of a DATA step:

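```sas
/* HAVE and WANT are placeholder data set names; HAVE holds V1-V70. */
data want;
   set have;
   array v {70};          /* v1-v70: the 70 binary risk factors         */
   array new {54740};     /* new1-new54740: one per 3-way combination   */
   newvar = 0;            /* index into NEW, reset for each observation */
   do _i_ = 1 to 68;
      do _j_ = _i_ + 1 to 69;
         do _k_ = _j_ + 1 to 70;
            newvar + 1;
            new{newvar} = v{_i_} * v{_j_} * v{_k_};
         end;
      end;
   end;
   drop _i_ _j_ _k_ newvar;   /* keep loop helpers out of the output */
run;
```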
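Because the counter only ever increases in the loop order of the DATA step, NEW1 is V1*V2*V3, NEW2 is V1*V2*V4, and so on. To know which triple a given NEWn holds, one could build a lookup table in the same order. A sketch in Python that mirrors the three nested loops (the function name `triple_index` is hypothetical):

```python
def triple_index(n_vars=70):
    """Map each counter value to its (i, j, k) triple, mirroring the
    nested DO loops: i < j < k, with k varying fastest."""
    lookup = {}
    counter = 0
    for i in range(1, n_vars - 1):              # _i_ = 1 to n-2
        for j in range(i + 1, n_vars):          # _j_ = _i_+1 to n-1
            for k in range(j + 1, n_vars + 1):  # _k_ = _j_+1 to n
                counter += 1
                lookup[counter] = (i, j, k)
    return lookup


lookup = triple_index()
print(len(lookup))    # 54740
print(lookup[1])      # (1, 2, 3)
print(lookup[69])     # (1, 3, 4)
print(lookup[54740])  # (68, 69, 70)
```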

If you post more about why you want these variables, you might get suggestions about alternative approaches.

Good luck.


11-05-2013 11:28 AM

I want these variables to understand the one-on-one relationship (correlation) of each with another variable, Y, which is also binary. The individual variables V1, V2, V3, ... are risk factors. I am trying to combine them in groups of 2 and 3 to find the combinations of risk factors that correlate with my contracts moving onto the HIGH RISK ENGAGEMENT list.
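For two 0/1 variables, the ordinary Pearson correlation reduces to the phi coefficient, computed from the 2×2 table of counts. A minimal Python sketch, assuming the risk term (for example a W built from three factors) and Y are equal-length 0/1 lists; the function name `phi` and the toy data are illustrative only:

```python
from math import sqrt


def phi(x, y):
    """Phi coefficient (Pearson correlation) of two equal-length 0/1 lists."""
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    # Product of the 2x2 table margins; phi is undefined if any margin is zero.
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        raise ValueError("phi is undefined when x or y is constant")
    return (n11 * n00 - n10 * n01) / sqrt(denom)


# Toy data, not from the thread.
print(phi([1, 1, 0, 0], [1, 1, 0, 0]))  # 1.0
print(phi([1, 0, 1, 0], [1, 1, 0, 0]))  # 0.0
```

A sparse product term that is 0 in every row (quite possible when three factors must all be present) raises the error, since its correlation with Y is undefined.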

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

11-05-2013 11:33 AM

If you want to fit a model, I agree with the others that having all possible three-way combinations of 70 variables is not a very sensible thing to do.

With this many variables, you will, strictly by random chance, find some that are correlated with Y.
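To put a rough number on this: if none of the 54,740 three-way terms were truly related to Y and each were tested separately at the 5% level, chance alone would flag about 2,737 of them. A back-of-the-envelope Python sketch; the 5% level is an assumption, and the at-least-one probability additionally assumes independent tests, which these overlapping terms are not, so treat it only as an illustration:

```python
from math import comb

alpha = 0.05            # assumed per-test false-positive rate
n_tests = comb(70, 3)   # 54,740 three-way product terms

# Expected number of "significant" terms when no term is truly related to Y
# (linearity of expectation; no independence needed for this part).
expected_false_hits = n_tests * alpha

# Probability of at least one false hit, treating the tests as independent.
p_at_least_one = 1 - (1 - alpha) ** n_tests

print(round(expected_false_hits))  # 2737
print(p_at_least_one)              # effectively 1
```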


11-06-2013 03:52 PM

Think of it this way. To get stable estimates in a predictive model (say, a logistic model), you need roughly 10 occurrences for each independent variable in the model. So with 70 variables you would need 700 occurrences, out of however many observations you have, to get a good model, whereas with all first-, second-, and third-order terms you would need 700 plus 24,150 (10 × 2,415 two-way terms) plus 547,400 (10 × 54,740 three-way terms) occurrences. So unless you have nearly 600K occurrences, your model fit will be worthless.
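The arithmetic behind these figures, as a Python sketch using the rule of thumb of 10 occurrences per model term stated above:

```python
from math import comb

n = 70                # binary risk factors
events_per_term = 10  # rule of thumb from the post

main_effects = comb(n, 1)  # 70
two_way = comb(n, 2)       # 2,415
three_way = comb(n, 3)     # 54,740

needed = [events_per_term * k for k in (main_effects, two_way, three_way)]
print(needed)       # [700, 24150, 547400]
print(sum(needed))  # 572250
```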

In any case, if you are modeling, you don't need to create the variables in a DATA step. Let the MODEL statement of whatever procedure you are using do it for you. See the syntax for the MODEL statement in PROC GLM for an example of how to include all interactions up to a certain level using the bar (|) and @ operators; for example, `v1|v2|v3 @2` requests the main effects and all two-way interactions.
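In SAS, the bar notation A|B|C expands to A B A*B C A*C B*C A*B*C, and @2 keeps only effects with at most two variables. The hypothetical Python helper below imitates that expansion to show how many effects the MODEL statement would generate for the 70 factors up to @3; it illustrates the notation and is not SAS's implementation:

```python
def expand_bar(variables, max_order=None):
    """Imitate SAS bar notation v1|v2|...|vn @max_order.

    Each new variable is added, then crossed with every effect built so
    far; crossings that would exceed max_order variables are skipped.
    """
    effects = []
    for var in variables:
        new = [(var,)]
        for eff in effects:
            if max_order is None or len(eff) < max_order:
                new.append(eff + (var,))
        effects.extend(new)
    return ["*".join(eff) for eff in effects]


print(expand_bar(["A", "B", "C"]))
# ['A', 'B', 'A*B', 'C', 'A*C', 'B*C', 'A*B*C']
print(expand_bar(["A", "B", "C"], max_order=2))
# ['A', 'B', 'A*B', 'C', 'A*C', 'B*C']

factors = [f"v{i}" for i in range(1, 71)]
print(len(expand_bar(factors, max_order=3)))  # 57225 = 70 + 2415 + 54740
```

Never call it on 70 factors without `max_order`: the full expansion would have 2^70 - 1 effects.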

Steve Denham