- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
If I have data with variable x1, x2, x3, and I need to find all the possible combination of 3 (there are 7 ). Is there any elegant way to do this instead of list all the combination? Because I might end up with 5 or 10 variables.
data test;
input x1 x2 x3;
datalines;
1 2 3
3 4 5
;
run;
data all;
set test;
/***is there any neat way to do this combination, instead of hard coded all the combinations?***/
comb1 = x1;
comb2 = x2;
comb3 = x3;
comb12 = sum(x1, x2);
comb13 = sum(x1, x3);
comb23 = sum(x2, x3);
comb123 = sum(x1, x2, x3);
run;
thank you!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes, you can do this, probably in several ways. The following generates all binary sequences between 1 and 2**3-1 = 7:
001
010
011
,,,
111
The sum that you want is coef[1]*x1 + coef[2]*x2 + coef[3]*x3, where coef[i] is the i_th element of the binary sequence (thought of as a number). You can use the SUBSTR function is used to extract the i_th coefficient.
data all;
set test;
array x[3] x1-x3;
array comb[7] c1-c7;
label c1="comb3" c2="comb2" c3="comb23"
c4="comb1" c5="comb13" c6="comb12" c7="comb123";
n = dim(x);
do i = 1 to 2**n - 1; /* count 1 to 2**n - 1 */
s = put(i, Binary3.); /* generate binary representation */
sum = 0;
do k = 1 to n; /* sum = SUM of s[k]*x[k] */
coef = substr(s,k,1);
sum + coef*x[k];
end;
comb[i] = sum; /* store sum for this combination of variables */
end;
output;
keep x1-x3 c1-c7;
run;
proc print label;
var x1-x3 c4 c2 c1 c6 c5 c3 c7;
run;
Notice that my algorithm does not generate the values in the same order as you did. (You computed sums of variables taken one at a time, taken two at a time, and taken three at a time.) If necessary, I'm sure others on the list can modify my idea to give the results in a different order. To help compare my approach with your, I assigned labels to each variable that agree with your variable names.
The algorithm is explained further in the article "Creating a matrix with all combinations of zeros and ones."
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Have you looked at the ALLCOMB function? @Rick_SAS also has a nice series of articles about generating combinations in SAS.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes, you can do this, probably in several ways. The following generates all binary sequences between 1 and 2**3-1 = 7:
001
010
011
,,,
111
The sum that you want is coef[1]*x1 + coef[2]*x2 + coef[3]*x3, where coef[i] is the i_th element of the binary sequence (thought of as a number). You can use the SUBSTR function is used to extract the i_th coefficient.
data all;
set test;
array x[3] x1-x3;
array comb[7] c1-c7;
label c1="comb3" c2="comb2" c3="comb23"
c4="comb1" c5="comb13" c6="comb12" c7="comb123";
n = dim(x);
do i = 1 to 2**n - 1; /* count 1 to 2**n - 1 */
s = put(i, Binary3.); /* generate binary representation */
sum = 0;
do k = 1 to n; /* sum = SUM of s[k]*x[k] */
coef = substr(s,k,1);
sum + coef*x[k];
end;
comb[i] = sum; /* store sum for this combination of variables */
end;
output;
keep x1-x3 c1-c7;
run;
proc print label;
var x1-x3 c4 c2 c1 c6 c5 c3 c7;
run;
Notice that my algorithm does not generate the values in the same order as you did. (You computed sums of variables taken one at a time, taken two at a time, and taken three at a time.) If necessary, I'm sure others on the list can modify my idea to give the results in a different order. To help compare my approach with your, I assigned labels to each variable that agree with your variable names.
The algorithm is explained further in the article "Creating a matrix with all combinations of zeros and ones."
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you, RicK!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I know this post was some time ago, but I found in useful in working out how to do a similar thing but with more variables. The thing I'd like to be able to do but I can't think how is to know which of the combinations were 1, 2, 3 etc factor combinations - any suggestions of where to start looking for an answer.
Jo
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I assume you figured out how to adjust the dimensions of the arrays and the width of the BINARYw. format., The important change is to add S to the KEEP statement. The variable S is a binary string that indicates which variables are in the combination. Thus '10011' represents x1, x2, and x5.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
But continuing that line of thought I created a new table with the number of combinations and their BINARYw number (only the 1's)
data new;
do i = 1 to 29;
s = compress(put(i,Binary5.),"0");
output;
I needed to transpose the data anyway so the Combinations (columns) become a variable and then I was able to join the two tables together so instead of having Col1 - Col29 I returned the binary tag created in the new table 1, 11, 111 etc which identified them.
It is a bit of mucking around but I couldn't think of another way to do it. I'm not sure if I should have been able to label the column headings with the new table results somehow.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have been able to adjust this code to use when I have larger values of n but this gets out of hand quickly. Can you give a clue on how to alter this code to only look at combinations of a set number of values. Eg I might have 5 columns but I don't want to look at all possible 31 combinations but only the combinations that have 3 sample points in each combination = 10 combinations.
Thanks
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The easiest way is to use the ALLCOMB function, as @ChrisHemedinger suggested. See the references in his post. If you want to stay with the binary approach, restrict your attention to strings that have exactly three 1s. If you need further help, I suggest you open a new thread, explain your problem, and post the code that you are using.