SAS Programming

Ying · Posted 06-17-2016 11:52 AM

Hello,

If I have data with variable x1, x2, x3, and I need to find all the possible combination of 3 (there are 7 ). Is there any elegant way to do this instead of list all the combination? Because I might end up with 5 or 10 variables.

data test;

input x1 x2 x3;

datalines;

1 2 3

3 4 5

;

run;

data all;

set test;

/***is there any neat way to do this combination, instead of hard coded all the combinations?***/

comb1 = x1;

comb2 = x2;

comb3 = x3;

comb12 = sum(x1, x2);

comb13 = sum(x1, x3);

comb23 = sum(x2, x3);

comb123 = sum(x1, x2, x3);

run;

thank you!

Rick_SAS · Posted 06-17-2016 01:53 PM

Yes, you can do this, probably in several ways. The following generates all binary sequences between 1 and 2**3-1 = 7:

001

010

011

,,,

111

The sum that you want is coef[1]*x1 + coef[2]*x2 + coef[3]*x3, where coef[i] is the i_th element of the binary sequence (thought of as a number). You can use the SUBSTR function is used to extract the i_th coefficient.

data all;
set test;
array x[3] x1-x3;
array comb[7] c1-c7;
label c1="comb3" c2="comb2" c3="comb23"
      c4="comb1" c5="comb13" c6="comb12" c7="comb123";
n = dim(x);
do i = 1 to 2**n - 1;         /* count 1 to 2**n - 1 */
   s = put(i, Binary3.);      /* generate binary representation */
   sum = 0;
   do k = 1 to n;             /* sum = SUM of s[k]*x[k] */
      coef = substr(s,k,1);
      sum + coef*x[k];
   end;
   comb[i] = sum;             /* store sum for this combination of variables */
end;   
output;
keep x1-x3 c1-c7;
run;
proc print label; 
var x1-x3 c4 c2 c1 c6 c5 c3 c7;
run;

Notice that my algorithm does not generate the values in the same order as you did. (You computed sums of variables taken one at a time, taken two at a time, and taken three at a time.) If necessary, I'm sure others on the list can modify my idea to give the results in a different order. To help compare my approach with your, I assigned labels to each variable that agree with your variable names.

The algorithm is explained further in the article "Creating a matrix with all combinations of zeros and ones."

View solution in original post

ChrisHemedinger · Posted 06-17-2016 01:09 PM

Have you looked at the ALLCOMB function? @Rick_SAS also has a nice series of articles about generating combinations in SAS.

Register for SAS Innovate 2025!! The premier event for SAS users, May 6-9 in Orlando FL. Sign up now for the best deals!

Rick_SAS · Posted 06-17-2016 01:53 PM

Yes, you can do this, probably in several ways. The following generates all binary sequences between 1 and 2**3-1 = 7:

001

010

011

,,,

111

The sum that you want is coef[1]*x1 + coef[2]*x2 + coef[3]*x3, where coef[i] is the i_th element of the binary sequence (thought of as a number). You can use the SUBSTR function is used to extract the i_th coefficient.

data all;
set test;
array x[3] x1-x3;
array comb[7] c1-c7;
label c1="comb3" c2="comb2" c3="comb23"
      c4="comb1" c5="comb13" c6="comb12" c7="comb123";
n = dim(x);
do i = 1 to 2**n - 1;         /* count 1 to 2**n - 1 */
   s = put(i, Binary3.);      /* generate binary representation */
   sum = 0;
   do k = 1 to n;             /* sum = SUM of s[k]*x[k] */
      coef = substr(s,k,1);
      sum + coef*x[k];
   end;
   comb[i] = sum;             /* store sum for this combination of variables */
end;   
output;
keep x1-x3 c1-c7;
run;
proc print label; 
var x1-x3 c4 c2 c1 c6 c5 c3 c7;
run;

Notice that my algorithm does not generate the values in the same order as you did. (You computed sums of variables taken one at a time, taken two at a time, and taken three at a time.) If necessary, I'm sure others on the list can modify my idea to give the results in a different order. To help compare my approach with your, I assigned labels to each variable that agree with your variable names.

The algorithm is explained further in the article "Creating a matrix with all combinations of zeros and ones."

Ying · Posted 06-17-2016 02:24 PM

Thank you, RicK!

jo1 · Posted 05-30-2017 01:22 AM

I know this post was some time ago, but I found in useful in working out how to do a similar thing but with more variables. The thing I'd like to be able to do but I can't think how is to know which of the combinations were 1, 2, 3 etc factor combinations - any suggestions of where to start looking for an answer.

Jo

Rick_SAS · Posted 05-30-2017 06:21 AM

I assume you figured out how to adjust the dimensions of the arrays and the width of the BINARYw. format., The important change is to add S to the KEEP statement. The variable S is a binary string that indicates which variables are in the combination. Thus '10011' represents x1, x2, and x5.

jo1 · Posted 05-30-2017 06:40 PM

Thanks yes, after a bit I worked out to increase the dimensions of the arrays and adjusted the width of the BINARY. I did try adding s to the KEEP statement but I'm assuming it would have been just the last combination that it was showing.
But continuing that line of thought I created a new table with the number of combinations and their BINARYw number (only the 1's)
data new;
do i = 1 to 29;
s = compress(put(i,Binary5.),"0");
output;
I needed to transpose the data anyway so the Combinations (columns) become a variable and then I was able to join the two tables together so instead of having Col1 - Col29 I returned the binary tag created in the new table 1, 11, 111 etc which identified them.
It is a bit of mucking around but I couldn't think of another way to do it. I'm not sure if I should have been able to label the column headings with the new table results somehow.

jo1 · Posted 06-19-2017 11:51 PM

I have been able to adjust this code to use when I have larger values of n but this gets out of hand quickly. Can you give a clue on how to alter this code to only look at combinations of a set number of values. Eg I might have 5 columns but I don't want to look at all possible 31 combinations but only the combinations that have 3 sample points in each combination = 10 combinations.

Thanks

Rick_SAS · Posted 06-20-2017 06:00 AM

The easiest way is to use the ALLCOMB function, as @ChrisHemedinger suggested. See the references in his post. If you want to stay with the binary approach, restrict your attention to strings that have exactly three 1s. If you need further help, I suggest you open a new thread, explain your problem, and post the code that you are using.

SAS Programming

neat way to sum all Combinations

Re: neat way to sum all Combinations

Re: neat way to sum all Combinations

Re: neat way to sum all Combinations

Re: neat way to sum all Combinations

Re: neat way to sum all Combinations

Re: neat way to sum all Combinations

Re: neat way to sum all Combinations

Re: neat way to sum all Combinations

Re: neat way to sum all Combinations

Follow Us

What is...

SAS Programming

Special offer for SAS Communities members

SAS Training: Just a Click Away

Follow Us

What is...