JohnX
Fluorite | Level 6

Hi everyone,

 

I would like to have all the possible ways to split n observations into two groups. For example, if n=3, writing the integers 0 to 3 in base 2 gives the full set (writing 4 to 7 just gives the complements).

 

I could not find anything that already does this, but I might be wrong. If something like this is available out of the box, that would be great.

 

Because this does not exist, I thought the easiest way would be to generate a matrix with n columns and (1-2^(n+1))/(1-2) rows (in fact, as mentioned previously, only the first half is useful) and populate it with 0s and 1s to get something similar to this:

 

Cn   Cn-1   ...   C2   C1   C0
0    0      ...   0    0    0
0    0      ...   0    0    1
0    0      ...   0    1    0
0    0      ...   0    1    1
0    0      ...   1    0    0

 

I just do not know how to create such a matrix using SAS. That would be very easy if I could use indices and the remainder of the Euclidean division, but SAS does not know such things.

I was also thinking of asking SAS to write (row number - 1) in base 2, with one column per bit, but again, I am not sure it can do this.

 

This is the only thing I need: it would then be very easy to join the matrix to my table and use the 0s and 1s to know which final cluster each observation belongs to.

 

Please, no need to mention PROC IML, I do not have it on the system I use.

 

Thank you very much

2 REPLIES
ballardw
Super User

Let's start with some basics: All combinations of what? A single variable? A list of variables? An arbitrary matrix?

A concrete data step to show where you are starting might help.

 

In a data step there are several functions related to permutations and combinations.

You may want to read the documentation for the functions and call routines Call Allcomb, Allcomb, Allcombi, Lexcomb and Lexcombi.

FreelanceReinh
Jade | Level 19

Hi @JohnX,

 

Let's call the groups 0 and 1. Then I would assign the first observation always to group 0 and the other n-1 observations to the groups given by the binary digits of the integers 0, 1, ..., 2**(n-1)-1. This covers all possible combinations up to switching the two groups. Here's a DATA step which creates variables g2, g3, ..., gn containing the group numbers (i.e., 0 or 1) for the 2nd, 3rd, ..., n-th observation:

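%let n=10;

data want(drop=i j);
array g[2:&n] g2-g&n;          /* g[j] = group (0 or 1) of observation j; observation 1 is always in group 0 */
do i=0 to 2**(&n-1)-1;         /* each i encodes one assignment of observations 2..n */
  do j=2 to &n;
    g[j]=~~band(i,2**(&n-j));  /* double negation turns a nonzero BAND result into 1 */
  end;
  output;
end;
run;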

Edit: I had tested three variants of this DATA step (some using the BINARYw. format, some using $1-character arrays). The above version was the fastest for n=25: it took about 20 seconds to create the 2**24=16,777,216 observations with 24 variables. You may want to do whatever needs to be done in the outer DO loop rather than actually OUTPUT that "matrix" -- whose information value is obviously very limited.
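
For illustration, a BINARYw.-based variant might look roughly like the sketch below. This is only a guess at the general approach, not the code that was actually timed; the names want_bin, w and bits are made up for this example.

```sas
%let n=10;
%let w=%eval(&n-1);   /* number of binary digits needed (hypothetical helper macro variable) */

data want_bin(drop=i j bits);
  array g[2:&n] g2-g&n;
  length bits $&w;
  do i=0 to 2**&w-1;
    /* BINARYw. writes i as a zero-padded string of w binary digits */
    bits=put(i, binary&w..);
    do j=2 to &n;
      /* character j-1 of the string is the group of observation j */
      g[j]=input(substr(bits, j-1, 1), 1.);
    end;
    output;
  end;
run;
```

The leftmost digit goes to g2, so the rows should come out in the same order as in the BAND version above.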

 

Edit 2: If run time is an issue, here's another version which is more than 10 times faster than the above suggestion: <2 seconds for n=25.

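%macro bitcomb(n);
%local i;
data want;
array g[2:&n] 3 g2-g&n;  /* numeric length 3 to keep the data set small */
/* generate n-1 nested DO loops, one per variable g2, ..., gn */
%do i=2 %to &n;
  do g&i=0, 1;
%end;
    output;
/* close the n-1 nested DO loops */
%do i=2 %to &n;
  end;
%end;
run;
%mend bitcomb;

%bitcomb(10)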
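
To make it easier to see what the macro does, here is a hand-expanded sketch of the DATA step that a call like %bitcomb(3) would generate (n=3 is chosen only to keep the example short):

```sas
data want;
  array g[2:3] 3 g2-g3;
  do g2=0, 1;        /* outer loop: group of observation 2 */
    do g3=0, 1;      /* inner loop: group of observation 3 */
      output;        /* one row per combination of (g2, g3) */
    end;
  end;
run;
```

This writes 2**(3-1)=4 observations, with (g2, g3) equal to (0,0), (0,1), (1,0) and (1,1). The speed-up presumably comes from the loops assigning the values directly instead of computing each bit with a function call.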
