BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Joe_O
Obsidian | Level 7

I have a dataset containing two numeric variables. I want to find every possible combination of pairs of these variables.

 

Note that I am not asking for all possible combinations of the two variables (eg. I could find this by performing a cross join). I want all possible combinations of each pair. For example, if the two variables just had 3 values each (1,2,3), I'd want something like the following:

 

Combination1 = 1&1 2&2 3&3

Combination2 = 1&1 2&3 3&2

Combination3 = 1&2 2&3 3&1

etc

 

I tried using proc transpose to create a matrix but couldn't think of a way to systematically move through the matrix recording each combination. Feels like it's solvable using a 2 dimensional array but I can't quite come up with the solution!

 

Thanks for your help.

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @Joe_O and welcome to the SAS Support Communities!

 

Try this:

/* Create sample data for demonstration */

data have;
do x=1 to 4;
  y=x; output;
end;
run;

/* Determine number of observations in HAVE */

data _null_;
call symputx('n',n);
stop;
set have nobs=n;
run;

/* Create the combinations of pairs */

data want(keep=comb x y);
length comb 8;
array _x[&n] _temporary_;
array _y[&n] _temporary_;
set have end=last;
_x[_n_]=x;
_y[_n_]=y;
if last;
do comb=1 to fact(&n);
  _c=allperm(comb, of _y[*]);
  do _i=1 to &n;
    x=_x[_i]; y=_y[_i]; output;
  end;
end;
run;

The result can be reported in a wide structure if needed:

data _null_;
length combpairs $25; /* adapt length appropriately */
do until(last.comb);
  set want;
  by comb;
  combpairs=catx(' ',combpairs,cats('(',x,',',y,')'));
end;
put comb 3. +2 combpairs;
run;

Result:

  1  (1,1) (2,2) (3,3) (4,4)
  2  (1,1) (2,2) (3,4) (4,3)
  3  (1,1) (2,4) (3,2) (4,3)
  4  (1,4) (2,1) (3,2) (4,3)
  5  (1,4) (2,1) (3,3) (4,2)
...
 24  (1,2) (2,1) (3,3) (4,4)

Edit: If x1, ..., xn are the values of the first variable and y1, ..., yn are the values of the second variable, the code creates all n! bijections of the set {1, ..., n} to itself and writes the corresponding pairs (xi, yj) together with the sequence number 1, ..., n! of the combination (=bijection) to the output dataset. Note that duplicates will occur (but could be eliminated) if there are duplicate values in either variable.

View solution in original post

21 REPLIES 21
PeterClemmensen
Tourmaline | Level 20

I'm not quite sure what your desired result looks like.

 

Suppose your data looks like this

 

data have;
input x y;
datalines;
1 1
2 2
3 3
;

What does your desired result look like?

Joe_O
Obsidian | Level 7

Thanks for the reply. Sorry for not being more specific. I'm not particular about the exact structure as long as it is usable but something like the below would be ideal. Where x and y are the input variables as you suggested and Combination is the variable created by the program. 

 

Combinationxy
Combination111
Combination122
Combination133
Combination211
Combination223
Combination232
Combination312
Combination323
Combination331
Combination513
Combination522
Combination531
Combination613
Combination621
Combination632

 

Note that x and y are only used once per "combination" and each "combination" uses all values of x and y. 

Joe_O
Obsidian | Level 7
Just realise I missed "Combination4" but hopefully that makes sense!
PaigeMiller
Diamond | Level 26

@Joe_O wrote:

Thanks for the reply. Sorry for not being more specific. I'm not particular about the exact structure as long as it is usable but something like the below would be ideal. Where x and y are the input variables as you suggested and Combination is the variable created by the program. 

 

Combination x y
Combination1 1 1
Combination1 2 2
Combination1 3 3
Combination2 1 1
Combination2 2 3
Combination2 3 2
Combination3 1 2
Combination3 2 3
Combination3 3 1
Combination5 1 3
Combination5 2 2
Combination5 3 1
Combination6 1 3
Combination6 2 1
Combination6 3 2

 

Note that x and y are only used once per "combination" and each "combination" uses all values of x and y. 


I don't really understand this. Can you show us just the input data (following these instructions so we can have a data set to work from — I will not use data provided in other forms) and then explain how you derive the variable named COMBINATION?

--
Paige Miller
Joe_O
Obsidian | Level 7
My data has hundreds of observations but the datalines statement you posted is fine as a smaller example.

The Combination variable is just a name given to each group of pairs. Once the program has identified a combination of pairs, it just needs to give them a name so that they can be grouped later on.
PaigeMiller
Diamond | Level 26

@Joe_O wrote:
My data has hundreds of observations but the datalines statement you posted is fine as a smaller example.

The Combination variable is just a name given to each group of pairs. Once the program has identified a combination of pairs, it just needs to give them a name so that they can be grouped later on.

I'm afraid this is not an explanation that I can use to provide programming for. How? How do you assign a value to combination? Is it random? Is there some process? If there is a process, we need step-by-step details of that process in order to turn it into a program.

--
Paige Miller
PaigeMiller
Diamond | Level 26

Here's a do-loop in a DATA step solution. I assume in this case the numbers are consecutive, starting with 1 and ending with ENDNUM (which is a variable you can set, in this case I have set it to 4);

 

data want;
    length string $ 8;
    endnum=4;
    do i=1 to endnum;
        do j=(i+1) to endnum;
            string =cats(i,',',j);
            output;
        end;
    end;
    drop endnum;
run;

Naturally, this can be made more flexible if you need to do this for non-consecutive numbers , or if the values don't start with 1.

 

--
Paige Miller
Joe_O
Obsidian | Level 7
Thanks for the reply. Your assumptions are correct however, I am looking for distinct groups of each combination rather than just the combinations themselves. I realise I might not have explained it particularly well but hopefully my other reply makes sense!
FreelanceReinh
Jade | Level 19

Hello @Joe_O and welcome to the SAS Support Communities!

 

Try this:

/* Create sample data for demonstration */

data have;
do x=1 to 4;
  y=x; output;
end;
run;

/* Determine number of observations in HAVE */

data _null_;
call symputx('n',n);
stop;
set have nobs=n;
run;

/* Create the combinations of pairs */

data want(keep=comb x y);
length comb 8;
array _x[&n] _temporary_;
array _y[&n] _temporary_;
set have end=last;
_x[_n_]=x;
_y[_n_]=y;
if last;
do comb=1 to fact(&n);
  _c=allperm(comb, of _y[*]);
  do _i=1 to &n;
    x=_x[_i]; y=_y[_i]; output;
  end;
end;
run;

The result can be reported in a wide structure if needed:

data _null_;
length combpairs $25; /* adapt length appropriately */
do until(last.comb);
  set want;
  by comb;
  combpairs=catx(' ',combpairs,cats('(',x,',',y,')'));
end;
put comb 3. +2 combpairs;
run;

Result:

  1  (1,1) (2,2) (3,3) (4,4)
  2  (1,1) (2,2) (3,4) (4,3)
  3  (1,1) (2,4) (3,2) (4,3)
  4  (1,4) (2,1) (3,2) (4,3)
  5  (1,4) (2,1) (3,3) (4,2)
...
 24  (1,2) (2,1) (3,3) (4,4)

Edit: If x1, ..., xn are the values of the first variable and y1, ..., yn are the values of the second variable, the code creates all n! bijections of the set {1, ..., n} to itself and writes the corresponding pairs (xi, yj) together with the sequence number 1, ..., n! of the combination (=bijection) to the output dataset. Note that duplicates will occur (but could be eliminated) if there are duplicate values in either variable.

Joe_O
Obsidian | Level 7
Thanks so much for this solution- it is precisely what I'm looking for except for one thing - the ALLPERM function has a limit of 18 variables and I will need ~100. I should have specified in my original post. Is there a method that does not use this function?
FreelanceReinh
Jade | Level 19

@Joe_O wrote:
Thanks so much for this solution- it is precisely what I'm looking for except for one thing - the ALLPERM function has a limit of 18 variables and I will need ~100. I should have specified in my original post. Is there a method that does not use this function?

Given that 18!=6,402,373,705,728,000, I can understand why there is that limit. Are there so many duplicate values (available for elimination) among those ~100 that the number of 100!=9.3E157 bijections would be reduced substantially?

Joe_O
Obsidian | Level 7
No duplicates. Both variables will literally just be the number 1 to 100 (or whatever the limit is).

Perhaps I didn't think about how realistic this would be!

I think I will be able to reduce the list of possible pairs down. For example, if my list goes up to 10, I could say

x=1 may only be paired with y= 3,5 or 7".
x=2 may only be paired with y= 1,5,8 or 9".

Obviously, I'd have to alter the input. I expect this will add to the complexity of the solution but would this make it more feasible? How would you proceed?
FreelanceReinh
Jade | Level 19

@Joe_O wrote:
I think I will be able to reduce the list of possible pairs down. For example, if my list goes up to 10, I could say

x=1 may only be paired with y= 3,5 or 7".
x=2 may only be paired with y= 1,5,8 or 9".

This sounds more and more like a combinatorial problem (maybe combinatorial optimization if a minimization or maximization of some objective function comes into play) requiring powerful algorithms such as those available in SAS/OR (which I don't have). See the item "solving optimal assignment problems" in the list of capabilities. With 100, let alone hundreds of x and y values the complexity of the problem will most likely exceed what is feasible with a simple DATA step or procedures in Base SAS or even SAS/STAT. If you have access to SAS/OR, I'd recommend that you start a new thread in the Mathematical Optimization, Discrete-Event Simulation, and OR subforum after a search in the existing posts there (see, for example, Constraint setup for skills/abiities/assignment in bin packing problem optmodel). The SAS employee answering most of the questions in that subforum is an outstanding expert (as far as I can tell). Good luck!

PaigeMiller
Diamond | Level 26

@Joe_O wrote:
Thanks so much for this solution- it is precisely what I'm looking for except for one thing - the ALLPERM function has a limit of 18 variables and I will need ~100. I should have specified in my original post. Is there a method that does not use this function?

Do you mean ~100 observations, or do you mean ~100 variables?

--
Paige Miller

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 21 replies
  • 2764 views
  • 3 likes
  • 6 in conversation