Solved: Find all possible groups of pairs of two variables

Joe_O · Posted 01-13-2021 07:36 AM

I have a dataset containing two numeric variables. I want to find every possible combination of pairs of these variables.

Note that I am not asking for all possible combinations of the two variables (eg. I could find this by performing a cross join). I want all possible combinations of each pair. For example, if the two variables just had 3 values each (1,2,3), I'd want something like the following:

Combination1 = 1&1 2&2 3&3

Combination2 = 1&1 2&3 3&2

Combination3 = 1&2 2&3 3&1

etc

I tried using proc transpose to create a matrix but couldn't think of a way to systematically move through the matrix recording each combination. Feels like it's solvable using a 2 dimensional array but I can't quite come up with the solution!

Thanks for your help.

FreelanceReinh · Posted 01-13-2021 09:03 AM

Hello @Joe_O and welcome to the SAS Support Communities!

Try this:

/* Create sample data for demonstration */

data have;
do x=1 to 4;
  y=x; output;
end;
run;

/* Determine number of observations in HAVE */

data _null_;
call symputx('n',n);
stop;
set have nobs=n;
run;

/* Create the combinations of pairs */

data want(keep=comb x y);
length comb 8;
array _x[&n] _temporary_;
array _y[&n] _temporary_;
set have end=last;
_x[_n_]=x;
_y[_n_]=y;
if last;
do comb=1 to fact(&n);
  _c=allperm(comb, of _y[*]);
  do _i=1 to &n;
    x=_x[_i]; y=_y[_i]; output;
  end;
end;
run;

The result can be reported in a wide structure if needed:

data _null_;
length combpairs $25; /* adapt length appropriately */
do until(last.comb);
  set want;
  by comb;
  combpairs=catx(' ',combpairs,cats('(',x,',',y,')'));
end;
put comb 3. +2 combpairs;
run;

Result:

  1  (1,1) (2,2) (3,3) (4,4)
  2  (1,1) (2,2) (3,4) (4,3)
  3  (1,1) (2,4) (3,2) (4,3)
  4  (1,4) (2,1) (3,2) (4,3)
  5  (1,4) (2,1) (3,3) (4,2)
...
 24  (1,2) (2,1) (3,3) (4,4)

Edit: If x₁, ..., x_n are the values of the first variable and y₁, ..., y_n are the values of the second variable, the code creates all n! bijections of the set {1, ..., n} to itself and writes the corresponding pairs (x_i, y_j) together with the sequence number 1, ..., n! of the combination (=bijection) to the output dataset. Note that duplicates will occur (but could be eliminated) if there are duplicate values in either variable.

View solution in original post

PeterClemmensen · Posted 01-13-2021 08:04 AM

I'm not quite sure what your desired result looks like.

Suppose your data looks like this

data have;
input x y;
datalines;
1 1
2 2
3 3
;

What does your desired result look like?

The DATA to DATA Step Macro
Blog: SASnrd

Joe_O · Posted 01-13-2021 08:40 AM

Thanks for the reply. Sorry for not being more specific. I'm not particular about the exact structure as long as it is usable but something like the below would be ideal. Where x and y are the input variables as you suggested and Combination is the variable created by the program.

Combination	x	y
Combination1	1	1
Combination1	2	2
Combination1	3	3
Combination2	1	1
Combination2	2	3
Combination2	3	2
Combination3	1	2
Combination3	2	3
Combination3	3	1
Combination5	1	3
Combination5	2	2
Combination5	3	1
Combination6	1	3
Combination6	2	1
Combination6	3	2

Note that x and y are only used once per "combination" and each "combination" uses all values of x and y.

Joe_O · Posted 01-13-2021 08:44 AM

Just realise I missed "Combination4" but hopefully that makes sense!

PaigeMiller · Posted 01-13-2021 09:01 AM

@Joe_O wrote:

Thanks for the reply. Sorry for not being more specific. I'm not particular about the exact structure as long as it is usable but something like the below would be ideal. Where x and y are the input variables as you suggested and Combination is the variable created by the program.

Combination x y

Combination1 1 1

Combination1 2 2

Combination1 3 3

Combination2 1 1

Combination2 2 3

Combination2 3 2

Combination3 1 2

Combination3 2 3

Combination3 3 1

Combination5 1 3

Combination5 2 2

Combination5 3 1

Combination6 1 3

Combination6 2 1

Combination6 3 2

Note that x and y are only used once per "combination" and each "combination" uses all values of x and y.

I don't really understand this. Can you show us just the input data (following these instructions so we can have a data set to work from — I will not use data provided in other forms) and then explain how you derive the variable named COMBINATION?

--
Paige Miller

Joe_O · Posted 01-13-2021 09:33 AM

My data has hundreds of observations but the datalines statement you posted is fine as a smaller example.

The Combination variable is just a name given to each group of pairs. Once the program has identified a combination of pairs, it just needs to give them a name so that they can be grouped later on.

PaigeMiller · Posted 01-13-2021 09:41 AM

@Joe_O wrote:
My data has hundreds of observations but the datalines statement you posted is fine as a smaller example.

The Combination variable is just a name given to each group of pairs. Once the program has identified a combination of pairs, it just needs to give them a name so that they can be grouped later on.

I'm afraid this is not an explanation that I can use to provide programming for. How? How do you assign a value to combination? Is it random? Is there some process? If there is a process, we need step-by-step details of that process in order to turn it into a program.

--
Paige Miller

PaigeMiller · Posted 01-13-2021 08:13 AM

Here's a do-loop in a DATA step solution. I assume in this case the numbers are consecutive, starting with 1 and ending with ENDNUM (which is a variable you can set, in this case I have set it to 4);

data want;
    length string $ 8;
    endnum=4;
    do i=1 to endnum;
        do j=(i+1) to endnum;
            string =cats(i,',',j);
            output;
        end;
    end;
    drop endnum;
run;

Naturally, this can be made more flexible if you need to do this for non-consecutive numbers , or if the values don't start with 1.

--
Paige Miller

Joe_O · Posted 01-13-2021 08:42 AM

Thanks for the reply. Your assumptions are correct however, I am looking for distinct groups of each combination rather than just the combinations themselves. I realise I might not have explained it particularly well but hopefully my other reply makes sense!

FreelanceReinh · Posted 01-13-2021 09:03 AM

Hello @Joe_O and welcome to the SAS Support Communities!

Try this:

/* Create sample data for demonstration */

data have;
do x=1 to 4;
  y=x; output;
end;
run;

/* Determine number of observations in HAVE */

data _null_;
call symputx('n',n);
stop;
set have nobs=n;
run;

/* Create the combinations of pairs */

data want(keep=comb x y);
length comb 8;
array _x[&n] _temporary_;
array _y[&n] _temporary_;
set have end=last;
_x[_n_]=x;
_y[_n_]=y;
if last;
do comb=1 to fact(&n);
  _c=allperm(comb, of _y[*]);
  do _i=1 to &n;
    x=_x[_i]; y=_y[_i]; output;
  end;
end;
run;

The result can be reported in a wide structure if needed:

data _null_;
length combpairs $25; /* adapt length appropriately */
do until(last.comb);
  set want;
  by comb;
  combpairs=catx(' ',combpairs,cats('(',x,',',y,')'));
end;
put comb 3. +2 combpairs;
run;

Result:

  1  (1,1) (2,2) (3,3) (4,4)
  2  (1,1) (2,2) (3,4) (4,3)
  3  (1,1) (2,4) (3,2) (4,3)
  4  (1,4) (2,1) (3,2) (4,3)
  5  (1,4) (2,1) (3,3) (4,2)
...
 24  (1,2) (2,1) (3,3) (4,4)

Edit: If x₁, ..., x_n are the values of the first variable and y₁, ..., y_n are the values of the second variable, the code creates all n! bijections of the set {1, ..., n} to itself and writes the corresponding pairs (x_i, y_j) together with the sequence number 1, ..., n! of the combination (=bijection) to the output dataset. Note that duplicates will occur (but could be eliminated) if there are duplicate values in either variable.

Joe_O · Posted 01-13-2021 09:29 AM

Thanks so much for this solution- it is precisely what I'm looking for except for one thing - the ALLPERM function has a limit of 18 variables and I will need ~100. I should have specified in my original post. Is there a method that does not use this function?

FreelanceReinh · Posted 01-13-2021 09:38 AM

@Joe_O wrote:
Thanks so much for this solution- it is precisely what I'm looking for except for one thing - the ALLPERM function has a limit of 18 variables and I will need ~100. I should have specified in my original post. Is there a method that does not use this function?

Given that 18!=6,402,373,705,728,000, I can understand why there is that limit. Are there so many duplicate values (available for elimination) among those ~100 that the number of 100!=9.3E157 bijections would be reduced substantially?

Joe_O · Posted 01-13-2021 10:01 AM

No duplicates. Both variables will literally just be the number 1 to 100 (or whatever the limit is).

Perhaps I didn't think about how realistic this would be!

I think I will be able to reduce the list of possible pairs down. For example, if my list goes up to 10, I could say

x=1 may only be paired with y= 3,5 or 7".
x=2 may only be paired with y= 1,5,8 or 9".

Obviously, I'd have to alter the input. I expect this will add to the complexity of the solution but would this make it more feasible? How would you proceed?

FreelanceReinh · Posted 01-13-2021 12:21 PM

@Joe_O wrote:
I think I will be able to reduce the list of possible pairs down. For example, if my list goes up to 10, I could say

x=1 may only be paired with y= 3,5 or 7".
x=2 may only be paired with y= 1,5,8 or 9".

This sounds more and more like a combinatorial problem (maybe combinatorial optimization if a minimization or maximization of some objective function comes into play) requiring powerful algorithms such as those available in SAS/OR (which I don't have). See the item "solving optimal assignment problems" in the list of capabilities. With 100, let alone hundreds of x and y values the complexity of the problem will most likely exceed what is feasible with a simple DATA step or procedures in Base SAS or even SAS/STAT. If you have access to SAS/OR, I'd recommend that you start a new thread in the Mathematical Optimization, Discrete-Event Simulation, and OR subforum after a search in the existing posts there (see, for example, Constraint setup for skills/abiities/assignment in bin packing problem optmodel). The SAS employee answering most of the questions in that subforum is an outstanding expert (as far as I can tell). Good luck!

PaigeMiller · Posted 01-13-2021 09:39 AM

@Joe_O wrote:
Thanks so much for this solution- it is precisely what I'm looking for except for one thing - the ALLPERM function has a limit of 18 variables and I will need ~100. I should have specified in my original post. Is there a method that does not use this function?

Do you mean ~100 observations, or do you mean ~100 variables?

--
Paige Miller

Find all possible groups of pairs of two variables

Re: Find all possible groups of pairs of two variables

Re: Find all possible groups of pairs of two variables

Re: Find all possible groups of pairs of two variables

Re: Find all possible groups of pairs of two variables

Re: Find all possible groups of pairs of two variables

Re: Find all possible groups of pairs of two variables

Re: Find all possible groups of pairs of two variables

Re: Find all possible groups of pairs of two variables

Re: Find all possible groups of pairs of two variables

Re: Find all possible groups of pairs of two variables

Re: Find all possible groups of pairs of two variables

Re: Find all possible groups of pairs of two variables

Re: Find all possible groups of pairs of two variables

Re: Find all possible groups of pairs of two variables

Re: Find all possible groups of pairs of two variables

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away