SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Using proc sql to create all possible pairs of two variables

Reply
New Contributor
Posts: 3

Using proc sql to create all possible pairs of two variables

I am trying to use proc sql to create all possible pairs of two variables.

 

For example, if my starting data appears like this...

 

letter number

a 1

b 2

c 3

 

I would like the output data to look like...

 

pair1 pair2 pair3

a-1 b-2 c-3

a-1 b-3 c-2

a-2 b-1 c-3

a-2 b-3 c-2

a-3 b-1 c-2

a-3 b-2 c-1

 

I have been able to get a single column with all 9 possible pairs, but when I try to put multiple pairs on a single line, things get messed up.

 

Thanks in advance for your help!

 

Respected Advisor
Posts: 4,935

Re: Using proc sql to create all possible pairs of two variables

What you want is not a cartesian product as produced by SQL. In a cartesian product, the pair a-1 would occur only once...

What you require is a list of all permutations.

 

data have;
input letter $ number;
datalines;
a 1
b 2
c 3
;

proc sql noprint;
select count(letter) into :nbLetters from have;
quit;

proc transpose data=have out=temp(drop=_name_);
var number;
id letter;
run;

data want;
set temp;
array l{&nbLetters} _numeric_;
array pair{&nbLetters} $8;

do i = 1 to fact(&nbLetters);
    call allperm(i, of l{*});
    do j = 1 to &nbLetters;
        pair{j} = catx("-", vname(l{j}), l{j});
        end;
    output;
    end;
keep pair:;
run;

proc print data=want noobs; run;

you could replace allperm by lexperm to get only permutations of distinct nonmissing values.

PG
New Contributor
Posts: 3

Re: Using proc sql to create all possible pairs of two variables

Thanks so much. In my actual data, both variables are character strings. The length of the variables range from 3-8 characters. How would I edit the code to work for two character strings?

 

Thanks again!

Trusted Advisor
Posts: 1,118

Re: Using proc sql to create all possible pairs of two variables

[ Edited ]

If I may step in here ... I think the following modification of the two array definitions should work for you:

array l{&nbLetters} _character_;
array pair{&nbLetters} $17; /* Minimum required length depends on how you concatenate your strings */

I really like PG's solution because it's so elegant. Please note, however, that there are rare, exceptional cases where it would not work without further modification. For example, if the first character variable contained values such as "a:a" and "a_a", the PROC TRANSPOSE step would fail. But if your character strings are "well-behaved", no problems should occur.

New Contributor
Posts: 3

Re: Using proc sql to create all possible pairs of two variables

Posted in reply to FreelanceReinhard

Thanks! That worked. I wasn't allowing for a long enough string for my concatenation.

Trusted Advisor
Posts: 1,118

Re: Using proc sql to create all possible pairs of two variables

One could code the transposing into the data step and thus avoid PROC TRANSPOSE:

data have;
length string1
       string2 $8;
input string1 string2;
cards;
first one
second two
third three
fourth four
;

proc sql noprint;
select count(*) into :n from have;
quit;

data want;
array s[&n] $8;
array t[&n] $8;
array p[&n] $17;
do i=1 by 1 until(e);
  set have end=e;
  s[i]=string1;
  t[i]=string2;
end;
do i=1 to fact(&n);
  call allperm(i, of t[*]);
  do j=1 to &n;
    p[j]=catx('-', s[j], t[j]);
  end;
  output;
end;
keep p:;
run;

proc print;
run;
Occasional Learner
Posts: 1

Re: Using proc sql to create all possible pairs of two variables

Another solution:

data have;
input letter $ number;
datalines;
a 1
b 2
c 3
;


PROC SQL;
   CREATE TABLE WORK.temp1 AS 
   SELECT t1.letter
      FROM WORK.HAVE t1;
QUIT;

PROC SQL;
   CREATE TABLE WORK.temp2 AS 
   SELECT t1.number
      FROM WORK.HAVE t1;
QUIT;

PROC SQL;
   CREATE TABLE WORK.temp3 AS 
   SELECT t1.letter, 
          t2.number
      FROM WORK.temp1 t1
           CROSS JOIN WORK.temp2 t2;
QUIT;

PROC SQL;
   CREATE TABLE WORK.WANT0 AS 
   SELECT DISTINCT t1.letter, 
          t1.number, 
          t2.letter AS letter1, 
          t2.number AS number1, 
          t3.letter AS letter2, 
          t3.number AS number2
      FROM WORK.temp3 t1
           CROSS JOIN WORK.temp3 t2
           CROSS JOIN WORK.temp3 t3
      WHERE t1.letter < t2.letter AND t2.letter < t3.letter AND t1.number NOT = t2.number AND t2.number NOT = t3.number 
           AND t1.number NOT = t3.number;
QUIT;
PROC SQL;
   CREATE TABLE WORK.WANT AS 
   SELECT /* VAR1 */
            (CATX('-',t1.letter,put(t1.number,z.))) AS VAR1, 
          /* VAR2 */
            (CATX('-',t1.letter1, put( t1.number1,z.))) AS VAR2, 
          /* VAR3 */
            (CATX('-',t1.letter2,PUT(t1.number2,Z.))) AS VAR3
      FROM WORK.WANT0 t1;
QUIT;
Ask a Question
Discussion stats
  • 6 replies
  • 733 views
  • 5 likes
  • 4 in conversation