Hello,
I am trying to create pairwise data for comparison purposes using proc sql. However, I m stuck at duplicate observations.
What I have:
ID | opr | rep | vari |
1 | jb | r1 | a |
2 | jb | r1 | b |
3 | jb | r2 | c |
4 | jb | r2 | d |
5 | jb | r3 | e |
6 | jb | r3 | f |
7 | jb | r4 | g |
8 | jb | r4 | h |
9 | sl | r5 | i |
10 | sl | r5 | j |
11 | sl | r6 | k |
12 | sl | r6 | l |
13 | sl | r7 | m |
14 | sl | r7 | n |
15 | sl | r8 | o |
16 | sl | r8 | p |
What I want:
opr1 | opr2 | rep1 | rep2 | var1 | var2 |
jb | sl | r1 | r5 | a | i |
jb | sl | r1 | r5 | b | j |
jb | sl | r1 | r6 | a | k |
jb | sl | r1 | r6 | b | l |
jb | sl | r1 | r7 | a | m |
jb | sl | r1 | r7 | b | n |
jb | sl | r1 | r8 | a | o |
jb | sl | r1 | r8 | b | p |
jb | sl | r2 | r5 | c | i |
jb | sl | r2 | r5 | d | j |
jb | sl | r2 | r6 | c | k |
jb | sl | r2 | r6 | d | l |
jb | sl | r2 | r7 | c | m |
jb | sl | r2 | r7 | d | n |
jb | sl | r2 | r8 | c | o |
jb | sl | r2 | r8 | d | p |
jb | sl | r3 | r5 | e | i |
jb | sl | r3 | r5 | f | j |
jb | sl | r3 | r6 | e | k |
jb | sl | r3 | r6 | f | l |
jb | sl | r3 | r7 | e | m |
jb | sl | r3 | r7 | f | n |
jb | sl | r3 | r8 | e | o |
jb | sl | r3 | r8 | f | p |
jb | sl | r4 | r5 | g | i |
jb | sl | r4 | r5 | h | j |
jb | sl | r4 | r6 | g | k |
jb | sl | r4 | r6 | h | l |
jb | sl | r4 | r7 | g | m |
jb | sl | r4 | r7 | h | n |
jb | sl | r4 | r8 | g | o |
jb | sl | r4 | r8 | h | p |
The code I use:
data var_rep;
input ID opr $ rep $ vari $;
datalines;
1 jb r1 a
2 jb r1 b
3 jb r2 c
4 jb r2 d
5 jb r3 e
6 jb r3 f
7 jb r4 g
8 jb r4 h
9 sl r5 i
10 sl r5 j
11 sl r6 k
12 sl r6 l
13 sl r7 m
14 sl r7 n
15 sl r8 o
16 sl r8 p
;
run;
proc sql; /*create all possible pairings with respect to operator and unique id*/
create table comb as
select a.opr as opr1, b.opr as opr2, a.rep as rep1, b.rep as rep2, a.vari as var1, b.vari as var2
from var_rep as a , var_rep as b
where a.opr<b.opr and a.rep<b.rep
order a.rep,b.rep, a.opr,b.opr;
quit;
I am getting duplicate observations. The total observations should be 32 and I m getting 64.
Thanks for all the help.
The problem is not really duplicates, but that you get all combinations of VAR1 and VAR2.
This is a solution that works with your test data:
data temp;
do idx=1 by 1 until(last.rep);
set var_rep;
by opr rep;
output;
end;
run;
proc sql; /*create all possible pairings with respect to operator and unique id*/
create table comb as
select distinct a.opr as opr1, b.opr as opr2, a.rep as rep1, b.rep as rep2, a.vari as var1, b.vari as var2
from temp as a , temp as b
where a.opr<b.opr and a.rep<b.rep and a.idx=b.idx
order by a.rep,b.rep, a.opr,b.opr;
quit;
The IDX variable is a record number within the group, and it is used to limit the number of combinations. Hopefully, it may also work with your real data.
Define some rules for what you intend by "pairwise".
Which pairs?
There are no "duplicates" in the output from that Proc SQL code. Unwanted perhaps, but not duplicates.If there were duplicates the adding DISTINCT to the Select would remove them.
Indicate some that you considered duplicates as part of the rules for pairwise.
With my code, I am getting the following combination:
OPR1 | OPR2 | REP1 | REP2 | VAR1 | VAR2 |
jb | sl | r1 | r5 | a | j |
jb | sl | r1 | r5 | a | i |
jb | sl | r1 | r5 | b | j |
jb | sl | r1 | r5 | b | i |
jb | sl | r1 | r6 | b | k |
jb | sl | r1 | r6 | b | l |
jb | sl | r1 | r6 | a | k |
jb | sl | r1 | r6 | a | l |
jb | sl | r1 | r7 | a | n |
jb | sl | r1 | r7 | b | m |
jb | sl | r1 | r7 | a | m |
jb | sl | r1 | r7 | b | n |
jb | sl | r1 | r8 | a | o |
jb | sl | r1 | r8 | b | o |
jb | sl | r1 | r8 | b | p |
jb | sl | r1 | r8 | a | p |
jb | sl | r2 | r5 | d | j |
jb | sl | r2 | r5 | d | i |
jb | sl | r2 | r5 | c | j |
jb | sl | r2 | r5 | c | i |
jb | sl | r2 | r6 | d | l |
jb | sl | r2 | r6 | d | k |
jb | sl | r2 | r6 | c | l |
jb | sl | r2 | r6 | c | k |
jb | sl | r2 | r7 | d | m |
jb | sl | r2 | r7 | d | n |
jb | sl | r2 | r7 | c | m |
jb | sl | r2 | r7 | c | n |
jb | sl | r2 | r8 | d | o |
jb | sl | r2 | r8 | c | p |
jb | sl | r2 | r8 | c | o |
jb | sl | r2 | r8 | d | p |
jb | sl | r3 | r5 | f | j |
jb | sl | r3 | r5 | f | i |
jb | sl | r3 | r5 | e | j |
jb | sl | r3 | r5 | e | i |
jb | sl | r3 | r6 | f | l |
jb | sl | r3 | r6 | f | k |
jb | sl | r3 | r6 | e | l |
jb | sl | r3 | r6 | e | k |
jb | sl | r3 | r7 | f | n |
jb | sl | r3 | r7 | f | m |
jb | sl | r3 | r7 | e | n |
jb | sl | r3 | r7 | e | m |
jb | sl | r3 | r8 | f | p |
jb | sl | r3 | r8 | f | o |
jb | sl | r3 | r8 | e | p |
jb | sl | r3 | r8 | e | o |
jb | sl | r4 | r5 | h | j |
jb | sl | r4 | r5 | h | i |
jb | sl | r4 | r5 | g | j |
jb | sl | r4 | r5 | g | i |
jb | sl | r4 | r6 | h | l |
jb | sl | r4 | r6 | h | k |
jb | sl | r4 | r6 | g | l |
jb | sl | r4 | r6 | g | k |
jb | sl | r4 | r7 | h | n |
jb | sl | r4 | r7 | h | m |
jb | sl | r4 | r7 | g | n |
jb | sl | r4 | r7 | g | m |
jb | sl | r4 | r8 | h | o |
jb | sl | r4 | r8 | g | p |
jb | sl | r4 | r8 | g | o |
jb | sl | r4 | r8 | h | p |
Thanks for recommending 'distinct' idea. I will try that.
The problem is not really duplicates, but that you get all combinations of VAR1 and VAR2.
This is a solution that works with your test data:
data temp;
do idx=1 by 1 until(last.rep);
set var_rep;
by opr rep;
output;
end;
run;
proc sql; /*create all possible pairings with respect to operator and unique id*/
create table comb as
select distinct a.opr as opr1, b.opr as opr2, a.rep as rep1, b.rep as rep2, a.vari as var1, b.vari as var2
from temp as a , temp as b
where a.opr<b.opr and a.rep<b.rep and a.idx=b.idx
order by a.rep,b.rep, a.opr,b.opr;
quit;
The IDX variable is a record number within the group, and it is used to limit the number of combinations. Hopefully, it may also work with your real data.
Thanks, I will try it out.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.