BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
natasha_biostat
Calcite | Level 5

Hello,

I am trying to create pairwise data for comparison purposes using proc sql. However, I m stuck at duplicate observations.

What I have:

IDoprrepvari
1jbr1a
2jbr1b
3jbr2c
4jbr2d
5jbr3e
6jbr3f
7jbr4g
8jbr4h
9slr5i
10slr5j
11slr6k
12slr6l
13slr7
14slr7n
15slr8
16slr8p



What I want: 

opr1opr2rep1rep2var1var2
jbslr1r5ai
jbslr1r5bj
jbslr1r6ak
jbslr1r6bl
jbslr1r7am
jbslr1r7bn
jbslr1r8ao
jbslr1r8bp
jbslr2r5ci
jbslr2r5dj
jbslr2r6ck
jbslr2r6dl
jbslr2r7cm
jbslr2r7dn
jbslr2r8co
jbslr2r8dp
jbslr3r5ei
jbslr3r5fj
jbslr3r6ek
jbslr3r6fl
jbslr3r7em
jbslr3r7fn
jbslr3r8eo
jbslr3r8fp
jbslr4r5gi
jbslr4r5hj
jbslr4r6gk
jbslr4r6hl
jbslr4r7gm
jbslr4r7hn
jbslr4r8go
jbslr4r8hp

The code I use:

data var_rep;
input ID opr $ rep $ vari $;
datalines;
1 jb r1 a
2 jb r1 b
3 jb r2 c
4 jb r2 d
5 jb r3 e
6 jb r3 f
7 jb r4 g
8 jb r4 h
9 sl r5 i
10 sl r5 j
11 sl r6 k
12 sl r6 l
13 sl r7 m
14 sl r7 n
15 sl r8 o
16 sl r8 p
;
run;
proc sql; /*create all possible pairings with respect to operator and unique id*/
create table comb as
select a.opr as opr1, b.opr as opr2, a.rep as rep1, b.rep as rep2, a.vari as var1, b.vari as var2
from var_rep as a , var_rep as b
where a.opr<b.opr and a.rep<b.rep
order a.rep,b.rep, a.opr,b.opr;
quit;

I am getting duplicate observations. The total observations should be 32 and I m getting 64. 

Thanks for all the help.

 

1 ACCEPTED SOLUTION

Accepted Solutions
s_lassen
Meteorite | Level 14

The problem is not really duplicates, but that you get all combinations of VAR1 and VAR2.

 

This is a solution that works with your test data:

data temp;
  do idx=1 by 1 until(last.rep);
    set var_rep;
    by opr rep;
    output;
    end;
run;

proc sql; /*create all possible pairings with respect to operator and unique id*/
create table comb as
select distinct a.opr as opr1, b.opr as opr2, a.rep as rep1, b.rep as rep2, a.vari as var1, b.vari as var2
from temp as a , temp as b
where a.opr<b.opr and a.rep<b.rep and a.idx=b.idx
order by a.rep,b.rep, a.opr,b.opr;
quit;

The IDX variable is a record number within the group, and it is used to limit the number of combinations. Hopefully, it may also work with your real data.

View solution in original post

4 REPLIES 4
ballardw
Super User

Define some rules for what you intend by "pairwise".

Which pairs?

There are no "duplicates" in the output from that Proc SQL code. Unwanted perhaps, but not duplicates.If there were duplicates the adding DISTINCT to the Select would remove them.

 

Indicate some that you considered duplicates as part of the rules for pairwise.

natasha_biostat
Calcite | Level 5

With my code, I am getting the following combination:

OPR1OPR2REP1REP2VAR1VAR2
jbslr1r5aj
jbslr1r5ai
jbslr1r5bj
jbslr1r5bi
jbslr1r6bk
jbslr1r6bl
jbslr1r6ak
jbslr1r6al
jbslr1r7an
jbslr1r7bm
jbslr1r7am
jbslr1r7bn
jbslr1r8ao
jbslr1r8bo
jbslr1r8bp
jbslr1r8ap
jbslr2r5dj
jbslr2r5di
jbslr2r5cj
jbslr2r5ci
jbslr2r6dl
jbslr2r6dk
jbslr2r6cl
jbslr2r6ck
jbslr2r7dm
jbslr2r7dn
jbslr2r7cm
jbslr2r7cn
jbslr2r8do
jbslr2r8cp
jbslr2r8co
jbslr2r8dp
jbslr3r5fj
jbslr3r5fi
jbslr3r5ej
jbslr3r5ei
jbslr3r6fl
jbslr3r6fk
jbslr3r6el
jbslr3r6ek
jbslr3r7fn
jbslr3r7fm
jbslr3r7en
jbslr3r7em
jbslr3r8fp
jbslr3r8fo
jbslr3r8ep
jbslr3r8eo
jbslr4r5hj
jbslr4r5hi
jbslr4r5gj
jbslr4r5gi
jbslr4r6hl
jbslr4r6hk
jbslr4r6gl
jbslr4r6gk
jbslr4r7hn
jbslr4r7hm
jbslr4r7gn
jbslr4r7gm
jbslr4r8ho
jbslr4r8gp
jbslr4r8go
jbslr4r8hp

Thanks for recommending 'distinct' idea. I will try that. 

s_lassen
Meteorite | Level 14

The problem is not really duplicates, but that you get all combinations of VAR1 and VAR2.

 

This is a solution that works with your test data:

data temp;
  do idx=1 by 1 until(last.rep);
    set var_rep;
    by opr rep;
    output;
    end;
run;

proc sql; /*create all possible pairings with respect to operator and unique id*/
create table comb as
select distinct a.opr as opr1, b.opr as opr2, a.rep as rep1, b.rep as rep2, a.vari as var1, b.vari as var2
from temp as a , temp as b
where a.opr<b.opr and a.rep<b.rep and a.idx=b.idx
order by a.rep,b.rep, a.opr,b.opr;
quit;

The IDX variable is a record number within the group, and it is used to limit the number of combinations. Hopefully, it may also work with your real data.

natasha_biostat
Calcite | Level 5

Thanks, I will try it out. 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 708 views
  • 0 likes
  • 3 in conversation