proc sort and dupout: how to get the pairs of duplicates

Solved

``proc sort data=A out=B dupout=C nodupkey; by var1 var2 var3; run;``

Using the above code, I get a dataset that is unique on the BY variables (dataset B) and a dataset of the duplicates on the BY variables (dataset C).

Sometimes I want to compare the duplicates (the unique ones in B and the duplicates in C) to see which variables differ apart from the BY variables, but how do I put them together? I mean, how do I extract from B the observations whose BY variables match those in dataset C?

For example, I have dataset A as:

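| ID | age | sex | win | lost |
|---|---|---|---|---|
| 1 | 20 | F | 200 | 120 |
| 2 | 22 | M | 150 | 130 |
| 2 | 22 | M | 150 | 80 |
| 3 | 25 | M | 110 | 90 |
| 3 | 25 | M | 110 | 210 |
| 4 | 27 | F | 105 | 85 |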

If I run

``proc sort data=A out=B dupout=C nodupkey; by ID age sex win; run;``

I will get B as:

| ID | age | sex | win | lost |
|---|---|---|---|---|
| 1 | 20 | F | 200 | 120 |
| 2 | 22 | M | 150 | 130 |
| 3 | 25 | M | 110 | 90 |
| 4 | 27 | F | 105 | 85 |

and C:

| ID | age | sex | win | lost |
|---|---|---|---|---|
| 2 | 22 | M | 150 | 80 |
| 3 | 25 | M | 110 | 210 |

Now I want to see how the other variables differ within each set of duplicates (the BY variables being identical), so I want the PAIRS of duplicates, like this:

| ID | age | sex | win | lost |
|---|---|---|---|---|
| 2 | 22 | M | 150 | 130 |
| 2 | 22 | M | 150 | 80 |
| 3 | 25 | M | 110 | 90 |
| 3 | 25 | M | 110 | 210 |

This means I need to extract from dataset B the observations whose BY variables match those in C. How can I do it? Thanks in advance.

Accepted Solutions
Solution
‎03-06-2018 11:01 AM

Re: proc sort and dupout: how to get the pairs of duplicates


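``````data bygroups_having_duplicates;
  /* Interleave B and C; within each BY group the B record comes first */
  set b (in=inb) c;
  /* Use the full PROC SORT key so groups match the duplicate definition */
  by id age sex win;
  /* Drop singletons: a lone record is both first and last in its group */
  if not (first.win=1 and last.win=1);
  if inb then source='B';
  else source='C';
run;``````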

No singletons pass the subsetting IF statement. The first record in each BY group comes from dataset B; all subsequent records in the group come from C.
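To try the interleave end to end, here is a minimal sketch that rebuilds the sample dataset A from the question, runs the PROC SORT step, and then combines B and C. The output name `pairs` is a hypothetical name chosen for illustration, and the BY list uses the full sort key (ID age sex win) on the assumption that the grouping should match the PROC SORT key exactly.

```sas
/* Sample data copied from the question */
data a;
  input ID age sex $ win lost;
  datalines;
1 20 F 200 120
2 22 M 150 130
2 22 M 150 80
3 25 M 110 90
3 25 M 110 210
4 27 F 105 85
;
run;

/* B keeps one record per key; C receives the removed duplicates */
proc sort data=a out=b dupout=c nodupkey;
  by ID age sex win;
run;

/* Interleave B and C, keeping only keys that occur more than once */
data pairs;   /* hypothetical output name */
  set b (in=inb) c;
  by ID age sex win;
  /* A singleton is both first and last in its BY group, so it is dropped */
  if not (first.win and last.win);
  if inb then source='B';
  else source='C';
run;

proc print data=pairs noobs;
run;
```

On the sample data this should leave the four rows for ID 2 and ID 3, with `source` showing which dataset each row came from.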

PROC SQL alternative from @KurtBremser:

``````proc sql;
create table d as
select * from a
group by id, age, sex, win
having count(*) >= 2
;
quit;``````
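PROC SQL does not promise a particular row order unless ORDER BY is given, so if the pairs need to sit next to each other, a variation like the following may help. It is only a sketch: the table name `d_ordered` and the column `n_in_group` are hypothetical additions, not part of the original answer. Because `select *` is combined with a summary function, SAS will note in the log that it is remerging summary statistics with the original data, which is expected here.

```sas
proc sql;
  create table d_ordered as            /* hypothetical table name */
  select *,
         count(*) as n_in_group        /* hypothetical column: size of each key group */
  from a
  group by id, age, sex, win
  having count(*) >= 2                 /* keep only keys that occur more than once */
  order by id, age, sex, win;          /* place members of each group on adjacent rows */
quit;
```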

All Replies

Re: proc sort and dupout: how to get the pairs of duplicates

``````proc sort data=A out=dup_rec nouniquekey;
  by id age sex win;
run;
proc print data=dup_rec noobs; run;``````
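NOUNIQUEKEY drops every observation whose BY key occurs only once, so `dup_rec` already holds the pairs without going through B and C. If the singletons are wanted as well, the UNIQUEOUT= option can capture them in the same step. This is a sketch only: the dataset name `singles` is hypothetical, and it assumes a SAS release that supports NOUNIQUEKEY (9.3 or later, as far as I know).

```sas
/* dup_rec: keys occurring more than once; singles: keys occurring exactly once */
proc sort data=A out=dup_rec uniqueout=singles nouniquekey;
  by id age sex win;
run;

proc print data=dup_rec noobs; run;
proc print data=singles noobs; run;
```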

Re: proc sort and dupout: how to get the pairs of duplicates

``````proc sql;
create table d as
select * from a
group by id, age, sex, win
having count(*) >= 2
;
quit;``````