Solved: Re: Delete all the duplicates

Zerg · Posted 04-18-2019 04:12 PM

Hello,

I want to delete all the duplicated observations. Example data is:

obs.	id	year	var1
1	1	1999	5
2	2	2000	10
3	2	2000	8
4	2	2000	6
5	3	2001	7
6	4	2002	12
7	4	2002	15
8	5	2001	9
9	6	2001	4
10	7	2002	3

A unique observation is determined by id and year, so in the example data, the duplicates are obs. 2 3 4 and obs. 6 7.

I want to delete all the duplicates without keeping one unique observation. So the data I want is:

obs.	id	year	var1
1	1	1999	5
5	3	2001	7
8	5	2001	9
9	6	2001	4
10	7	2002	3

proc sort by id year nodupkey option will keep obs. 2 and obs. 6 and nouniquekey option can output a dataset that contains obs.2 3 4 6 7. Is there an efficient way to get the final data I want? Thank you.

novinosrin · Posted 04-18-2019 04:16 PM

data have;
input obs	id	year	var1;
cards;
1	1	1999	5
2	2	2000	10
3	2	2000	8
4	2	2000	6
5	3	2001	7
6	4	2002	12
7	4	2002	15
8	5	2001	9
9	6	2001	4
10	7	2002	3
;

proc sql;
create table want as
select *
from have
group by id, year
having count(*)=1;
quit;

View solution in original post

novinosrin · Posted 04-18-2019 04:16 PM

data have;
input obs	id	year	var1;
cards;
1	1	1999	5
2	2	2000	10
3	2	2000	8
4	2	2000	6
5	3	2001	7
6	4	2002	12
7	4	2002	15
8	5	2001	9
9	6	2001	4
10	7	2002	3
;

proc sql;
create table want as
select *
from have
group by id, year
having count(*)=1;
quit;

Zerg · Posted 04-18-2019 04:19 PM

Works perfectly. Thank you.

ballardw · Posted 04-18-2019 05:41 PM

Another approach:

proc sort data=have 
          out=want 
          nouniquekey uniqueout=deleted;
  by id year;
run;

the Nouniquekey is the operative part. This also demonstrates the Uniqueout option to write those deleted into a separate data set if needed.

Delete all the duplicates

Re: Delete all the duplicates

Re: Delete all the duplicates

Re: Delete all the duplicates

Re: Delete all the duplicates

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away