Hello,
I want to delete all the duplicated observations. Example data is:
obs. | id | year | var1 |
1 | 1 | 1999 | 5 |
2 | 2 | 2000 | 10 |
3 | 2 | 2000 | 8 |
4 | 2 | 2000 | 6 |
5 | 3 | 2001 | 7 |
6 | 4 | 2002 | 12 |
7 | 4 | 2002 | 15 |
8 | 5 | 2001 | 9 |
9 | 6 | 2001 | 4 |
10 | 7 | 2002 | 3 |
A unique observation is determined by id and year, so in the example data, the duplicates are obs. 2 3 4 and obs. 6 7.
I want to delete all the duplicates without keeping one unique observation. So the data I want is:
obs. | id | year | var1 |
1 | 1 | 1999 | 5 |
5 | 3 | 2001 | 7 |
8 | 5 | 2001 | 9 |
9 | 6 | 2001 | 4 |
10 | 7 | 2002 | 3 |
proc sort by id year nodupkey option will keep obs. 2 and obs. 6 and nouniquekey option can output a dataset that contains obs.2 3 4 6 7. Is there an efficient way to get the final data I want? Thank you.
data have;
input obs id year var1;
cards;
1 1 1999 5
2 2 2000 10
3 2 2000 8
4 2 2000 6
5 3 2001 7
6 4 2002 12
7 4 2002 15
8 5 2001 9
9 6 2001 4
10 7 2002 3
;
proc sql;
create table want as
select *
from have
group by id, year
having count(*)=1;
quit;
data have;
input obs id year var1;
cards;
1 1 1999 5
2 2 2000 10
3 2 2000 8
4 2 2000 6
5 3 2001 7
6 4 2002 12
7 4 2002 15
8 5 2001 9
9 6 2001 4
10 7 2002 3
;
proc sql;
create table want as
select *
from have
group by id, year
having count(*)=1;
quit;
Another approach:
proc sort data=have out=want nouniquekey uniqueout=deleted; by id year; run;
the Nouniquekey is the operative part. This also demonstrates the Uniqueout option to write those deleted into a separate data set if needed.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.