Hello,
I want to delete all the duplicated observations. Example data is:
| obs. | id | year | var1 |
| 1 | 1 | 1999 | 5 |
| 2 | 2 | 2000 | 10 |
| 3 | 2 | 2000 | 8 |
| 4 | 2 | 2000 | 6 |
| 5 | 3 | 2001 | 7 |
| 6 | 4 | 2002 | 12 |
| 7 | 4 | 2002 | 15 |
| 8 | 5 | 2001 | 9 |
| 9 | 6 | 2001 | 4 |
| 10 | 7 | 2002 | 3 |
A unique observation is determined by id and year, so in the example data, the duplicates are obs. 2 3 4 and obs. 6 7.
I want to delete all the duplicates without keeping one unique observation. So the data I want is:
| obs. | id | year | var1 |
| 1 | 1 | 1999 | 5 |
| 5 | 3 | 2001 | 7 |
| 8 | 5 | 2001 | 9 |
| 9 | 6 | 2001 | 4 |
| 10 | 7 | 2002 | 3 |
proc sort by id year nodupkey option will keep obs. 2 and obs. 6 and nouniquekey option can output a dataset that contains obs.2 3 4 6 7. Is there an efficient way to get the final data I want? Thank you.
data have;
input obs id year var1;
cards;
1 1 1999 5
2 2 2000 10
3 2 2000 8
4 2 2000 6
5 3 2001 7
6 4 2002 12
7 4 2002 15
8 5 2001 9
9 6 2001 4
10 7 2002 3
;
proc sql;
create table want as
select *
from have
group by id, year
having count(*)=1;
quit;
data have;
input obs id year var1;
cards;
1 1 1999 5
2 2 2000 10
3 2 2000 8
4 2 2000 6
5 3 2001 7
6 4 2002 12
7 4 2002 15
8 5 2001 9
9 6 2001 4
10 7 2002 3
;
proc sql;
create table want as
select *
from have
group by id, year
having count(*)=1;
quit;
Another approach:
proc sort data=have
out=want
nouniquekey uniqueout=deleted;
by id year;
run;
the Nouniquekey is the operative part. This also demonstrates the Uniqueout option to write those deleted into a separate data set if needed.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.