Hello,
I want to delete all the duplicated observations. Example data is:
obs. | id | year | var1 |
1 | 1 | 1999 | 5 |
2 | 2 | 2000 | 10 |
3 | 2 | 2000 | 8 |
4 | 2 | 2000 | 6 |
5 | 3 | 2001 | 7 |
6 | 4 | 2002 | 12 |
7 | 4 | 2002 | 15 |
8 | 5 | 2001 | 9 |
9 | 6 | 2001 | 4 |
10 | 7 | 2002 | 3 |
A unique observation is determined by id and year, so in the example data, the duplicates are obs. 2 3 4 and obs. 6 7.
I want to delete all the duplicates without keeping one unique observation. So the data I want is:
obs. | id | year | var1 |
1 | 1 | 1999 | 5 |
5 | 3 | 2001 | 7 |
8 | 5 | 2001 | 9 |
9 | 6 | 2001 | 4 |
10 | 7 | 2002 | 3 |
proc sort by id year nodupkey option will keep obs. 2 and obs. 6 and nouniquekey option can output a dataset that contains obs.2 3 4 6 7. Is there an efficient way to get the final data I want? Thank you.
data have;
input obs id year var1;
cards;
1 1 1999 5
2 2 2000 10
3 2 2000 8
4 2 2000 6
5 3 2001 7
6 4 2002 12
7 4 2002 15
8 5 2001 9
9 6 2001 4
10 7 2002 3
;
proc sql;
create table want as
select *
from have
group by id, year
having count(*)=1;
quit;
data have;
input obs id year var1;
cards;
1 1 1999 5
2 2 2000 10
3 2 2000 8
4 2 2000 6
5 3 2001 7
6 4 2002 12
7 4 2002 15
8 5 2001 9
9 6 2001 4
10 7 2002 3
;
proc sql;
create table want as
select *
from have
group by id, year
having count(*)=1;
quit;
Another approach:
proc sort data=have out=want nouniquekey uniqueout=deleted; by id year; run;
the Nouniquekey is the operative part. This also demonstrates the Uniqueout option to write those deleted into a separate data set if needed.
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: