topic Re: remove duplicates based on subset of observations in SAS Studio

remove duplicates based on subset of observations

Alireza_Boloori — Sat, 19 Aug 2017 20:03:01 GMT

Hello everyone,

I have a data set like this:

ID X1 X2
1 1 1
2 1 1
3 1 2
4 1 2
5 2 3
6 2 3
7 2 3
8 2 4

I want to remove observations with duplicate X2 values within each value of X1, so that the data looks like this:

ID X1 X2
1 1 1
3 1 2
5 2 3
8 2 4

I was wondering how this can be done in SAS. Any ideas or help would be much appreciated!

Re: remove duplicates based on subset of observations

novinosrin — Sat, 19 Aug 2017 20:19:01 GMT

Have you tried:

proc sql;
select distinct ID, X1, X2
from your_table;
quit;
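
A note on the query above: ID is different on every row of the sample data, so SELECT DISTINCT over ID, X1 and X2 would likely return all eight rows. What follows is a minimal sketch of one SQL alternative, not the poster's method. It assumes that keeping the smallest ID for each X1/X2 pair is acceptable. The names have and want_sql are hypothetical.

```sas
/* Sample data from the question */
data have;
input ID X1 X2;
datalines;
1 1 1
2 1 1
3 1 2
4 1 2
5 2 3
6 2 3
7 2 3
8 2 4
;

proc sql;
  /* One row per (X1, X2) pair, keeping the smallest ID in each pair */
  create table want_sql as
  select min(ID) as ID, X1, X2
  from have
  group by X1, X2
  order by X1, X2;
quit;
```

For the sample data this should leave the rows with IDs 1, 3, 5 and 8.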

Re: remove duplicates based on subset of observations

novinosrin — Sat, 19 Aug 2017 20:35:20 GMT

data have;
input ID X1 X2;
datalines;
1 1 1
2 1 1
3 1 2
4 1 2
5 2 3
6 2 3
7 2 3
8 2 4
;

proc sort data=have out=want nodupkey;
by x1 x2;
run;
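
To see which rows NODUPKEY drops, a small variation may help. This is a sketch that assumes the have data set created above. The data set name dropped is hypothetical. The DUPOUT= option writes the deleted observations to a separate data set.

```sas
/* Keep the first observation for each X1/X2 combination;
   the removed duplicates go to WORK.DROPPED for inspection */
proc sort data=have out=want nodupkey dupout=dropped;
  by x1 x2;
run;

/* List the observations that were removed */
proc print data=dropped;
run;
```

With the sample data and default sort options, dropped should contain the rows with IDs 2, 4, 6 and 7.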

Re: remove duplicates based on subset of observations

Alireza_Boloori — Sat, 19 Aug 2017 22:03:43 GMT

@novinosrin Thanks! However, it does not remove the duplicates. I had to add this to it:

data want;
set want;
by X1 X2;
if first.X2;
run;
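
A BY statement in a DATA step expects its input to be sorted (or indexed) by the BY variables, so the FIRST.X2 approach needs a sort beforehand. This is a minimal sketch that assumes the have data set from the earlier post. The names sorted and want_first are hypothetical.

```sas
/* Sort by the key variables, with ID last so the lowest ID comes first */
proc sort data=have out=sorted;
  by X1 X2 ID;
run;

/* FIRST.X2 is 1 on the first row of each X1/X2 group */
data want_first;
  set sorted;
  by X1 X2;
  if first.X2;
run;
```

Writing to a new data set instead of overwriting want keeps the intermediate result available for comparison.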

Re: remove duplicates based on subset of observations

novinosrin — Sat, 19 Aug 2017 22:06:24 GMT

I honestly think you didn't test my code. No worries, I ran another test for you:

18 data have;
19 input ID X1 X2;
20 datalines;

NOTE: The data set WORK.HAVE has 8 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds

29 ;
30 proc sort data=have out=want nodupkey;
31 by x1 x2;
32 run;

NOTE: There were 8 observations read from the data set WORK.HAVE.
NOTE: 4 observations with duplicate key values were deleted.
NOTE: The data set WORK.WANT has 4 observations and 3 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds

Re: remove duplicates based on subset of observations

Alireza_Boloori — Sat, 19 Aug 2017 22:15:27 GMT

Well! My original data was not EXACTLY the same as what I posted initially, so that might be the reason. Otherwise, I did test your code. Thanks for your time!

Re: remove duplicates based on subset of observations

Reeza — Sat, 19 Aug 2017 23:56:10 GMT

I think he mixed up the two solutions somehow. @Alireza_Boloori please mark the appropriate answer as correct.