Solved: Re: remove duplicates based on subset of observations

Alireza_Boloori · Posted 08-19-2017 04:02 PM

Hello everyone,

I have data like this:

ID X1 X2
1 1 1
2 1 1
3 1 2
4 1 2
5 2 3
6 2 3
7 2 3
8 2 4

I want to remove duplicate values of X2 within each value of X1, so that the data looks like this:

ID X1 X2
1 1 1
3 1 2
5 2 3
8 2 4

How can this be done in SAS? Any ideas or help would be much appreciated!

novinosrin · Posted 08-19-2017 04:19 PM

Have you tried:

proc sql;
select distinct ID, X1, X2
from your_table;
quit;
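
A note on the query above: because ID is different on every row, DISTINCT over ID, X1, and X2 returns all eight rows of the sample data. Below is a minimal sketch of a PROC SQL alternative, assuming it is acceptable to keep the lowest ID in each X1/X2 group. The dataset names have and want_sql are only illustrative.

```sas
/* Sample data from the question */
data have;
  input ID X1 X2;
  datalines;
1 1 1
2 1 1
3 1 2
4 1 2
5 2 3
6 2 3
7 2 3
8 2 4
;
run;

/* Keep one row per X1/X2 combination; min(ID) picks the lowest ID
   in each group (an assumption about which row should survive) */
proc sql;
  create table want_sql as
  select min(ID) as ID, X1, X2
  from have
  group by X1, X2
  order by X1, X2;
quit;
```

On the sample data this leaves IDs 1, 3, 5, and 8, which matches the desired output in the question.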

novinosrin · Posted 08-19-2017 04:35 PM

data have;
input ID X1 X2;
datalines;
1 1 1
2 1 1
3 1 2
4 1 2
5 2 3
6 2 3
7 2 3
8 2 4
;

proc sort data=have out=want nodupkey;
by x1 x2;
run;

Alireza_Boloori · Posted 08-19-2017 06:03 PM

@novinosrin Thanks! However, it does not remove the duplicates. I had to add this to it:

data want;
set want;
by X1 X2;
if first.X2;
run;
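
For reference, BY-group processing with first.X2 can also do the de-duplication on its own, without NODUPKEY. Below is a minimal sketch, assuming the sample data from the question and that the row with the lowest ID in each group should be kept. The names sorted and want_first are illustrative.

```sas
/* Sample data from the question */
data have;
  input ID X1 X2;
  datalines;
1 1 1
2 1 1
3 1 2
4 1 2
5 2 3
6 2 3
7 2 3
8 2 4
;
run;

/* Sort so that rows with the same X1/X2 are adjacent, lowest ID first */
proc sort data=have out=sorted;
  by X1 X2 ID;
run;

/* first.X2 is 1 on the first row of each X1/X2 group, so the
   subsetting IF keeps exactly one row per group */
data want_first;
  set sorted;
  by X1 X2;
  if first.X2;
run;
```

With the sample data this keeps IDs 1, 3, 5, and 8. Adding ID to the sort key makes the choice of surviving row explicit rather than dependent on input order.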

novinosrin · Posted 08-19-2017 06:06 PM

I honestly think you didn't test my code. No worries, I ran another test for you:

18 data have;
19 input ID X1 X2;
20 datalines;

NOTE: The data set WORK.HAVE has 8 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds

29 ;
30 proc sort data=have out=want nodupkey;
31 by x1 x2;
32 run;

NOTE: There were 8 observations read from the data set WORK.HAVE.
NOTE: 4 observations with duplicate key values were deleted.
NOTE: The data set WORK.WANT has 4 observations and 3 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds

Alireza_Boloori · Posted 08-19-2017 06:15 PM

My original data was not EXACTLY the same as what I posted initially, so that might be the reason. Otherwise, I did test your code. Thanks for your time!

Reeza · Posted 08-19-2017 07:56 PM

I think he mixed up the two solutions somehow. @Alireza_Boloori please mark the appropriate answer as correct.
