Solved
Contributor
Posts: 58

# Drop duplicate and original

Folks,

I've been searching online for a solution to my query, but have yet to find one. In my orginal dataset I had a number of duplicate individuals (however, the information contained within the variables were different). Thus, I originally took out the two columns of information on the individual and aggregated it into one row.

However, now I would like to merge it back into my original dataset but firstly I would like to drop all duplicates along with the original. For ease here is a visual example.

id  x1 x2 x3

1   5   6   5

1  10  2   3

2   6   5   85

3 63   2   11

4 4   68    95

id x1 x2 x3

2 6    5   85

3 63  2    11

4 4    68   95

Accepted Solutions
Solution
‎01-04-2017 07:32 AM
Occasional Contributor
Posts: 8

## Re: Drop duplicate and original

Absolutely right, missed that logic. Corrected example with enhanced test data:

``````data dupremove;
input id x1 x2 x3;
datalines;
2 6 5 85
1 5 6 5
3 4 7 9
9 9 8 8
3 8 8 9
3 8 9 9
4 9 23 234
1 10 2 3
23 5 42 3
3 63 2 11
4 4 68 95
;
run;

proc sort data=dupremove;
by id;
run;

data dupremove;
set dupremove;
by id;

if first.id=0 or last.id=0 then
delete;
run;

proc print data=dupremove;
run;``````

All Replies
Occasional Contributor
Posts: 8

## Re: Drop duplicate and original

``````data dupremove;
input id x1 x2 x3;
datalines;
2 6 5 85
1 5 6 5
3 4 7 9
9 9 8 8
4 9 23 234
1 10 2 3
23 5 42 3
3 63 2 11
4 4 68 95
;
run;

proc sort data=dupremove;
by id;
run;

data dupremove;
set dupremove;
by id;

if not first.id=last.id then
delete;
run;

proc print data=dupremove;
run;``````

Here you are!

The trick after sorting  is the "if not first.id=last.id then delete;" line. This only outputs lines where only a unique id exists. If the first.id and the last.id variable are not 1 it means there is more than one line with the same id. Note you have to have the "by id" in the data step.

Super User
Posts: 6,899

## Re: Drop duplicate and original

I'm afraid that solution won't work when you start with 3 or more observations for the same ID.  It's simpler and more reliable to code:

if first.id=0 or last.id=0 then delete;

If you feel more clever than that, you can try:

if not first.id * last.id;

Solution
‎01-04-2017 07:32 AM
Occasional Contributor
Posts: 8

## Re: Drop duplicate and original

Absolutely right, missed that logic. Corrected example with enhanced test data:

``````data dupremove;
input id x1 x2 x3;
datalines;
2 6 5 85
1 5 6 5
3 4 7 9
9 9 8 8
3 8 8 9
3 8 9 9
4 9 23 234
1 10 2 3
23 5 42 3
3 63 2 11
4 4 68 95
;
run;

proc sort data=dupremove;
by id;
run;

data dupremove;
set dupremove;
by id;

if first.id=0 or last.id=0 then
delete;
run;

proc print data=dupremove;
run;``````
☑ This topic is solved.