Desktop productivity for business analysts and programmers

Drop duplicate and original

Accepted Solution Solved
Reply
Contributor
Posts: 44
Accepted Solution

Drop duplicate and original

Folks,

 

I've been searching online for a solution to my query, but have yet to find one. In my orginal dataset I had a number of duplicate individuals (however, the information contained within the variables were different). Thus, I originally took out the two columns of information on the individual and aggregated it into one row. 

 

However, now I would like to merge it back into my original dataset but firstly I would like to drop all duplicates along with the original. For ease here is a visual example.

 

id  x1 x2 x3

1   5   6   5

1  10  2   3

2   6   5   85

3 63   2   11

4 4   68    95

 

 

id x1 x2 x3

2 6    5   85

3 63  2    11

4 4    68   95

 

Any advice is welcome.

 


Accepted Solutions
Solution
‎01-04-2017 07:32 AM
Occasional Contributor
Posts: 7

Re: Drop duplicate and original

Absolutely right, missed that logic. Corrected example with enhanced test data:

 

data dupremove;
	input id x1 x2 x3;
	datalines;
2 6 5 85
1 5 6 5
3 4 7 9
9 9 8 8
3 8 8 9
3 8 9 9
4 9 23 234
1 10 2 3
23 5 42 3
3 63 2 11
4 4 68 95
;
run;

proc sort data=dupremove;
	by id;
run;

data dupremove;
	set dupremove;
	by id;

if first.id=0 or last.id=0 then
		delete;
run;

proc print data=dupremove;
run;

View solution in original post


All Replies
Occasional Contributor
Posts: 7

Re: Drop duplicate and original

data dupremove;
	input id x1 x2 x3;
	datalines;
2 6 5 85
1 5 6 5
3 4 7 9
9 9 8 8
4 9 23 234
1 10 2 3
23 5 42 3
3 63 2 11
4 4 68 95
;
run;

proc sort data=dupremove;
	by id;
run;

data dupremove;
	set dupremove;
	by id;

	if not first.id=last.id then
		delete;
run;

proc print data=dupremove;
run;

Here you are!

The trick after sorting  is the "if not first.id=last.id then delete;" line. This only outputs lines where only a unique id exists. If the first.id and the last.id variable are not 1 it means there is more than one line with the same id. Note you have to have the "by id" in the data step.

Super User
Posts: 5,362

Re: Drop duplicate and original

I'm afraid that solution won't work when you start with 3 or more observations for the same ID.  It's simpler and more reliable to code:

 

if first.id=0 or last.id=0 then delete;

 

If you feel more clever than that, you can try:

 

if not first.id * last.id;

Solution
‎01-04-2017 07:32 AM
Occasional Contributor
Posts: 7

Re: Drop duplicate and original

Absolutely right, missed that logic. Corrected example with enhanced test data:

 

data dupremove;
	input id x1 x2 x3;
	datalines;
2 6 5 85
1 5 6 5
3 4 7 9
9 9 8 8
3 8 8 9
3 8 9 9
4 9 23 234
1 10 2 3
23 5 42 3
3 63 2 11
4 4 68 95
;
run;

proc sort data=dupremove;
	by id;
run;

data dupremove;
	set dupremove;
	by id;

if first.id=0 or last.id=0 then
		delete;
run;

proc print data=dupremove;
run;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 234 views
  • 1 like
  • 3 in conversation