Solved: Extracting the ID values to remove after identifying duplicate rows

Emma_at_SAS · Posted 12-09-2021 10:12 AM

Based on duplicate Names and Addresses, I had identified duplicate observations (some individuals are repeated more than twice) and had created a subset of my dataset.

Each row of my dataset has unique ID values. I want to get the list of the ID values associated with 2nd, 3rd, 4th, ... repetitions of the DUP_OBS to use it in my original dataset and say remove the rows where ID in (the list of ID values)

DATA WANT;

SET HAVE;

WHERE ID NOT IN ("313086590", "110616230", "570208047", ..., "359502732");

RUN; *my ID variable is defined as a CHAR variable;

Eventually, I want to remove the bold rows in the example table below.

How may I identify these ID values in my data?

Thank you for your help.

ID	DUP_OBS
165577390	1
313086590	1
181429833	2
165924338	2
110616230	2
520386083	3
169961292	3
570208047	3
521476491	3
393097972	4
359502732	4

maguiremq · Posted 12-09-2021 10:18 AM

This gets you what you want, but I'm not sure how you're sorting before this step:

proc sort
	data = have 
	out = have_nodup nodupkey
	dupout = have_dup;
		by dup_obs;
run;

id dup_obs 
165577390 1 
181429833 2 
520386083 3 
393097972 4

View solution in original post

maguiremq · Posted 12-09-2021 10:18 AM

This gets you what you want, but I'm not sure how you're sorting before this step:

proc sort
	data = have 
	out = have_nodup nodupkey
	dupout = have_dup;
		by dup_obs;
run;

id dup_obs 
165577390 1 
181429833 2 
520386083 3 
393097972 4

Emma_at_SAS · Posted 12-09-2021 10:58 AM

Thank you @maguiremq .

I used a similar method to get to the subset of data with duplicate names and addresses. I need to repeat the step as you suggested. The have_dup dataset would give me the ID values I need to remove from my original data. Thanks

Extracting the ID values to remove after identifying duplicate rows

Re: Extracting the ID values to remove after identifying duplicate rows

Re: Extracting the ID values to remove after identifying duplicate rows

Re: Extracting the ID values to remove after identifying duplicate rows

Registration is open

Extracting the ID values to remove after identifying duplicate rows

Re: Extracting the ID values to remove after identifying duplicate rows

Re: Extracting the ID values to remove after identifying duplicate rows

Re: Extracting the ID values to remove after identifying duplicate rows

Registration is open

SAS Training: Just a Click Away