Based on duplicate Names and Addresses, I had identified duplicate observations (some individuals are repeated more than twice) and had created a subset of my dataset.
Each row of my dataset has unique ID values. I want to get the list of the ID values associated with 2nd, 3rd, 4th, ... repetitions of the DUP_OBS to use it in my original dataset and say remove the rows where ID in (the list of ID values)
DATA WANT;
SET HAVE;
WHERE ID NOT IN ("313086590", "110616230", "570208047", ..., "359502732");
RUN; *my ID variable is defined as a CHAR variable;
Eventually, I want to remove the bold rows in the example table below.
How may I identify these ID values in my data?
Thank you for your help.
ID
DUP_OBS
165577390
1
313086590
1
181429833
2
165924338
2
110616230
2
520386083
3
169961292
3
570208047
3
521476491
3
393097972
4
359502732
4
... View more