Based on duplicate Names and Addresses, I had identified duplicate observations (some individuals are repeated more than twice) and had created a subset of my dataset.
Each row of my dataset has unique ID values. I want to get the list of the ID values associated with 2nd, 3rd, 4th, ... repetitions of the DUP_OBS to use it in my original dataset and say remove the rows where ID in (the list of ID values)
DATA WANT;
SET HAVE;
WHERE ID NOT IN ("313086590", "110616230", "570208047", ..., "359502732");
RUN; *my ID variable is defined as a CHAR variable;
Eventually, I want to remove the bold rows in the example table below.
How may I identify these ID values in my data?
Thank you for your help.
ID | DUP_OBS |
165577390 | 1 |
313086590 | 1 |
181429833 | 2 |
165924338 | 2 |
110616230 | 2 |
520386083 | 3 |
169961292 | 3 |
570208047 | 3 |
521476491 | 3 |
393097972 | 4 |
359502732 | 4 |
This gets you what you want, but I'm not sure how you're sorting before this step:
proc sort
data = have
out = have_nodup nodupkey
dupout = have_dup;
by dup_obs;
run;
id dup_obs 165577390 1 181429833 2 520386083 3 393097972 4
This gets you what you want, but I'm not sure how you're sorting before this step:
proc sort
data = have
out = have_nodup nodupkey
dupout = have_dup;
by dup_obs;
run;
id dup_obs 165577390 1 181429833 2 520386083 3 393097972 4
Thank you @maguiremq .
I used a similar method to get to the subset of data with duplicate names and addresses. I need to repeat the step as you suggested. The have_dup dataset would give me the ID values I need to remove from my original data. Thanks
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.