BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
amamiche67
Calcite | Level 5

I have a dataset with duplicate names, yet same IDs. It looks like this. I received this dataset as is, so not sure why the names are case sensitive. 

IDFirstNameLastName
001JohnReed
001JOHNREED
002MitchelJames
003StephAnia
004KingMon
004KINGMON
005ValeryShort
012ALMAJACOBS
012AlmaJacobs
017JoshRyan
017JashRyan

I need to identify those with the exact same spelling for firstname and lastname variables to remove the duplicates. I want the final dataset to look like this table below. However, given that I have around 800 rows and the names are duplicates due to being case sensitive (except for ID 017, which is a duplicate due to a typo), I'm unsure of how to approach this. Does anyone have any suggestions and/or solutions?

IDFirstNameLastName
001JohnReed
002MitchelJames
003StephAnia
004KingMon

005

ValeryShort
012AlmaJacobs
017Josh Ryan
017JashRyan
1 ACCEPTED SOLUTION

Accepted Solutions
SASKiwi
PROC Star

Something like this?

data Have;
  input @1 ID $3. @5 FirstName $8. @12 LastName $8.;
  FirstName = propcase(FirstName);
  LastName = propcase(LastName);
datalines;
001 John    Reed
001 JOHN    REED
002 Mitchel James
003 Steph   Ania
004 King    Mon
004 KING    MON
005 Valery  Short
012 ALMA    JACOBS
012 Alma    Jacobs
017 Josh    Ryan
017 Jash    Ryan
;
run;

proc sort data = Have 
          out  = Want nodupkey;
  by ID LastName FirstName;
run;

View solution in original post

2 REPLIES 2
SASKiwi
PROC Star

Something like this?

data Have;
  input @1 ID $3. @5 FirstName $8. @12 LastName $8.;
  FirstName = propcase(FirstName);
  LastName = propcase(LastName);
datalines;
001 John    Reed
001 JOHN    REED
002 Mitchel James
003 Steph   Ania
004 King    Mon
004 KING    MON
005 Valery  Short
012 ALMA    JACOBS
012 Alma    Jacobs
017 Josh    Ryan
017 Jash    Ryan
;
run;

proc sort data = Have 
          out  = Want nodupkey;
  by ID LastName FirstName;
run;
SimonDawson
SAS Employee

SASKiwis idea will work if you don't need to retain the first case in the data. Here is what I cam up with. This uses a hash object so will only scale to the size of the available memory on your system.

 

data have;
  length ID 8 FirstName LastName $8;
  input ID FirstName LastName;
cards;
001 John Reed
001 JOHN REED
002 Mitchel James
003 Steph Ania
004 King Mon
004 KING MON
005 Valery Short
012 ALMA JACOBS
012 Alma Jacobs
017 Josh Ryan
017 Jash Ryan
;
run;

data want;
    if _n_ = 1 then do;
        declare hash h();
        rc = h.defineKey('ID','FirstName_upcase','LastName_upcase');
        rc = h.defineDone();
    end;

    set have;

    FirstName_upcase = upcase(FirstName);
    LastName_upcase = upcase(LastName);

    rc = h.find();
    if (rc ~= 0) then do;
        rc = h.add();
        output;
    end;

  keep ID FirstName LastName;

run;

This will keep the first observation with unique spelling and retain the case. If the varaible was "JoHn" in the first observation the propcase method would make this "John".

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 918 views
  • 2 likes
  • 3 in conversation