BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
amamiche67
Calcite | Level 5

I have a dataset with duplicate names, yet same IDs. It looks like this. I received this dataset as is, so not sure why the names are case sensitive. 

IDFirstNameLastName
001JohnReed
001JOHNREED
002MitchelJames
003StephAnia
004KingMon
004KINGMON
005ValeryShort
012ALMAJACOBS
012AlmaJacobs
017JoshRyan
017JashRyan

I need to identify those with the exact same spelling for firstname and lastname variables to remove the duplicates. I want the final dataset to look like this table below. However, given that I have around 800 rows and the names are duplicates due to being case sensitive (except for ID 017, which is a duplicate due to a typo), I'm unsure of how to approach this. Does anyone have any suggestions and/or solutions?

IDFirstNameLastName
001JohnReed
002MitchelJames
003StephAnia
004KingMon

005

ValeryShort
012AlmaJacobs
017Josh Ryan
017JashRyan
1 ACCEPTED SOLUTION

Accepted Solutions
SASKiwi
PROC Star

Something like this?

data Have;
  input @1 ID $3. @5 FirstName $8. @12 LastName $8.;
  FirstName = propcase(FirstName);
  LastName = propcase(LastName);
datalines;
001 John    Reed
001 JOHN    REED
002 Mitchel James
003 Steph   Ania
004 King    Mon
004 KING    MON
005 Valery  Short
012 ALMA    JACOBS
012 Alma    Jacobs
017 Josh    Ryan
017 Jash    Ryan
;
run;

proc sort data = Have 
          out  = Want nodupkey;
  by ID LastName FirstName;
run;

View solution in original post

2 REPLIES 2
SASKiwi
PROC Star

Something like this?

data Have;
  input @1 ID $3. @5 FirstName $8. @12 LastName $8.;
  FirstName = propcase(FirstName);
  LastName = propcase(LastName);
datalines;
001 John    Reed
001 JOHN    REED
002 Mitchel James
003 Steph   Ania
004 King    Mon
004 KING    MON
005 Valery  Short
012 ALMA    JACOBS
012 Alma    Jacobs
017 Josh    Ryan
017 Jash    Ryan
;
run;

proc sort data = Have 
          out  = Want nodupkey;
  by ID LastName FirstName;
run;
SimonDawson
SAS Employee

SASKiwis idea will work if you don't need to retain the first case in the data. Here is what I cam up with. This uses a hash object so will only scale to the size of the available memory on your system.

 

data have;
  length ID 8 FirstName LastName $8;
  input ID FirstName LastName;
cards;
001 John Reed
001 JOHN REED
002 Mitchel James
003 Steph Ania
004 King Mon
004 KING MON
005 Valery Short
012 ALMA JACOBS
012 Alma Jacobs
017 Josh Ryan
017 Jash Ryan
;
run;

data want;
    if _n_ = 1 then do;
        declare hash h();
        rc = h.defineKey('ID','FirstName_upcase','LastName_upcase');
        rc = h.defineDone();
    end;

    set have;

    FirstName_upcase = upcase(FirstName);
    LastName_upcase = upcase(LastName);

    rc = h.find();
    if (rc ~= 0) then do;
        rc = h.add();
        output;
    end;

  keep ID FirstName LastName;

run;

This will keep the first observation with unique spelling and retain the case. If the varaible was "JoHn" in the first observation the propcase method would make this "John".

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 666 views
  • 2 likes
  • 3 in conversation