Solved: Re: Remove middle initial from LastName, FirstName M

BlayLay · Posted 01-11-2021 01:54 PM

Hi All,

I'm trying to remove middle initials at the end of a name IF it has a middle initial (some names don't have middle initial in the dataset).

What I have:

Name

Doe, John M.

Doe, Jane

What I'd like:

Name

Doe, John

Doe, Jane

I've seen some different approaches, and it seems as though a substr(scan()) may be the best approach, I'm just not sure how to apply it.

I tried the below code, but it removed everything after the first space.

Name = substr(Name,1,length(Name) - length(scan(Name,-1,' ')));

novinosrin · Posted 01-11-2021 02:06 PM


data have;
input Name $20.;
cards;
Doe, John M.
Doe, Jane
;

data want;
 set have;
 length want $20;
 if countw(name,' ')=3 then want=substr(name,1,anyspace(strip(name),-99));
 else want=name;
run;

View solution in original post

novinosrin · Posted 01-11-2021 02:06 PM


data have;
input Name $20.;
cards;
Doe, John M.
Doe, Jane
;

data want;
 set have;
 length want $20;
 if countw(name,' ')=3 then want=substr(name,1,anyspace(strip(name),-99));
 else want=name;
run;

novinosrin · Posted 01-11-2021 02:40 PM

Or little cheeky-------------


data have;
input Name $20.;
cards;
Doe, John M.
Doe, Jane
;

data want;
 set have;
 length want $20;
 want=catx(' ',scan(name,1,' '),scan(name,2,' '));
run;

BlayLay · Posted 01-11-2021 02:43 PM

@novinosrin

Appreciate the options, which one would you recommend from a performance/stability standpoint? They both seem to work, so wasn't sure if you had a preference based on your experience.

novinosrin · Posted 01-11-2021 03:05 PM

Hi @BlayLay Both approaches follow the logic-

1. Check how many words are there in the name value.

2. If there are 3 words(assuming the 3rd one is he middle initial to be ignored), just extract the 1st 2. Otherwise just take the name value as is.

3. The 1st one uses SUBSTR to extract from position 1 up until the last delimiter ' ' that separates the words

4. The 2nd approach basically extracts 1st word and 2nd word and concatenates into 1. A lazy approach so to speak.

It's up to you and your convenience.

ballardw · Posted 01-11-2021 03:30 PM

@BlayLay wrote:
@novinosrin

Appreciate the options, which one would you recommend from a performance/stability standpoint? They both seem to work, so wasn't sure if you had a preference based on your experience.

When it comes to names stored in a single variable I don't trust much of anything unless someone that is involved with creating the data promises, and will stand by their statement later, that "all the names are last, first with optional middle initial".

I have received data with names that in a single file were:

First name, last name, middle initial

First name, last name, other parent's last name, middle name

Last name, first name, other parent's last name, middle name

Last name, other parent's last name, first name

Last name -(hyphen character) other parent's last name, first name

Sprinkle in random Junior, or "the Second" after any of the first or last names

And one name like: Moon Beam Sun Child

If you have occasional other elements like titles, "Dr." "Esq." you may have more fun with getting the right pieces together depending on order and consistency.

Classroom Training Available!