BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
BlayLay
Obsidian | Level 7

Hi All,

 

I'm trying to remove middle initials at the end of a name IF it has a middle initial (some names don't have middle initial in the dataset).

 

What I have:

Name
Doe, John M.
Doe, Jane

 

What I'd like:

Name
Doe, John
Doe, Jane

 

I've seen some different approaches, and it seems as though a substr(scan()) may be the best approach, I'm just not sure how to apply it.

 

I tried the below code, but it removed everything after the first space.

Name = substr(Name,1,length(Name) - length(scan(Name,-1,' ')));

 

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

data have;
input Name $20.;
cards;
Doe, John M.
Doe, Jane
;

data want;
 set have;
 length want $20;
 if countw(name,' ')=3 then want=substr(name,1,anyspace(strip(name),-99));
 else want=name;
run;

View solution in original post

5 REPLIES 5
novinosrin
Tourmaline | Level 20

data have;
input Name $20.;
cards;
Doe, John M.
Doe, Jane
;

data want;
 set have;
 length want $20;
 if countw(name,' ')=3 then want=substr(name,1,anyspace(strip(name),-99));
 else want=name;
run;

novinosrin
Tourmaline | Level 20

Or little cheeky-------------


data have;
input Name $20.;
cards;
Doe, John M.
Doe, Jane
;

data want;
 set have;
 length want $20;
 want=catx(' ',scan(name,1,' '),scan(name,2,' '));
run;
BlayLay
Obsidian | Level 7
@novinosrin

Appreciate the options, which one would you recommend from a performance/stability standpoint? They both seem to work, so wasn't sure if you had a preference based on your experience.
novinosrin
Tourmaline | Level 20

Hi @BlayLay  Both approaches follow the logic-

1. Check how many words are there in the name value.

2. If there are 3 words(assuming the 3rd one is he middle initial to be ignored), just extract the 1st 2. Otherwise just take the name value as is.

3. The 1st one uses SUBSTR to extract from position 1 up until the last delimiter ' ' that separates the words

4. The 2nd approach basically extracts 1st word and 2nd word and concatenates into 1. A lazy approach so to speak. 

 

It's up to you and your convenience. 

ballardw
Super User

@BlayLay wrote:
@novinosrin

Appreciate the options, which one would you recommend from a performance/stability standpoint? They both seem to work, so wasn't sure if you had a preference based on your experience.

When it comes to names stored in a single variable I don't trust much of anything unless someone that is involved with creating the data promises, and will stand by their statement later, that "all the names are last, first with optional middle initial".

 

I have received data with names that in a single file were:

First name, last name, middle initial

First name, last name, other parent's last name, middle name

Last name, first name, other parent's last name, middle name

Last name, other parent's last name, first name

Last name -(hyphen character) other parent's last name, first name

Sprinkle in random Junior, or "the Second" after any of the first or last names

 

And one name like: Moon Beam Sun Child

 

If you have occasional other elements like titles, "Dr." "Esq." you may have more fun with getting the right pieces together depending on order and consistency.

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 3789 views
  • 4 likes
  • 3 in conversation