BookmarkSubscribeRSS Feed
Justin9
Obsidian | Level 7

I have a variable called 'FULL_NAME'. which contains a person's full name. However, I have been asked to create the following variables:

-Title (e.g. Mr, Ms etc.)

-Forename

-Middle_Names

-Full_Middle_Initial (all of the middle name initials in full)

-Reduced_Middle_Initial (just the first initial of any middle names e.g. if Peter, Owen and Dennis are the three middle names, this variable would just contain the initial of of the first middle name: P)
-Surname

 

For example. if there is a person called "Mr James Peter Owen Dennis Coyle-Harper", what code is best to 

get the following results for the variables I have listed above? The desired outcome in my example would work for my all of my 100,000 accounts:

-Title: Mr

-Forename: James

-Middle_Names: Peter Owen Dennis

-Full_Middle_Initial: P O D

-Reduced_Middle_Initial: P
-Surname: Coyle-Harper

 

I have currently got code that uses the scan function, but I am not sure what the best method would be to identify the middle names as there could be multiple, and the surname would just be the last word from the 'FULL_NAME' variable.

data test;
        set Name_Information;
        Title=scan(FULL_NAME,1,' ');
        Forename=scan(FULL_NAME,2,' ');
        Middle_Names=???
        Full_Middle_Initial=???
        Reduced_Middle_Initial=???
        Surname=???
run;

 

 

6 REPLIES 6
Reeza
Super User
Hint: You probably want to use COUNTW() to determine the number of words and differing number of words will have different scenario's most likely.
Justin9
Obsidian | Level 7

If someone has the code that I could use in order to get my desired outcome/provide the appropriate code where I have put question marks in the creation of the variables, that would be greatly appreciated!

Reeza
Super User

It isn't a simple as that, you need to consider several different variations of possibilities and account for them all. What happens if the name is Mr. John Smith? Or Mrs. Melissa A Adams? There are several different variations you need to account for. That being said, this is a common homework/job application question which is asked on here often. If you search you'll find examples. 

SASKiwi
PROC Star

Parsing full names into name components is hard. How many names do you have and are they reasonably 'clean' or not? What result quality is expected? Please bear in mind that surnames can be multiple words and names can contain titles and suffixes. Logic to deal with this will quickly get very complicated.

 

SAS Data Quality contains intelligent name parsing routines to deal with this but I doubt you have that available to you.

 

If a 90% job is good enough and you haven't got millions of names then you might get lucky and do the outliers manually. 

Justin9
Obsidian | Level 7
Hi, I've got about 100,000 accounts. From a quick glance, most only have a title, forename and surname, but some have one or a few middle names.
SASKiwi
PROC Star

I suspect you are in luck then as the cleaner and more consistent the data is the easer it is to process. I suggest you start from the left and see if you can identify all titles. Something like this should work:

data want;
  set have;
  if scan(upcase(full_name),1) in ('MR','MS','MISS') then Title = scan(full_name, 1);
run;

Then check what you have left out and what is selected but shouldn't be. 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 3524 views
  • 1 like
  • 3 in conversation