SAS Programming

Justin9 · Posted 07-04-2022 05:58 PM

I have a variable called 'FULL_NAME'. which contains a person's full name. However, I have been asked to create the following variables:

-Title (e.g. Mr, Ms etc.)

-Forename

-Middle_Names

-Full_Middle_Initial (all of the middle name initials in full)

-Reduced_Middle_Initial (just the first initial of any middle names e.g. if Peter, Owen and Dennis are the three middle names, this variable would just contain the initial of of the first middle name: P)
-Surname

For example. if there is a person called "Mr James Peter Owen Dennis Coyle-Harper", what code is best to

get the following results for the variables I have listed above? The desired outcome in my example would work for my all of my 100,000 accounts:

-Title: Mr

-Forename: James

-Middle_Names: Peter Owen Dennis

-Full_Middle_Initial: P O D

-Reduced_Middle_Initial: P
-Surname: Coyle-Harper

I have currently got code that uses the scan function, but I am not sure what the best method would be to identify the middle names as there could be multiple, and the surname would just be the last word from the 'FULL_NAME' variable.

data test;
        set Name_Information;
        Title=scan(FULL_NAME,1,' ');
        Forename=scan(FULL_NAME,2,' ');
        Middle_Names=???
        Full_Middle_Initial=???
        Reduced_Middle_Initial=???
        Surname=???
run;

Reeza · Posted 07-04-2022 06:21 PM

Hint: You probably want to use COUNTW() to determine the number of words and differing number of words will have different scenario's most likely.

Justin9 · Posted 07-04-2022 06:39 PM

If someone has the code that I could use in order to get my desired outcome/provide the appropriate code where I have put question marks in the creation of the variables, that would be greatly appreciated!

Reeza · Posted 07-04-2022 06:40 PM

It isn't a simple as that, you need to consider several different variations of possibilities and account for them all. What happens if the name is Mr. John Smith? Or Mrs. Melissa A Adams? There are several different variations you need to account for. That being said, this is a common homework/job application question which is asked on here often. If you search you'll find examples.

SASKiwi · Posted 07-04-2022 07:12 PM

Parsing full names into name components is hard. How many names do you have and are they reasonably 'clean' or not? What result quality is expected? Please bear in mind that surnames can be multiple words and names can contain titles and suffixes. Logic to deal with this will quickly get very complicated.

SAS Data Quality contains intelligent name parsing routines to deal with this but I doubt you have that available to you.

If a 90% job is good enough and you haven't got millions of names then you might get lucky and do the outliers manually.

Justin9 · Posted 07-04-2022 07:23 PM

Hi, I've got about 100,000 accounts. From a quick glance, most only have a title, forename and surname, but some have one or a few middle names.

SASKiwi · Posted 07-04-2022 08:19 PM

I suspect you are in luck then as the cleaner and more consistent the data is the easer it is to process. I suggest you start from the left and see if you can identify all titles. Something like this should work:

data want;
  set have;
  if scan(upcase(full_name),1) in ('MR','MS','MISS') then Title = scan(full_name, 1);
run;

Then check what you have left out and what is selected but shouldn't be.

SAS Programming

Identify title, first name, middle name(s), middle name(s) initial(s) and surname

Re: Identify title, first name, middle name(s), middle name(s) initial(s) and surname

Re: Identify title, first name, middle name(s), middle name(s) initial(s) and surname

Re: Identify title, first name, middle name(s), middle name(s) initial(s) and surname

Re: Identify title, first name, middle name(s), middle name(s) initial(s) and surname

Re: Identify title, first name, middle name(s), middle name(s) initial(s) and surname

Re: Identify title, first name, middle name(s), middle name(s) initial(s) and surname

Follow Us

What is...

SAS Programming

Identify title, first name, middle name(s), middle name(s) initial(s) and surname

Re: Identify title, first name, middle name(s), middle name(s) initial(s) and surname

Re: Identify title, first name, middle name(s), middle name(s) initial(s) and surname

Re: Identify title, first name, middle name(s), middle name(s) initial(s) and surname

Re: Identify title, first name, middle name(s), middle name(s) initial(s) and surname

Re: Identify title, first name, middle name(s), middle name(s) initial(s) and surname

Re: Identify title, first name, middle name(s), middle name(s) initial(s) and surname

Our biggest data and AI event of the year.

SAS Training: Just a Click Away

Follow Us

What is...