BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
GS2
Obsidian | Level 7 GS2
Obsidian | Level 7

Using SAS 9.4

 

What is the best method for splitting patient name into first name and last name if some of the patients are a "jr"  or "IV"?

 

The patient list does not have any commas to separate the data, it is written

"John Smith"

"John Smith Jr"

"Jane Smith"

"Jane Smith Sr"

"John Smith IV"

"John Smith III"

etc. 

 

My code

DATA want;
SET have;
CALL SCAN(PATIENT_NAME, -1, POSITION, LENGTH);
FIRST_NAME = SUBSTR(PATIENT_NAME, 1, POSITION-1);
LAST_NAME = SCAN(PATIENT_NAME, -1, ' ');
KEEP LAST_NAME FIRST_NAME DOB GENDER;
FORMAT FIRST_NAME LAST_NAME $100.;
RUN;

 

The problem is this code leaves the "Jr" "Sr" "IV" in the newly created last_name variable

 

Thanks for any help

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Hello @GS2  Are you asking for this by any chance?

 


data have;
input PATIENT_NAME $quote50.;
cards;
"John Smith"
"John Smith Jr"
"Jane Smith"
"Jane Smith Sr"
"John Smith IV"
"John Smith III"
;

data want;
set have;
firstname=substr(PATIENT_NAME,1,anyspace(PATIENT_NAME));
lastname=substr(PATIENT_NAME,anyspace(PATIENT_NAME));
RUN;

View solution in original post

5 REPLIES 5
PeterClemmensen
Tourmaline | Level 20

Do you want Jr and IV to be part of the last name?

GS2
Obsidian | Level 7 GS2
Obsidian | Level 7

Yes. I would like it to read 

first_name "John"

last_name "Smith Jr"

PaigeMiller
Diamond | Level 26

What would you do about a name like

 

Jerry Van Arsdale

 

or 

 

Billy Bob Thornton?

 

In one case, Van Arsdale (words 2 and 3) are the last name; in the other case, Billy Bob (words 1 and 2) is the first name.

 

Now, in your simple John Smith or Jane Smith cases, removing Jr Sr IV III is easy. The problem is much harder in general, and there's probably no solution that will work on every name in the list.

 

What is the best method for splitting patient name into first name and last name if some of the patients are a "jr"  or "IV"?

 

data want;
    set have;
    firstname=scan(patient_name,1,' ');
    secondname=scan(patient_name,2,' ');
    suffix=scan(patient_name,3,' ');
run;

 

I hesitate to call this the "best" way because there are many cases where it won't work.

 

 

 

 

--
Paige Miller
novinosrin
Tourmaline | Level 20

Hello @GS2  Are you asking for this by any chance?

 


data have;
input PATIENT_NAME $quote50.;
cards;
"John Smith"
"John Smith Jr"
"Jane Smith"
"Jane Smith Sr"
"John Smith IV"
"John Smith III"
;

data want;
set have;
firstname=substr(PATIENT_NAME,1,anyspace(PATIENT_NAME));
lastname=substr(PATIENT_NAME,anyspace(PATIENT_NAME));
RUN;

ballardw
Super User

I would suggest going to the data source and recommending that they bring the data collection standards up to something resembling at least 1990's standards with separate first and last name fields, better would include middle name or initial, prefix, for those folks that insist their name is Doctor, DR, Esq., Esquire, Senator, Representative or other professional indicator, and suffix for the Jr, II , III, IV, 'the second', and similar.

 

 

Having dealt with some of this issue and names that looked amazing like

Moon Unit Flower Child

and some of the non-English constructs like

Juan Garcia de Castille y Lumbago

you will likely never get a single rule that works.

 

I would suggest adding a COUNTW to get the number of space delimited items and then filtering on those that are 3 or more words for further manual examination to verify that your code performed as needed.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 3758 views
  • 0 likes
  • 5 in conversation