Using SAS 9.4
What is the best method for splitting patient name into first name and last name if some of the patients are a "jr" or "IV"?
The patient list does not have any commas to separate the data, it is written
"John Smith"
"John Smith Jr"
"Jane Smith"
"Jane Smith Sr"
"John Smith IV"
"John Smith III"
etc.
My code
DATA want;
SET have;
CALL SCAN(PATIENT_NAME, -1, POSITION, LENGTH);
FIRST_NAME = SUBSTR(PATIENT_NAME, 1, POSITION-1);
LAST_NAME = SCAN(PATIENT_NAME, -1, ' ');
KEEP LAST_NAME FIRST_NAME DOB GENDER;
FORMAT FIRST_NAME LAST_NAME $100.;
RUN;
The problem is this code leaves the "Jr" "Sr" "IV" in the newly created last_name variable
Thanks for any help
Hello @GS2 Are you asking for this by any chance?
data have;
input PATIENT_NAME $quote50.;
cards;
"John Smith"
"John Smith Jr"
"Jane Smith"
"Jane Smith Sr"
"John Smith IV"
"John Smith III"
;
data want;
set have;
firstname=substr(PATIENT_NAME,1,anyspace(PATIENT_NAME));
lastname=substr(PATIENT_NAME,anyspace(PATIENT_NAME));
RUN;
Do you want Jr and IV to be part of the last name?
Yes. I would like it to read
first_name "John"
last_name "Smith Jr"
What would you do about a name like
Jerry Van Arsdale
or
Billy Bob Thornton?
In one case, Van Arsdale (words 2 and 3) are the last name; in the other case, Billy Bob (words 1 and 2) is the first name.
Now, in your simple John Smith or Jane Smith cases, removing Jr Sr IV III is easy. The problem is much harder in general, and there's probably no solution that will work on every name in the list.
What is the best method for splitting patient name into first name and last name if some of the patients are a "jr" or "IV"?
data want;
set have;
firstname=scan(patient_name,1,' ');
secondname=scan(patient_name,2,' ');
suffix=scan(patient_name,3,' ');
run;
I hesitate to call this the "best" way because there are many cases where it won't work.
Hello @GS2 Are you asking for this by any chance?
data have;
input PATIENT_NAME $quote50.;
cards;
"John Smith"
"John Smith Jr"
"Jane Smith"
"Jane Smith Sr"
"John Smith IV"
"John Smith III"
;
data want;
set have;
firstname=substr(PATIENT_NAME,1,anyspace(PATIENT_NAME));
lastname=substr(PATIENT_NAME,anyspace(PATIENT_NAME));
RUN;
I would suggest going to the data source and recommending that they bring the data collection standards up to something resembling at least 1990's standards with separate first and last name fields, better would include middle name or initial, prefix, for those folks that insist their name is Doctor, DR, Esq., Esquire, Senator, Representative or other professional indicator, and suffix for the Jr, II , III, IV, 'the second', and similar.
Having dealt with some of this issue and names that looked amazing like
Moon Unit Flower Child
and some of the non-English constructs like
Juan Garcia de Castille y Lumbago
you will likely never get a single rule that works.
I would suggest adding a COUNTW to get the number of space delimited items and then filtering on those that are 3 or more words for further manual examination to verify that your code performed as needed.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.