Dear All,
I have a list of fullnames and I want to separate the first name and the last name from the fullname. The names are not in the same format and hence the substr and the scan functions I tried are given me wrong output for some names. The names are as follows:
Fullname
H FULLER
WILLIAM D SMITHBURG
BOONE POWELL JR.
W. JAMES FARRELL
SAMUEL C. SCOTT III
DAVID A. L. OWEN, C.H., F.R.C.P.
R. S. AUSTIN
JAMES PACE
Ann McLaughlin Korologos
SUSAN CLARK-JOHNSON
The output I desire is as follows:
Fullname First_Name Last_Name
H FULLER H Fuller
WILLIAM D SMITHBURG William Smithburg
BOONE POWELL JR. Boone Powell
W. JAMES FARRELL W Farrell
SAMUEL C. SCOTT III Samuel Scott
DAVID A. L. OWEN, C.H., F.R.C.P. David Owen
R. S. AUSTIN R Austin
JAMES PACE James Pace
Ann McLaughlin Korologos Ann Korologos
SUSAN CLARK-JOHNSON Susan Clark-Johnson
I would appreciate if someone share a code with me.
Regards,
S
Depending on the different formats and how long the list is you'll probably have to full around with this a little to get it right, but it's a good start and works with the current example:
data have;
infile cards dsd dlm='***';
informat fullname $35.;
format fullname $35.;
input Fullname;
cards;
H FULLER
WILLIAM D SMITHBURG
BOONE POWELL JR.
W. JAMES FARRELL
SAMUEL C. SCOTT III
DAVID A. L. OWEN, C.H., F.R.C.P.
R. S. AUSTIN
JAMES PACE
Ann McLaughlin Korologos
SUSAN CLARK-JOHNSON
;run;
data want(keep=fullname first_name last_name);
set have;
First_Name=scan(fullname,1,' ');
Last_Name=scan(fullname,2,' ');
if length(last_name) < 3 then Last_Name = scan(fullname,3,' ');
if length(last_name) < 3 then Last_Name = scan(fullname,4,' ');
if length(scan(fullname,3,' ')) > 3 then last_name = scan(fullname,3,' ');
run;
Depending on the different formats and how long the list is you'll probably have to full around with this a little to get it right, but it's a good start and works with the current example:
data have;
infile cards dsd dlm='***';
informat fullname $35.;
format fullname $35.;
input Fullname;
cards;
H FULLER
WILLIAM D SMITHBURG
BOONE POWELL JR.
W. JAMES FARRELL
SAMUEL C. SCOTT III
DAVID A. L. OWEN, C.H., F.R.C.P.
R. S. AUSTIN
JAMES PACE
Ann McLaughlin Korologos
SUSAN CLARK-JOHNSON
;run;
data want(keep=fullname first_name last_name);
set have;
First_Name=scan(fullname,1,' ');
Last_Name=scan(fullname,2,' ');
if length(last_name) < 3 then Last_Name = scan(fullname,3,' ');
if length(last_name) < 3 then Last_Name = scan(fullname,4,' ');
if length(scan(fullname,3,' ')) > 3 then last_name = scan(fullname,3,' ');
run;
Good luck; If your data entry folks weren't told to collect first, last and middle names plus suffixes such as Junior separately there is almost always going to be doubt in the end.
What do you do (using code only) with names like:
Harry Paul
Na Satook Jal
Lee Li
Cropper Kambrii Laree
As has been noted, this data is a mess and you will need to make further decisions about additional cases. Here are just a couple of examples that will cause trouble and require changing the rules:
DR. SMITH
PRINCE
LEONARDO DA VINCI
But given that you can always afford to take the first word as the first name, here is an approach:
data want;
set have;
First_Name = scan(fullname, 1, ' ');
do i=-1 to -5 by -1 until (done='Y');
Last_Name = scan(fullname, i, ' ');
if Last_Name not in ('III', 'JR', 'SR', 'MD') and index(Last_Name, '.')=0 then done='Y';
end;
Last_Name = compress(Last_Name, ',');
run;
Yes. The data is messed up . You need consider more rules to process such kind of data.
data have;
input Fullname $80.;
cards;
H FULLER
WILLIAM D SMITHBURG
BOONE POWELL JR.
W. JAMES FARRELL
SAMUEL C. SCOTT III
DAVID A. L. OWEN, C.H., F.R.C.P.
R. S. AUSTIN
JAMES PACE
Ann McLaughlin Korologos
SUSAN CLARK-JOHNSON
;
run;
data want;
set have;
length First_Name Last_Name $ 20;
First_Name=scan(Fullname,1,,'ka');
Last_Name=scan(prxchange('s/[a-z]+\.|\bI+\b/ /i',-1,Fullname),-1,' ,');
run;
Thank you all for the quick reply. The codes are very helpful.
s
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.