BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
shalmali
Calcite | Level 5

Dear All,

 

I have a list of fullnames and I want to separate the first name and the last name from the fullname. The names are not in the same format and hence the substr and the scan functions I tried are given me wrong output for some names. The names are as follows:

 

Fullname

H FULLER

WILLIAM D SMITHBURG

BOONE POWELL JR.

W. JAMES FARRELL

SAMUEL C. SCOTT III

DAVID A. L. OWEN, C.H., F.R.C.P.

R. S. AUSTIN

JAMES PACE

Ann McLaughlin Korologos

SUSAN CLARK-JOHNSON

 

The output I desire is as follows:

 

 

Fullname                                                 First_Name              Last_Name                    

H FULLER                                               H                              Fuller

WILLIAM D SMITHBURG                        William                     Smithburg

BOONE POWELL JR.                              Boone                      Powell

W. JAMES FARRELL                              W                              Farrell   

SAMUEL C. SCOTT III                            Samuel                     Scott

DAVID A. L. OWEN, C.H., F.R.C.P.         David                        Owen

R. S. AUSTIN                                           R                               Austin

JAMES PACE                                          James                       Pace

Ann McLaughlin Korologos                     Ann                            Korologos

SUSAN CLARK-JOHNSON                     Susan                        Clark-Johnson

 

I would appreciate if someone share a code with me.

 

Regards,

S

1 ACCEPTED SOLUTION

Accepted Solutions
Steelers_In_DC
Barite | Level 11

Depending on the different formats and how long the list is you'll probably have to full around with this a little to get it right, but it's a good start and works with the current example:

 

data have;
infile cards dsd dlm='***';
informat fullname $35.;
format fullname $35.;
input Fullname;
cards;
H FULLER
WILLIAM D SMITHBURG
BOONE POWELL JR.
W. JAMES FARRELL
SAMUEL C. SCOTT III
DAVID A. L. OWEN, C.H., F.R.C.P.
R. S. AUSTIN
JAMES PACE
Ann McLaughlin Korologos
SUSAN CLARK-JOHNSON
;run;

data want(keep=fullname first_name last_name);
set have;
First_Name=scan(fullname,1,' ');
Last_Name=scan(fullname,2,' ');
if length(last_name) < 3 then Last_Name = scan(fullname,3,' ');
if length(last_name) < 3 then Last_Name = scan(fullname,4,' ');
if length(scan(fullname,3,' ')) > 3 then last_name = scan(fullname,3,' ');
run;

View solution in original post

6 REPLIES 6
Reeza
Super User
Take a look at the scan() function, it will get you closer.
Steelers_In_DC
Barite | Level 11

Depending on the different formats and how long the list is you'll probably have to full around with this a little to get it right, but it's a good start and works with the current example:

 

data have;
infile cards dsd dlm='***';
informat fullname $35.;
format fullname $35.;
input Fullname;
cards;
H FULLER
WILLIAM D SMITHBURG
BOONE POWELL JR.
W. JAMES FARRELL
SAMUEL C. SCOTT III
DAVID A. L. OWEN, C.H., F.R.C.P.
R. S. AUSTIN
JAMES PACE
Ann McLaughlin Korologos
SUSAN CLARK-JOHNSON
;run;

data want(keep=fullname first_name last_name);
set have;
First_Name=scan(fullname,1,' ');
Last_Name=scan(fullname,2,' ');
if length(last_name) < 3 then Last_Name = scan(fullname,3,' ');
if length(last_name) < 3 then Last_Name = scan(fullname,4,' ');
if length(scan(fullname,3,' ')) > 3 then last_name = scan(fullname,3,' ');
run;

ballardw
Super User

Good luck; If your data entry folks weren't told to collect first, last and middle names plus suffixes such as Junior separately there is almost always going to be doubt in the end.

What do you do (using code only) with names like:

 

Harry Paul

Na Satook Jal

Lee Li

Cropper Kambrii Laree

 

 

Astounding
PROC Star

As has been noted, this data is a mess and you will need to make further decisions about additional cases.  Here are just a couple of examples that will cause trouble and require changing the rules:

 

DR. SMITH

PRINCE

LEONARDO DA VINCI

 

But given that you can always afford to take the first word as the first name, here is an approach:

 

data want;

set have;

First_Name = scan(fullname, 1, ' ');

do i=-1 to -5 by -1 until (done='Y');

   Last_Name = scan(fullname, i, ' ');

   if Last_Name not in ('III', 'JR', 'SR', 'MD') and index(Last_Name, '.')=0 then done='Y';

end;

Last_Name = compress(Last_Name, ',');

run;

Ksharp
Super User

Yes. The data is messed up . You need consider more rules to process such kind of data.

 

 

data have;
input Fullname $80.;
cards;
H FULLER
WILLIAM D SMITHBURG
BOONE POWELL JR.
W. JAMES FARRELL
SAMUEL C. SCOTT III
DAVID A. L. OWEN, C.H., F.R.C.P.
R. S. AUSTIN
JAMES PACE
Ann McLaughlin Korologos
SUSAN CLARK-JOHNSON
;
run;
data want;
 set have;
 length First_Name  Last_Name  $ 20;
 First_Name=scan(Fullname,1,,'ka');
 Last_Name=scan(prxchange('s/[a-z]+\.|\bI+\b/ /i',-1,Fullname),-1,' ,');
run;
shalmali
Calcite | Level 5

Thank you all for the quick reply. The codes are very helpful.

s

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1303 views
  • 1 like
  • 6 in conversation