Help using Base SAS procedures

Need help with substr

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 96
Accepted Solution

Need help with substr

Dear All,

 

I have a list of fullnames and I want to separate the first name and the last name from the fullname. The names are not in the same format and hence the substr and the scan functions I tried are given me wrong output for some names. The names are as follows:

 

Fullname

H FULLER

WILLIAM D SMITHBURG

BOONE POWELL JR.

W. JAMES FARRELL

SAMUEL C. SCOTT III

DAVID A. L. OWEN, C.H., F.R.C.P.

R. S. AUSTIN

JAMES PACE

Ann McLaughlin Korologos

SUSAN CLARK-JOHNSON

 

The output I desire is as follows:

 

 

Fullname                                                 First_Name              Last_Name                    

H FULLER                                               H                              Fuller

WILLIAM D SMITHBURG                        William                     Smithburg

BOONE POWELL JR.                              Boone                      Powell

W. JAMES FARRELL                              W                              Farrell   

SAMUEL C. SCOTT III                            Samuel                     Scott

DAVID A. L. OWEN, C.H., F.R.C.P.         David                        Owen

R. S. AUSTIN                                           R                               Austin

JAMES PACE                                          James                       Pace

Ann McLaughlin Korologos                     Ann                            Korologos

SUSAN CLARK-JOHNSON                     Susan                        Clark-Johnson

 

I would appreciate if someone share a code with me.

 

Regards,

S


Accepted Solutions
Solution
‎01-20-2016 04:47 PM
Valued Guide
Posts: 858

Re: Need help with substr

Depending on the different formats and how long the list is you'll probably have to full around with this a little to get it right, but it's a good start and works with the current example:

 

data have;
infile cards dsd dlm='***';
informat fullname $35.;
format fullname $35.;
input Fullname;
cards;
H FULLER
WILLIAM D SMITHBURG
BOONE POWELL JR.
W. JAMES FARRELL
SAMUEL C. SCOTT III
DAVID A. L. OWEN, C.H., F.R.C.P.
R. S. AUSTIN
JAMES PACE
Ann McLaughlin Korologos
SUSAN CLARK-JOHNSON
;run;

data want(keep=fullname first_name last_name);
set have;
First_Name=scan(fullname,1,' ');
Last_Name=scan(fullname,2,' ');
if length(last_name) < 3 then Last_Name = scan(fullname,3,' ');
if length(last_name) < 3 then Last_Name = scan(fullname,4,' ');
if length(scan(fullname,3,' ')) > 3 then last_name = scan(fullname,3,' ');
run;

View solution in original post


All Replies
Super User
Posts: 17,784

Re: Need help with substr

Take a look at the scan() function, it will get you closer.
Solution
‎01-20-2016 04:47 PM
Valued Guide
Posts: 858

Re: Need help with substr

Depending on the different formats and how long the list is you'll probably have to full around with this a little to get it right, but it's a good start and works with the current example:

 

data have;
infile cards dsd dlm='***';
informat fullname $35.;
format fullname $35.;
input Fullname;
cards;
H FULLER
WILLIAM D SMITHBURG
BOONE POWELL JR.
W. JAMES FARRELL
SAMUEL C. SCOTT III
DAVID A. L. OWEN, C.H., F.R.C.P.
R. S. AUSTIN
JAMES PACE
Ann McLaughlin Korologos
SUSAN CLARK-JOHNSON
;run;

data want(keep=fullname first_name last_name);
set have;
First_Name=scan(fullname,1,' ');
Last_Name=scan(fullname,2,' ');
if length(last_name) < 3 then Last_Name = scan(fullname,3,' ');
if length(last_name) < 3 then Last_Name = scan(fullname,4,' ');
if length(scan(fullname,3,' ')) > 3 then last_name = scan(fullname,3,' ');
run;

Super User
Posts: 10,483

Re: Need help with substr

Good luck; If your data entry folks weren't told to collect first, last and middle names plus suffixes such as Junior separately there is almost always going to be doubt in the end.

What do you do (using code only) with names like:

 

Harry Paul

Na Satook Jal

Lee Li

Cropper Kambrii Laree

 

 

Super User
Posts: 5,079

Re: Need help with substr

As has been noted, this data is a mess and you will need to make further decisions about additional cases.  Here are just a couple of examples that will cause trouble and require changing the rules:

 

DR. SMITH

PRINCE

LEONARDO DA VINCI

 

But given that you can always afford to take the first word as the first name, here is an approach:

 

data want;

set have;

First_Name = scan(fullname, 1, ' ');

do i=-1 to -5 by -1 until (done='Y');

   Last_Name = scan(fullname, i, ' ');

   if Last_Name not in ('III', 'JR', 'SR', 'MD') and index(Last_Name, '.')=0 then done='Y';

end;

Last_Name = compress(Last_Name, ',');

run;

Super User
Posts: 9,671

Re: Need help with substr

Yes. The data is messed up . You need consider more rules to process such kind of data.

 

 

data have;
input Fullname $80.;
cards;
H FULLER
WILLIAM D SMITHBURG
BOONE POWELL JR.
W. JAMES FARRELL
SAMUEL C. SCOTT III
DAVID A. L. OWEN, C.H., F.R.C.P.
R. S. AUSTIN
JAMES PACE
Ann McLaughlin Korologos
SUSAN CLARK-JOHNSON
;
run;
data want;
 set have;
 length First_Name  Last_Name  $ 20;
 First_Name=scan(Fullname,1,,'ka');
 Last_Name=scan(prxchange('s/[a-z]+\.|\bI+\b/ /i',-1,Fullname),-1,' ,');
run;
Frequent Contributor
Posts: 96

Re: Need help with substr

Thank you all for the quick reply. The codes are very helpful.

s

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 476 views
  • 1 like
  • 6 in conversation