Extract Name

Accepted Solution Solved
Reply
Contributor
Posts: 29
Accepted Solution

Extract Name

[ Edited ]

Hi,

 

Could you please help me extract first name of full name?

 

Data test;

input id fullname $;

1 Michael R. Boyce

2 James G. Brocksmith, Jr.

3 Gerald F. Fitzgerald, Jr.

4 Norman R. Bobins, BS, MBA

5 Peter Pace, USMC, Ret.

6 Ryan, Norman A.

7 Anne Newman Foreman

;

run;

 

I need to create "first_name" column with middle name and without middle name and "last_name" using the fullname information.

As you can see above, In ID=6, the full name start with the last name unlike other observations. If it is the case, I need to re-order it to be consistent with others.

Like...

 

Data test; set test;

input ID First_name_with_middle $ Last_name $ First_name_without_middle $ 

datalines;

1 Michael R. Boyce Michael

2 James G. Brocksmith James

3 Gerald F. Fitzgerald Gerald

4 Norman R. Bobins Norman

5 Peter Pace Peter 

6 Norman A. Ryan Norman

7 Anne Newman Foreman Anne 

;

run;

 

 

That is, the first name with middle name column should contain the middle name abbreviation as well.

And, the last name column only contains last name of the full name.

Please help me!

 


Accepted Solutions
Solution
‎03-15-2018 01:48 AM
Esteemed Advisor
Posts: 5,540

Re: Extract Name

PRX to the rescue...

 

Data test;
input id fullname $64.;
datalines;
1 Michael R. Boyce
2 James G. Brocksmith, Jr.
3 Gerald F. Fitzgerald, Jr.
4 Norman R. Bobins, BS, MBA
5 Peter Pace, USMC, Ret.
6 Ryan, Norman A.
7 Anne Newman Foreman
8 Michael R.S. Boyce
9 Ryan, Norman A., BS
;

data want;
length firstName lastName $20 initials $6;
if not prxId1 then prxId1 + prxParse("/^(\w+)\s+(([A-Z]\.)*\s*)([^,]+)/");
if not prxId2 then prxId2 + prxParse("/^(\w+),\s*(\w+)\s+(([A-Z]\.)*)/");
set test;
if prxMatch(prxId1, fullname) then do;
    firstName = prxPosn(prxId1, 1, fullname);
    initials = prxPosn(prxId1, 2, fullname);
    lastName = prxPosn(prxId1, 4, fullname);
    end;
else if prxMatch(prxId2, fullname) then do;
    firstName = prxPosn(prxId2, 2, fullname);
    initials = prxPosn(prxId2, 3, fullname);
    lastName = prxPosn(prxId2, 1, fullname);
    end;
drop prxId: ;
run;

proc print data=want noobs; run;
      first
      Name       lastName          initials    id    fullname

      Michael    Boyce               R.         1    Michael R. Boyce
      James      Brocksmith          G.         2    James G. Brocksmith, Jr.
      Gerald     Fitzgerald          F.         3    Gerald F. Fitzgerald, Jr.
      Norman     Bobins              R.         4    Norman R. Bobins, BS, MBA
      Peter      Pace                           5    Peter Pace, USMC, Ret.
      Norman     Ryan                A.         6    Ryan, Norman A.
      Anne       Newman Foreman                 7    Anne Newman Foreman
      Michael    Boyce               R.S.       8    Michael R.S. Boyce
      Norman     Ryan                A.         9    Ryan, Norman A., BS
PG

View solution in original post


All Replies
PROC Star
Posts: 514

Re: Extract Name

I just got confused with variables you want in your output dataset

Contributor
Posts: 29

Re: Extract Name

Sorry. The wanted dataset should look like..

Data test;
infile datalines DSD;
input ID First_name_with_middle $ Last_name $ First_name_without_middle $ ;
datalines;
1, Michael R., Boyce, Michael
2, James G., Brocksmith, James
3, Gerald F., Fitzgerald, Gerald
4, Norman R., Bobins, Norman
5, Peter, Pace, Peter
6, Norman A., Ryan, Norman
7, Anne Newman, Foreman, Anne
;

run;

 

But, when I ran the code above, First_name_with_middle column shows that some observations do not have a dot after middle name abbreviation. I do not know why... but I need the dot after the middle name abbreviation if there is a middle name abbreviation.

Solution
‎03-15-2018 01:48 AM
Esteemed Advisor
Posts: 5,540

Re: Extract Name

PRX to the rescue...

 

Data test;
input id fullname $64.;
datalines;
1 Michael R. Boyce
2 James G. Brocksmith, Jr.
3 Gerald F. Fitzgerald, Jr.
4 Norman R. Bobins, BS, MBA
5 Peter Pace, USMC, Ret.
6 Ryan, Norman A.
7 Anne Newman Foreman
8 Michael R.S. Boyce
9 Ryan, Norman A., BS
;

data want;
length firstName lastName $20 initials $6;
if not prxId1 then prxId1 + prxParse("/^(\w+)\s+(([A-Z]\.)*\s*)([^,]+)/");
if not prxId2 then prxId2 + prxParse("/^(\w+),\s*(\w+)\s+(([A-Z]\.)*)/");
set test;
if prxMatch(prxId1, fullname) then do;
    firstName = prxPosn(prxId1, 1, fullname);
    initials = prxPosn(prxId1, 2, fullname);
    lastName = prxPosn(prxId1, 4, fullname);
    end;
else if prxMatch(prxId2, fullname) then do;
    firstName = prxPosn(prxId2, 2, fullname);
    initials = prxPosn(prxId2, 3, fullname);
    lastName = prxPosn(prxId2, 1, fullname);
    end;
drop prxId: ;
run;

proc print data=want noobs; run;
      first
      Name       lastName          initials    id    fullname

      Michael    Boyce               R.         1    Michael R. Boyce
      James      Brocksmith          G.         2    James G. Brocksmith, Jr.
      Gerald     Fitzgerald          F.         3    Gerald F. Fitzgerald, Jr.
      Norman     Bobins              R.         4    Norman R. Bobins, BS, MBA
      Peter      Pace                           5    Peter Pace, USMC, Ret.
      Norman     Ryan                A.         6    Ryan, Norman A.
      Anne       Newman Foreman                 7    Anne Newman Foreman
      Michael    Boyce               R.S.       8    Michael R.S. Boyce
      Norman     Ryan                A.         9    Ryan, Norman A., BS
PG
Contributor
Posts: 29

Re: Extract Name

Thank you very much!
Contributor
Posts: 29

Re: Extract Name

Just one more quick question.

 

If I have the fullname.. like...

Lavizzo-Mourey, Evans

B. Evan Bayh, III

 

 

Those names are not correctly divided into firstname and lastname using the above code.

 

Thank you!

 

Esteemed Advisor
Posts: 5,540

Re: Extract Name

The possibilities are endless... You can account for hyphenated names and firstnames easily, but frankly I don't even know what's what in "B. Evan Bayh, III". If such "names" are rare, you might be better sending them to a separate table and treating them by hand...

 

Data test;
input id fullname $64.;
datalines;
1 Michael R. Boyce
2 James G. Brocksmith, Jr.
3 Gerald F. Fitzgerald, Jr.
4 Norman R. Bobins, BS, MBA
5 Peter Pace, USMC, Ret.
6 Ryan, Norman A.
7 Anne Newman Foreman
8 Michael R.S. Boyce
9 Ryan, Norman A., BS
10 Lavizzo-Mourey, Evans
11 B. Evan Bayh, III
;

data want except(keep=id fullname);
length firstName lastName $20 initials $6;
if not prxId1 then prxId1 + prxParse("/^([\w-]+)\s+(([A-Z]\.)*\s*)([^,]+)/");
if not prxId2 then prxId2 + prxParse("/^([\w-]+),\s*([\w-]+)\s+(([A-Z]\.)*)/");
set test;
if prxMatch(prxId1, fullname) then do;
    firstName = prxPosn(prxId1, 1, fullname);
    initials = prxPosn(prxId1, 2, fullname);
    lastName = prxPosn(prxId1, 4, fullname);
    output want;
    end;
else if prxMatch(prxId2, fullname) then do;
    firstName = prxPosn(prxId2, 2, fullname);
    initials = prxPosn(prxId2, 3, fullname);
    lastName = prxPosn(prxId2, 1, fullname);
    output want;
    end;
else output except;
drop prxId: ;
run;

title "Matched names";
proc print data=want noobs; run;

title "Unmatched names";
proc print data=except noobs; run;
PG
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 175 views
  • 0 likes
  • 3 in conversation