BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
hkim3677
Calcite | Level 5

Hi,

 

Could you please help me extract first name of full name?

 

Data test;

input id fullname $;

1 Michael R. Boyce

2 James G. Brocksmith, Jr.

3 Gerald F. Fitzgerald, Jr.

4 Norman R. Bobins, BS, MBA

5 Peter Pace, USMC, Ret.

6 Ryan, Norman A.

7 Anne Newman Foreman

;

run;

 

I need to create "first_name" column with middle name and without middle name and "last_name" using the fullname information.

As you can see above, In ID=6, the full name start with the last name unlike other observations. If it is the case, I need to re-order it to be consistent with others.

Like...

 

Data test; set test;

input ID First_name_with_middle $ Last_name $ First_name_without_middle $ 

datalines;

1 Michael R. Boyce Michael

2 James G. Brocksmith James

3 Gerald F. Fitzgerald Gerald

4 Norman R. Bobins Norman

5 Peter Pace Peter 

6 Norman A. Ryan Norman

7 Anne Newman Foreman Anne 

;

run;

 

 

That is, the first name with middle name column should contain the middle name abbreviation as well.

And, the last name column only contains last name of the full name.

Please help me!

 

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

PRX to the rescue...

 

Data test;
input id fullname $64.;
datalines;
1 Michael R. Boyce
2 James G. Brocksmith, Jr.
3 Gerald F. Fitzgerald, Jr.
4 Norman R. Bobins, BS, MBA
5 Peter Pace, USMC, Ret.
6 Ryan, Norman A.
7 Anne Newman Foreman
8 Michael R.S. Boyce
9 Ryan, Norman A., BS
;

data want;
length firstName lastName $20 initials $6;
if not prxId1 then prxId1 + prxParse("/^(\w+)\s+(([A-Z]\.)*\s*)([^,]+)/");
if not prxId2 then prxId2 + prxParse("/^(\w+),\s*(\w+)\s+(([A-Z]\.)*)/");
set test;
if prxMatch(prxId1, fullname) then do;
    firstName = prxPosn(prxId1, 1, fullname);
    initials = prxPosn(prxId1, 2, fullname);
    lastName = prxPosn(prxId1, 4, fullname);
    end;
else if prxMatch(prxId2, fullname) then do;
    firstName = prxPosn(prxId2, 2, fullname);
    initials = prxPosn(prxId2, 3, fullname);
    lastName = prxPosn(prxId2, 1, fullname);
    end;
drop prxId: ;
run;

proc print data=want noobs; run;
      first
      Name       lastName          initials    id    fullname

      Michael    Boyce               R.         1    Michael R. Boyce
      James      Brocksmith          G.         2    James G. Brocksmith, Jr.
      Gerald     Fitzgerald          F.         3    Gerald F. Fitzgerald, Jr.
      Norman     Bobins              R.         4    Norman R. Bobins, BS, MBA
      Peter      Pace                           5    Peter Pace, USMC, Ret.
      Norman     Ryan                A.         6    Ryan, Norman A.
      Anne       Newman Foreman                 7    Anne Newman Foreman
      Michael    Boyce               R.S.       8    Michael R.S. Boyce
      Norman     Ryan                A.         9    Ryan, Norman A., BS
PG

View solution in original post

6 REPLIES 6
kiranv_
Rhodochrosite | Level 12

I just got confused with variables you want in your output dataset

hkim3677
Calcite | Level 5

Sorry. The wanted dataset should look like..

Data test;
infile datalines DSD;
input ID First_name_with_middle $ Last_name $ First_name_without_middle $ ;
datalines;
1, Michael R., Boyce, Michael
2, James G., Brocksmith, James
3, Gerald F., Fitzgerald, Gerald
4, Norman R., Bobins, Norman
5, Peter, Pace, Peter
6, Norman A., Ryan, Norman
7, Anne Newman, Foreman, Anne
;

run;

 

But, when I ran the code above, First_name_with_middle column shows that some observations do not have a dot after middle name abbreviation. I do not know why... but I need the dot after the middle name abbreviation if there is a middle name abbreviation.

PGStats
Opal | Level 21

PRX to the rescue...

 

Data test;
input id fullname $64.;
datalines;
1 Michael R. Boyce
2 James G. Brocksmith, Jr.
3 Gerald F. Fitzgerald, Jr.
4 Norman R. Bobins, BS, MBA
5 Peter Pace, USMC, Ret.
6 Ryan, Norman A.
7 Anne Newman Foreman
8 Michael R.S. Boyce
9 Ryan, Norman A., BS
;

data want;
length firstName lastName $20 initials $6;
if not prxId1 then prxId1 + prxParse("/^(\w+)\s+(([A-Z]\.)*\s*)([^,]+)/");
if not prxId2 then prxId2 + prxParse("/^(\w+),\s*(\w+)\s+(([A-Z]\.)*)/");
set test;
if prxMatch(prxId1, fullname) then do;
    firstName = prxPosn(prxId1, 1, fullname);
    initials = prxPosn(prxId1, 2, fullname);
    lastName = prxPosn(prxId1, 4, fullname);
    end;
else if prxMatch(prxId2, fullname) then do;
    firstName = prxPosn(prxId2, 2, fullname);
    initials = prxPosn(prxId2, 3, fullname);
    lastName = prxPosn(prxId2, 1, fullname);
    end;
drop prxId: ;
run;

proc print data=want noobs; run;
      first
      Name       lastName          initials    id    fullname

      Michael    Boyce               R.         1    Michael R. Boyce
      James      Brocksmith          G.         2    James G. Brocksmith, Jr.
      Gerald     Fitzgerald          F.         3    Gerald F. Fitzgerald, Jr.
      Norman     Bobins              R.         4    Norman R. Bobins, BS, MBA
      Peter      Pace                           5    Peter Pace, USMC, Ret.
      Norman     Ryan                A.         6    Ryan, Norman A.
      Anne       Newman Foreman                 7    Anne Newman Foreman
      Michael    Boyce               R.S.       8    Michael R.S. Boyce
      Norman     Ryan                A.         9    Ryan, Norman A., BS
PG
hkim3677
Calcite | Level 5
Thank you very much!
hkim3677
Calcite | Level 5

Just one more quick question.

 

If I have the fullname.. like...

Lavizzo-Mourey, Evans

B. Evan Bayh, III

 

 

Those names are not correctly divided into firstname and lastname using the above code.

 

Thank you!

 

PGStats
Opal | Level 21

The possibilities are endless... You can account for hyphenated names and firstnames easily, but frankly I don't even know what's what in "B. Evan Bayh, III". If such "names" are rare, you might be better sending them to a separate table and treating them by hand...

 

Data test;
input id fullname $64.;
datalines;
1 Michael R. Boyce
2 James G. Brocksmith, Jr.
3 Gerald F. Fitzgerald, Jr.
4 Norman R. Bobins, BS, MBA
5 Peter Pace, USMC, Ret.
6 Ryan, Norman A.
7 Anne Newman Foreman
8 Michael R.S. Boyce
9 Ryan, Norman A., BS
10 Lavizzo-Mourey, Evans
11 B. Evan Bayh, III
;

data want except(keep=id fullname);
length firstName lastName $20 initials $6;
if not prxId1 then prxId1 + prxParse("/^([\w-]+)\s+(([A-Z]\.)*\s*)([^,]+)/");
if not prxId2 then prxId2 + prxParse("/^([\w-]+),\s*([\w-]+)\s+(([A-Z]\.)*)/");
set test;
if prxMatch(prxId1, fullname) then do;
    firstName = prxPosn(prxId1, 1, fullname);
    initials = prxPosn(prxId1, 2, fullname);
    lastName = prxPosn(prxId1, 4, fullname);
    output want;
    end;
else if prxMatch(prxId2, fullname) then do;
    firstName = prxPosn(prxId2, 2, fullname);
    initials = prxPosn(prxId2, 3, fullname);
    lastName = prxPosn(prxId2, 1, fullname);
    output want;
    end;
else output except;
drop prxId: ;
run;

title "Matched names";
proc print data=want noobs; run;

title "Unmatched names";
proc print data=except noobs; run;
PG

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 2765 views
  • 0 likes
  • 3 in conversation