BookmarkSubscribeRSS Feed
Babloo
Rhodochrosite | Level 12

I've a data like this,

Dataset name: name_split

Fullname_new
LAWRENCE P POWERS, III
David R. Fox
ROBERT STEVEN SHAPIRO
JOHN E KIRK, IV
A JOHN KIRK

I want data like this,

FirstnameSecondnameLastnameSuffix
LAWRENCEPPOWERSIII
DavidRFox
ROBERTSTEVENSHAPIRO
JOHN EKIRKIV
JOHN AKIRK

I've tried this code,

data new_name;

        set name_split;

               

        first_name=scan(fullname_new,1, ' ');

        Middle_name=scan(fullname_new,2, ' ');

        last_name=scan(fullname_new,3, ' ');

                if length(Middle_name) gt 2 then

                last_name=Middle_name;

        if last_name=Middle_name then

                Middle_name=' ';

run;

data new_name_suff;

        set name_split;

        suffix=scan(fullname_new,-1,',');

        if length(suffix) gt 4 then

                delete;

run;

proc sort data=new_name;

        by fullname_new;

run;

proc sort data=new_name_suff;

        by fullname_new;

run;

data name;

        update new_name new_name_suff;

        by fullname_new;

run;

data fullname;

set name;

if length(first_name) lt 2 then

Middle_name=scan(fullname_new,1, ' ');

First_name=scan(fullname_new,2, ' ');

last_name=scan(fullname_new,3, ' ');

run;


data fullname1;

        update name fullname;

        by fullname_new;

run;

Although this code works with the data which I've, I'm worried if either middle name or suffix exceeds (in future data) greater than 2 or 4 characters. There should not be any special characters in derived fields as well.

I would appreciate for any help you offer me.

   

  

7 REPLIES 7
jakarman
Barite | Level 11

Your question is an unsolvable one.

- How can you know whether it is a secondname or a double word lastname

- think about this one CSV comma separated. Than de decimal parts are comma delimited.  What is the meaning of a comma?

Unless you have some additional information you cannot segregate.

---->-- ja karman --<-----
Babloo
Rhodochrosite | Level 12

- How can you know whether it is a secondname or a double word lastname - letter with more than two characters are treated as last name.

- think about this one CSV comma separated. Than de decimal parts are comma delimited.  What is the meaning of a comma? - commas are from flat file which we need to remove during\after split but before result.

Unless you have some additional information you cannot segregate. - any possibilities to optimize my code or generate a new code to produce the result as I mentioned?

jakarman
Barite | Level 11

There are options in the Input statement using pointer controls SAS(R) 9.4 Statements: Reference, Third Edition

or allow reading space (& : ~) SAS(R) 9.4 Statements: Reference, Third Edition

Using fcmp with own defined formats or the automatic _infile_ variable used with SAS functions

In case the basic string /input functions of SAS are not sufficient you can use http://www2.sas.com/proceedings/sugi29/265-29.pdf

More possibilities to optimize and change as I would like to try.

If you are learning than it is getting aware of those. If you are delivering a project it is applying those with coding at cost/profit working hours. At what one are you in, I was guessing learning.

---->-- ja karman --<-----
slchen
Lapis Lazuli | Level 10

data have;

pid=prxparse('/(\w+) (\w+\.?) (\w+)\,?(\D+)/');

input fullname $50.;

if prxmatch('/^\S\b/',fullname)>0 then fullname=prxchange('s/^(\S\b)(\s\w+)(\s.*)/$2 $1$3/',-1,fullname);

if prxmatch(pid,fullname) then

   firstname=prxposn(pid,1,fullname);

   middlename=prxposn(pid,2,fullname);

   lastname=prxposn(pid,3,fullname);

   suffix=prxposn(pid,4,fullname);

cards;

LAWRENCE P POWERS, III

David R. Fox

ROBERT STEVEN SHAPIRO

JOHN E KIRK, IV

A JOHN KIRK

;

proc print;

run;

Babloo
Rhodochrosite | Level 12

Fantastic!

May I request you to tell me about \w+ in prxparse and \S\b and /$2 $1$3/ in prxmatch?

ballardw
Super User

Something else to consider besides last names with spaces are hyphenated names such as Jonh J Cartwright-Pickering.

Also assuming 2 characters is a middle name fails with many nationalities.

How many records do you have to process and will the source continue to NOT have moved into 1990's level database structures of having separate fields?

If you're going to have this going on into the future you really want to bring issues like this to whoever is collecting the data that there may be much more cost and likely errors in trying to post-process names than the cost to collect them in a better format.

Ksharp
Super User
data have;
input Fullname_new $40.;
cards;
LAWRENCE P POWERS, III
David R. Fox
ROBERT STEVEN SHAPIRO
JOHN E KIRK, IV
A JOHN KIRK
;
run;
data want;
 set have;
 length Firstname     Secondname     Lastname     Suffix temp $ 40;
  Firstname=scan(Fullname_new,1,,'ps');
  Secondname=scan(Fullname_new,2,,'ps');
  Lastname=scan(Fullname_new,3,,'ps');
  Suffix=scan(Fullname_new,4,,'ps');
 if length(Firstname)=1 then do;temp=Firstname;Firstname=Secondname;Secondname=temp;end;
 drop temp;
run;


Xia Keshan

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 2108 views
  • 1 like
  • 5 in conversation