04-03-2015 08:23 AM
I've a data like this,
Dataset name: name_split
|LAWRENCE P POWERS, III|
|David R. Fox|
|ROBERT STEVEN SHAPIRO|
|JOHN E KIRK, IV|
|A JOHN KIRK|
I want data like this,
I've tried this code,
first_name=scan(fullname_new,1, ' ');
Middle_name=scan(fullname_new,2, ' ');
last_name=scan(fullname_new,3, ' ');
if length(Middle_name) gt 2 then
if last_name=Middle_name then
if length(suffix) gt 4 then
proc sort data=new_name;
proc sort data=new_name_suff;
update new_name new_name_suff;
if length(first_name) lt 2 then
Middle_name=scan(fullname_new,1, ' ');
First_name=scan(fullname_new,2, ' ');
last_name=scan(fullname_new,3, ' ');
update name fullname;
Although this code works with the data which I've, I'm worried if either middle name or suffix exceeds (in future data) greater than 2 or 4 characters. There should not be any special characters in derived fields as well.
I would appreciate for any help you offer me.
04-03-2015 09:49 AM
Your question is an unsolvable one.
- How can you know whether it is a secondname or a double word lastname
- think about this one CSV comma separated. Than de decimal parts are comma delimited. What is the meaning of a comma?
Unless you have some additional information you cannot segregate.
04-03-2015 10:04 AM
- How can you know whether it is a secondname or a double word lastname - letter with more than two characters are treated as last name.
- think about this one CSV comma separated. Than de decimal parts are comma delimited. What is the meaning of a comma? - commas are from flat file which we need to remove during\after split but before result.
Unless you have some additional information you cannot segregate. - any possibilities to optimize my code or generate a new code to produce the result as I mentioned?
04-03-2015 10:28 AM
There are options in the Input statement using pointer controls SAS(R) 9.4 Statements: Reference, Third Edition
or allow reading space (& : ~) SAS(R) 9.4 Statements: Reference, Third Edition
Using fcmp with own defined formats or the automatic _infile_ variable used with SAS functions
In case the basic string /input functions of SAS are not sufficient you can use http://www2.sas.com/proceedings/sugi29/265-29.pdf
More possibilities to optimize and change as I would like to try.
If you are learning than it is getting aware of those. If you are delivering a project it is applying those with coding at cost/profit working hours. At what one are you in, I was guessing learning.
04-03-2015 10:34 AM
pid=prxparse('/(\w+) (\w+\.?) (\w+)\,?(\D+)/');
input fullname $50.;
if prxmatch('/^\S\b/',fullname)>0 then fullname=prxchange('s/^(\S\b)(\s\w+)(\s.*)/$2 $1$3/',-1,fullname);
if prxmatch(pid,fullname) then
LAWRENCE P POWERS, III
David R. Fox
ROBERT STEVEN SHAPIRO
JOHN E KIRK, IV
A JOHN KIRK
04-03-2015 10:57 AM
Something else to consider besides last names with spaces are hyphenated names such as Jonh J Cartwright-Pickering.
Also assuming 2 characters is a middle name fails with many nationalities.
How many records do you have to process and will the source continue to NOT have moved into 1990's level database structures of having separate fields?
If you're going to have this going on into the future you really want to bring issues like this to whoever is collecting the data that there may be much more cost and likely errors in trying to post-process names than the cost to collect them in a better format.
04-03-2015 11:07 AM
data have; input Fullname_new $40.; cards; LAWRENCE P POWERS, III David R. Fox ROBERT STEVEN SHAPIRO JOHN E KIRK, IV A JOHN KIRK ; run; data want; set have; length Firstname Secondname Lastname Suffix temp $ 40; Firstname=scan(Fullname_new,1,,'ps'); Secondname=scan(Fullname_new,2,,'ps'); Lastname=scan(Fullname_new,3,,'ps'); Suffix=scan(Fullname_new,4,,'ps'); if length(Firstname)=1 then do;temp=Firstname;Firstname=Secondname;Secondname=temp;end; drop temp; run;