Hi I want to clean up my data which contains full name.
For example,
1) John, Smith
2) Kevin, Space R.
3) John, Doe R.C.
4) Kevin, Space Joseph
Basically i want 2,3, and 4 to look like 1 and get rid of trailling letter or name (i.e. R., R.C., Joseph).
Thanks
Here's a statement that should do the trick:
new_name = catx(', ', scan(name, 1, ', '), scan(name, 2, ', '));
Notice that you can still run into trouble with two-word names before the comma. A mildly more complex approach could handle that.
Well, if we're going to expand the list of possibilities let's make sure we cover them all. I assume you want John, P. Richard to become John, Richard. But what should each of these become?
John, R.
John, R.C.
John, R. C.
If you can think of any more possibilities, this would be the time to mention them.
OK, here's one way:
new_name = scan(name, 1, ',');
part2 = scan(name, 2, ',');
if part2 > ' ' then do;
new_name = trim(new_name) || ',';
do i=1 to countw(part2, ' ');
next_piece = scan(part2, i, ' ');
if index(next_piece, '.') = 0 then new_name = trim(new_name) || ' ' || next_piece;
end;
end;
I'm not expecting any problems with it, but this is untested code so let me know if you have any difficulties with it.
data have;
input name $40.;
cards;
John, Smith
Kevin, Space R.
John, Doe R.C.
Kevin, Space Joseph
John,Doe
John,Doe D.
John,D. Doe
John,Doe-Johnson
;
run;
data want;
set have;
temp=prxchange('s/[a-z]\./ /i',-1,name);
new_name=catx(',',scan(temp,1,', '),scan(temp,2,', '));
drop temp;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.