02-29-2016 11:45 AM - edited 02-29-2016 11:52 AM
Hi I want to clean up my data which contains full name.
1) John, Smith
2) Kevin, Space R.
3) John, Doe R.C.
4) Kevin, Space Joseph
Basically i want 2,3, and 4 to look like 1 and get rid of trailling letter or name (i.e. R., R.C., Joseph).
02-29-2016 12:17 PM
Here's a statement that should do the trick:
new_name = catx(', ', scan(name, 1, ', '), scan(name, 2, ', '));
Notice that you can still run into trouble with two-word names before the comma. A mildly more complex approach could handle that.
02-29-2016 01:14 PM
02-29-2016 01:27 PM
Well, if we're going to expand the list of possibilities let's make sure we cover them all. I assume you want John, P. Richard to become John, Richard. But what should each of these become?
John, R. C.
If you can think of any more possibilities, this would be the time to mention them.
02-29-2016 01:36 PM
02-29-2016 02:00 PM
OK, here's one way:
new_name = scan(name, 1, ',');
part2 = scan(name, 2, ',');
if part2 > ' ' then do;
new_name = trim(new_name) || ',';
do i=1 to countw(part2, ' ');
next_piece = scan(part2, i, ' ');
if index(next_piece, '.') = 0 then new_name = trim(new_name) || ' ' || next_piece;
I'm not expecting any problems with it, but this is untested code so let me know if you have any difficulties with it.
02-29-2016 08:44 PM
data have; input name $40.; cards; John, Smith Kevin, Space R. John, Doe R.C. Kevin, Space Joseph John,Doe John,Doe D. John,D. Doe John,Doe-Johnson ; run; data want; set have; temp=prxchange('s/[a-z]\./ /i',-1,name); new_name=catx(',',scan(temp,1,', '),scan(temp,2,', ')); drop temp; run;