How to remove Trailing Letters in Names

Reply
Occasional Contributor ysk
Occasional Contributor
Posts: 17

How to remove Trailing Letters in Names

[ Edited ]

Hi I want to clean up my data which contains full name.

 

For example,

 

1) John, Smith

2) Kevin, Space R.

3) John, Doe R.C.

4) Kevin, Space Joseph

 

Basically i want 2,3, and 4 to look like 1 and get rid of trailling letter or name (i.e. R., R.C., Joseph).

 

Thanks

 

Respected Advisor
Posts: 4,950

Re: How to remove Trailing Letters in Names

Here's a statement that should do the trick:

 

new_name = catx(', ', scan(name, 1, ', '), scan(name, 2, ', '));

 

Notice that you can still run into trouble with two-word names before the comma.  A mildly more complex approach could handle that.

Occasional Contributor ysk
Occasional Contributor
Posts: 17

Re: How to remove Trailing Letters in Names

Hi Astounding, thank you for your reply.

How about instances where names are like this.

1) John, P. Richard

how can i clean this up to align with the first example?

Thanks
Respected Advisor
Posts: 4,950

Re: How to remove Trailing Letters in Names

Well, if we're going to expand the list of possibilities let's make sure we cover them all.  I assume you want John, P. Richard to become John, Richard.  But what should each of these become?

 

John, R.

John, R.C.

John, R. C.

 

If you can think of any more possibilities, this would be the time to mention them.

Occasional Contributor ysk
Occasional Contributor
Posts: 17

Re: How to remove Trailing Letters in Names

Here are different variations (format) of names in the dataset

1. John,Doe -> John,Doe
2. John,Doe D. -> John,Doe
2. John,D. Doe -> John,Doe
3. John,Doe-Johnson -> John, Doe-Johnson


Respected Advisor
Posts: 4,950

Re: How to remove Trailing Letters in Names

OK, here's one way:

 

new_name = scan(name, 1, ',');

part2 = scan(name, 2, ',');

if part2 > ' ' then do;

   new_name = trim(new_name) || ',';

   do i=1 to countw(part2, ' ');

      next_piece = scan(part2, i, ' ');

      if index(next_piece, '.') = 0 then new_name = trim(new_name) || ' ' || next_piece;

   end;

end;

 

I'm not expecting any problems with it, but this is untested code so let me know if you have any difficulties with it.

Grand Advisor
Posts: 9,567

Re: How to remove Trailing Letters in Names

data have;
input name $40.;
cards;
John, Smith
Kevin, Space R.
John, Doe R.C.
Kevin, Space Joseph
John,Doe 
John,Doe D.
John,D. Doe 
John,Doe-Johnson 
;
run;
data want;
 set have;
 temp=prxchange('s/[a-z]\./ /i',-1,name);
 new_name=catx(',',scan(temp,1,', '),scan(temp,2,', '));
 drop temp;
run;
Ask a Question
Discussion stats
  • 6 replies
  • 376 views
  • 0 likes
  • 3 in conversation