I need to eliminate the middle initial off records that have a middle initial (not all have one). Data Sample:
| Smith, Tim |
| Howard, Matthew L. |
| Jones, Sarah H. |
| Charles, Andrew N. |
| Doe, John |
In excel, I can use this formula: =IF(RIGHT(A2,1)=".",LEFT(A2,LEN(A2)-3),A2) to get my desired output:
| Smith, Tim |
| Howard, Matthew |
| Jones, Sarah |
| Charles, Andrew |
| Doe, John |
How can I replicate this in SAS? I assume it is a combination of if and substring but have not been able to quickly work it out and thought someone might have a fast solution.
TIA!
data have;
input fullname $40.;
datalines;
Smith, Tim
Howard, Matthew L.
Jones, Sarah H.
Charles, Andrew N.
Doe, John
;;;;
data want;
set have;
fullname=catx(', ',scan(fullname,1), scan(fullname,2));
run;
data have;
input fullname $40.;
datalines;
Smith, Tim
Howard, Matthew L.
Jones, Sarah H.
Charles, Andrew N.
Doe, John
;;;;
data want;
set have;
fullname=catx(', ',scan(fullname,1), scan(fullname,2));
run;
Thank you for the speedy response. Much appreciated.
Unfortunately, if the last name has a hyphen, this causes the results to not be as I wanted. Any suggestions?
Try adding the comma modifier (',') to the scan functions like this:
data want;
set have;
fullname=catx(', ',scan(fullname,1,','), scan(fullname,2,','));
run;
The problem with all of these name clean-up rules is that they still have a wide variety of names to handle, and usually simple rules don't work 100% of the time. If a person's name is listed as
Van Arsdale, Charles
then none of the easily programmed rules will work.
data have;
input names $20.;
cards;
Smith, Tim
Howard, Matthew L.
Jones, Sarah H.
Charles, Andrew N.
Doe, John
;
data want;
set have;
call scan(names, -1, _n_, _iorc_);
if _n_ then substr(names,_n_)=' ';
run;
This one ends up removing the first name if there is no middle initial so it would only work if all my names had a middle initial.
I was able to combine some different ideas and came up with this:
data want;
set have;
last=substr(fullname,length(fullname));
format last :$1.;
if last = '.' then name=substr(fullname, 1, length(fullname)-3);
else name = fullname;
drop last fullname;
run;
Hi @ddavies My apologies. I shouldn't have used -1 to look behind from right. How about using 3 as the index value. I think this might nail it?
data want;
set have;
call scan(names, 3, _n_, _iorc_);
if _n_ then substr(names,_n_)=' ';
run;
Hi again @ddavies How about regular expressions
data have;
input names $20.;
cards;
Smith, Tim
Howard, Matthew L.
Jones, Sarah H.
Charles, Andrew N.
Doe, John
;
data want;
set have;
new_names=prxchange('s/(\s*\w+,)(\s+\w+)(\s*\w*.)/$1$2/', -1, names);
run;
| names | new_names |
|---|---|
| Smith, Tim | Smith, Tim |
| Howard, Matthew L. | Howard, Matthew |
| Jones, Sarah H. | Jones, Sarah |
| Charles, Andrew N. | Charles, Andrew |
| Doe, John | Doe, John |
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.