I need to eliminate the middle initial off records that have a middle initial (not all have one). Data Sample:
Smith, Tim |
Howard, Matthew L. |
Jones, Sarah H. |
Charles, Andrew N. |
Doe, John |
In excel, I can use this formula: =IF(RIGHT(A2,1)=".",LEFT(A2,LEN(A2)-3),A2) to get my desired output:
Smith, Tim |
Howard, Matthew |
Jones, Sarah |
Charles, Andrew |
Doe, John |
How can I replicate this in SAS? I assume it is a combination of if and substring but have not been able to quickly work it out and thought someone might have a fast solution.
TIA!
data have;
input fullname $40.;
datalines;
Smith, Tim
Howard, Matthew L.
Jones, Sarah H.
Charles, Andrew N.
Doe, John
;;;;
data want;
set have;
fullname=catx(', ',scan(fullname,1), scan(fullname,2));
run;
data have;
input fullname $40.;
datalines;
Smith, Tim
Howard, Matthew L.
Jones, Sarah H.
Charles, Andrew N.
Doe, John
;;;;
data want;
set have;
fullname=catx(', ',scan(fullname,1), scan(fullname,2));
run;
Thank you for the speedy response. Much appreciated.
Unfortunately, if the last name has a hyphen, this causes the results to not be as I wanted. Any suggestions?
Try adding the comma modifier (',') to the scan functions like this:
data want;
set have;
fullname=catx(', ',scan(fullname,1,','), scan(fullname,2,','));
run;
The problem with all of these name clean-up rules is that they still have a wide variety of names to handle, and usually simple rules don't work 100% of the time. If a person's name is listed as
Van Arsdale, Charles
then none of the easily programmed rules will work.
data have;
input names $20.;
cards;
Smith, Tim
Howard, Matthew L.
Jones, Sarah H.
Charles, Andrew N.
Doe, John
;
data want;
set have;
call scan(names, -1, _n_, _iorc_);
if _n_ then substr(names,_n_)=' ';
run;
This one ends up removing the first name if there is no middle initial so it would only work if all my names had a middle initial.
I was able to combine some different ideas and came up with this:
data want;
set have;
last=substr(fullname,length(fullname));
format last :$1.;
if last = '.' then name=substr(fullname, 1, length(fullname)-3);
else name = fullname;
drop last fullname;
run;
Hi @ddavies My apologies. I shouldn't have used -1 to look behind from right. How about using 3 as the index value. I think this might nail it?
data want;
set have;
call scan(names, 3, _n_, _iorc_);
if _n_ then substr(names,_n_)=' ';
run;
Hi again @ddavies How about regular expressions
data have;
input names $20.;
cards;
Smith, Tim
Howard, Matthew L.
Jones, Sarah H.
Charles, Andrew N.
Doe, John
;
data want;
set have;
new_names=prxchange('s/(\s*\w+,)(\s+\w+)(\s*\w*.)/$1$2/', -1, names);
run;
names | new_names |
---|---|
Smith, Tim | Smith, Tim |
Howard, Matthew L. | Howard, Matthew |
Jones, Sarah H. | Jones, Sarah |
Charles, Andrew N. | Charles, Andrew |
Doe, John | Doe, John |
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.