I need to eliminate the middle initial off records that have a middle initial (not all have one). Data Sample:
Smith, Tim |
Howard, Matthew L. |
Jones, Sarah H. |
Charles, Andrew N. |
Doe, John |
In excel, I can use this formula: =IF(RIGHT(A2,1)=".",LEFT(A2,LEN(A2)-3),A2) to get my desired output:
Smith, Tim |
Howard, Matthew |
Jones, Sarah |
Charles, Andrew |
Doe, John |
How can I replicate this in SAS? I assume it is a combination of if and substring but have not been able to quickly work it out and thought someone might have a fast solution.
TIA!
data have;
input fullname $40.;
datalines;
Smith, Tim
Howard, Matthew L.
Jones, Sarah H.
Charles, Andrew N.
Doe, John
;;;;
data want;
set have;
fullname=catx(', ',scan(fullname,1), scan(fullname,2));
run;
data have;
input fullname $40.;
datalines;
Smith, Tim
Howard, Matthew L.
Jones, Sarah H.
Charles, Andrew N.
Doe, John
;;;;
data want;
set have;
fullname=catx(', ',scan(fullname,1), scan(fullname,2));
run;
Thank you for the speedy response. Much appreciated.
Unfortunately, if the last name has a hyphen, this causes the results to not be as I wanted. Any suggestions?
Try adding the comma modifier (',') to the scan functions like this:
data want;
set have;
fullname=catx(', ',scan(fullname,1,','), scan(fullname,2,','));
run;
The problem with all of these name clean-up rules is that they still have a wide variety of names to handle, and usually simple rules don't work 100% of the time. If a person's name is listed as
Van Arsdale, Charles
then none of the easily programmed rules will work.
data have;
input names $20.;
cards;
Smith, Tim
Howard, Matthew L.
Jones, Sarah H.
Charles, Andrew N.
Doe, John
;
data want;
set have;
call scan(names, -1, _n_, _iorc_);
if _n_ then substr(names,_n_)=' ';
run;
This one ends up removing the first name if there is no middle initial so it would only work if all my names had a middle initial.
I was able to combine some different ideas and came up with this:
data want;
set have;
last=substr(fullname,length(fullname));
format last :$1.;
if last = '.' then name=substr(fullname, 1, length(fullname)-3);
else name = fullname;
drop last fullname;
run;
Hi @ddavies My apologies. I shouldn't have used -1 to look behind from right. How about using 3 as the index value. I think this might nail it?
data want;
set have;
call scan(names, 3, _n_, _iorc_);
if _n_ then substr(names,_n_)=' ';
run;
Hi again @ddavies How about regular expressions
data have;
input names $20.;
cards;
Smith, Tim
Howard, Matthew L.
Jones, Sarah H.
Charles, Andrew N.
Doe, John
;
data want;
set have;
new_names=prxchange('s/(\s*\w+,)(\s+\w+)(\s*\w*.)/$1$2/', -1, names);
run;
names | new_names |
---|---|
Smith, Tim | Smith, Tim |
Howard, Matthew L. | Howard, Matthew |
Jones, Sarah H. | Jones, Sarah |
Charles, Andrew N. | Charles, Andrew |
Doe, John | Doe, John |
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.