Hi All,
I'm trying to remove middle initials at the end of a name IF it has a middle initial (some names don't have middle initial in the dataset).
What I have:
Name |
Doe, John M. |
Doe, Jane |
What I'd like:
Name |
Doe, John |
Doe, Jane |
I've seen some different approaches, and it seems as though a substr(scan()) may be the best approach, I'm just not sure how to apply it.
I tried the below code, but it removed everything after the first space.
Name = substr(Name,1,length(Name) - length(scan(Name,-1,' ')));
data have;
input Name $20.;
cards;
Doe, John M.
Doe, Jane
;
data want;
set have;
length want $20;
if countw(name,' ')=3 then want=substr(name,1,anyspace(strip(name),-99));
else want=name;
run;
data have;
input Name $20.;
cards;
Doe, John M.
Doe, Jane
;
data want;
set have;
length want $20;
if countw(name,' ')=3 then want=substr(name,1,anyspace(strip(name),-99));
else want=name;
run;
Or little cheeky-------------
data have;
input Name $20.;
cards;
Doe, John M.
Doe, Jane
;
data want;
set have;
length want $20;
want=catx(' ',scan(name,1,' '),scan(name,2,' '));
run;
Hi @BlayLay Both approaches follow the logic-
1. Check how many words are there in the name value.
2. If there are 3 words(assuming the 3rd one is he middle initial to be ignored), just extract the 1st 2. Otherwise just take the name value as is.
3. The 1st one uses SUBSTR to extract from position 1 up until the last delimiter ' ' that separates the words
4. The 2nd approach basically extracts 1st word and 2nd word and concatenates into 1. A lazy approach so to speak.
It's up to you and your convenience.
@BlayLay wrote:
@novinosrin
Appreciate the options, which one would you recommend from a performance/stability standpoint? They both seem to work, so wasn't sure if you had a preference based on your experience.
When it comes to names stored in a single variable I don't trust much of anything unless someone that is involved with creating the data promises, and will stand by their statement later, that "all the names are last, first with optional middle initial".
I have received data with names that in a single file were:
First name, last name, middle initial
First name, last name, other parent's last name, middle name
Last name, first name, other parent's last name, middle name
Last name, other parent's last name, first name
Last name -(hyphen character) other parent's last name, first name
Sprinkle in random Junior, or "the Second" after any of the first or last names
And one name like: Moon Beam Sun Child
If you have occasional other elements like titles, "Dr." "Esq." you may have more fun with getting the right pieces together depending on order and consistency.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.