I have a variable that lists a person's entire name. I would like to extract the first, middle, and last names as well names suffixes (e.g., Jr.) and save them into separate variables. I have figured out how to do everything except pull the full middle name if the middle name is more than one word. An example that is giving me trouble:
Johnson, Ethan Marc Anthony, Jr.
Where Ethan is the first name, Marc Anthony is the middle name, Johnson is the last name, and Jr. is the name suffix.
When I use the following, everything is pulled correctly except middle name:
lastnm = scan(Name,1,","); firstmid = scan(Name,2,","); firstnm = scan(firstmid ,1," "); midnm= scan(firstmid ,2," "); nmsuffix = scan(Name,3,",");
When I run this, the result is:
firstnm = Ethan
midnm = Marc (should be "Marc Anthony")
lastnm = Johnson
nmsuffix = Jr.
Thoughts?
You can use substr function. Just replace midnm with following statement:
midnm= substr(firstmid,length(firstnm)+2,length(firstmid)-length(firstnm));
@raivester wrote:
I have a variable that lists a person's entire name. I would like to extract the first, middle, and last names as well names suffixes (e.g., Jr.) and save them into separate variables. I have figured out how to do everything except pull the full middle name if the middle name is more than one word. An example that is giving me trouble:
Johnson, Ethan Marc Anthony, Jr.
Where Ethan is the first name, Marc Anthony is the middle name, Johnson is the last name, and Jr. is the name suffix.
When I use the following, everything is pulled correctly except middle name:
lastnm = scan(Name,1,","); firstmid = scan(Name,2,","); firstnm = scan(firstmid ,1," "); midnm= scan(firstmid ,2," "); nmsuffix = scan(Name,3,",");When I run this, the result is:
firstnm = Ethan
midnm = Marc (should be "Marc Anthony")
lastnm = Johnson
nmsuffix = Jr.
Thoughts?
Try something like this:
data example; name="Johnson, Ethan Marc Anthony, Jr."; length lastnm $ 10 firstmid $ 30 firstnm midnm $15 ; lastnm = scan(Name,1,","); firstmid = strip(scan(Name,2,",")); firstnm = scan(firstmid ,1," "); midnm= strip(substr(firstmid ,length(firstnm)+2)); nmsuffix = strip(scan(Name,3,",")); run;
You didn't show an entire data step so I don't know if you set lengths for your variables. It is a good idea to do so.
The Strip is because the way you build Firstmid it will often have a leading blank, as in your example.
The midnm uses the length function to find the length of the first name then start pulling characters at two after that length to start on the remainder. I include another strip there because, having dealt some name fields like this, there will be garbage characters in there.
You can use substr function. Just replace midnm with following statement:
midnm= substr(firstmid,length(firstnm)+2,length(firstmid)-length(firstnm));
Using SUBSTR() instead of SCAN().
data example; name="Johnson, Ethan Marc Anthony, Jr."; lastnm = scan(Name,1,","); firstmid = scan(Name,2,","); firstnm = scan(firstmid ,1," "); midnm= substr(firstmid ,findc(left(firstmid),' ')+1); nmsuffix = scan(Name,3,","); run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.