I need help extracting the middle name from a name variable in which the number of words differs from row to row.
I saw a prior link, but it did not work for me.
My data has only one variable:
Data statisticians;
infile datalines ;
Input name $30. ;
Datalines;
Ronaldo Al Fisher
H. O. Meir
Lee Sara Kim Ivan
Marco Sina
;
run;
Names with first and last names only should be blank in the new column called middle.
I tried the following code but it does not work:
data statisticians;
length middle $10;
set statisticians;
if count = 2 then middle=.;
if count = 3 then middle= scan(name,2);
if count = 4 then middle=scan(name,2);
run;
Any help will be greatly appreciated.
data statisticians;
infile datalines;
input name $30.;
length middle $10;
if countw(name) > 2 then middle = scan(name,2);
datalines;
Ronaldo Al Fisher
H. O. Meir
Lee Sara Kim Ivan
Marco Sina
;
I don't think there is a general rule for deciding which word in a four-word name is the real middle name.
If you can suggest such a rule, you can adapt the code you already got.
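As an illustration of adapting the code to one possible rule, here is a minimal sketch that assumes every word between the first and the last is part of the middle name. That rule is my assumption, not something the data guarantees. It also passes a blank as the only delimiter to COUNTW and SCAN, because their default delimiter list includes the period, which would turn "O." into "O".

```sas
data statisticians;
    infile datalines;
    input name $30.;
    length middle $30;
    /* Count words using a blank as the only delimiter. */
    n = countw(name, ' ');
    /* Assumed rule: words 2 through n-1 together form the middle name. */
    if n > 2 then do i = 2 to n - 1;
        /* CATX skips the blank value of middle on the first pass. */
        middle = catx(' ', middle, scan(name, i, ' '));
    end;
    drop i n;
    datalines;
Ronaldo Al Fisher
H. O. Meir
Lee Sara Kim Ivan
Marco Sina
;
```

Under this rule "Lee Sara Kim Ivan" gets the middle value "Sara Kim", and "Marco Sina" stays blank. If a different rule fits your data better, only the DO loop needs to change.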
Are you 100 percent sure that your data does not have any last names of two or more words, such as "Van Dyke" or "De La Cruz", with no middle name present?
Or records that have only a last name (or only a first name)?
Or added bits such as titles (Dr John Doe, Mrs Jane Roe) or suffixes like "John Smith II", "John Smith Junior" or "John Smith the Third"?
You may want to flag anything with a count > 5 for manual review.
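One way to find records like these is to flag them before parsing. The sketch below is only an illustration: the data set name review, the variable needs_review, the word-count limits and the short title and suffix lists are all hypothetical choices of mine, so adjust them to what you actually see in your data.

```sas
data review;
    set statisticians;
    length first_word last_word $30;
    /* Count words using a blank as the only delimiter. */
    n_words    = countw(name, ' ');
    /* Strip periods and upcase so that "Dr." and "DR" compare equal. */
    first_word = upcase(compress(scan(name, 1, ' '), '.'));
    last_word  = upcase(compress(scan(name, -1, ' '), '.'));
    /* Hypothetical checks: unusual word count, leading title, trailing suffix. */
    needs_review = (n_words < 2 or n_words > 3
                    or first_word in ('DR', 'MR', 'MRS', 'MS')
                    or last_word in ('JR', 'SR', 'II', 'III', 'JUNIOR'));
    drop first_word last_word;
run;
```

Rows with needs_review = 1 can then be checked by hand. Note that multi-word surnames such as "Van Dyke" with no middle name still look like ordinary three-word names, so this check will not catch them.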