I have the following dataset and need help extracting the middle name from a name variable whose values contain different numbers of words.
I saw a prior link, but it does not work for me.
My data has only one variable:
Data statisticians;
infile datalines ;
Input name $30. ;
Datalines;
Ronaldo Al Fisher
H. O. Meir
Lee Sara Kim Ivan
Marco Sina
;
run;
Names with first and last names only should be blank in the new column called middle.
I tried the following code but it does not work:
data statisticians;
length middle $10;
set statisticians;
if count = 2 then middle=.;
if count = 3 then middle= scan(name,2);
if count = 4 then middle=scan(name,2);
run;
Any help will be greatly appreciated.
data statisticians;
infile datalines;
input name $30.;
length middle $10;
if countw(name) > 2 then middle = scan(name,2);
datalines;
Ronaldo Al Fisher
H. O. Meir
Lee Sara Kim Ivan
Marco Sina
;
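A note on why the attempt in the question probably fails: count is never assigned in that step, so it is missing and none of the IF conditions are true. Also, middle=. assigns a numeric missing value to a character variable; a character blank is ' '. Below is a minimal sketch of a corrected version of that step, assuming the statisticians dataset from the question already exists. The explicit blank delimiter is my own choice, not part of the original code: by default SCAN and COUNTW also treat the period as a delimiter, so "O." would come back as "O".

```sas
data statisticians;
  length middle $10;
  set statisticians;
  /* count blank-separated words; the original step never set count */
  count = countw(name, ' ');
  /* take the second word only when there are more than two words */
  if count > 2 then middle = scan(name, 2, ' ');
  /* middle is left blank for two-word names */
  drop count;
run;
```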
I don't think there is a general rule that determines which word in a four-word name is the real middle name.
If you can suggest such a rule, then you can adapt the code you already got.
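As one example of such a rule (purely an assumption on my part, not something the data confirms): treat every word between the first and the last as the middle name. A rough sketch of how the code could be adapted, where the variables n and i and the longer length for middle are my own additions:

```sas
data statisticians;
  infile datalines;
  input name $30.;
  length middle $30;
  /* count words separated by blanks only, so "O." keeps its period */
  n = countw(name, ' ');
  /* assumed rule: words 2 through n-1 together form the middle name */
  do i = 2 to n - 1;
    middle = catx(' ', middle, scan(name, i, ' '));
  end;
  drop n i;
datalines;
Ronaldo Al Fisher
H. O. Meir
Lee Sara Kim Ivan
Marco Sina
;
run;
```

With this rule, "Lee Sara Kim Ivan" gets the middle value "Sara Kim", and "Marco Sina" stays blank because the loop does not run for two-word names. A different rule (for example, only the second word) would just change the loop bounds.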
In reply to @mrahouma:
Are you 100 percent sure that your data does not have any last names of two or more words, such as "Van Dyke" or "De La Cruz", with no middle name present?
Or possibly records with only a last name (or only a first name)?
Or added bits like titles (Dr John Doe, Mrs Jane Roe) or suffixes like "John Smith II", "John Smith Junior" or "John Smith the Third"?
You may want to bring anything with a word count > 5 to personal attention.
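A rough sketch of how such records could be flagged for a manual look, assuming the statisticians dataset from the question. The dataset name review, the needs_review flag, and the short lists of titles, suffixes and surname particles are all hypothetical and would need to be extended for real data:

```sas
data review;
  set statisticians;
  length first_word prev_word last_word $30;
  n_words = countw(name, ' ');
  /* upper-case and strip periods so "Dr." and "Dr" compare equal */
  first_word = upcase(compress(scan(name, 1, ' '), '.'));
  prev_word  = upcase(compress(scan(name, -2, ' '), '.'));
  last_word  = upcase(compress(scan(name, -1, ' '), '.'));
  /* 1 = someone should look at this record, 0 = looks routine */
  needs_review = (n_words < 2)
              or (n_words > 5)
              or (first_word in ('DR', 'MR', 'MRS', 'MS'))
              or (prev_word in ('VAN', 'VON', 'DE', 'LA', 'DEL'))
              or (last_word in ('II', 'III', 'JR', 'JUNIOR', 'SR', 'THIRD'));
  drop first_word prev_word last_word;
run;
```

This does not decide what the middle name is; it only separates the records where a simple word-position rule is likely to be wrong.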