DATA Step, Macro, Functions and more

Can I use the SCAN function to look for and output the longest text in a character string?

Accepted Solution Solved
Reply
Regular Contributor
Posts: 150
Accepted Solution

Can I use the SCAN function to look for and output the longest text in a character string?

I have a character variable with a string of first name and possibly middle initials or Jrs, Sr, etc. I want to scan the string and output the longest text in the string, assuming this will be the first name. I was using the following code:

 

data new;
    set old;
        firstname= scan(fname,1, ' ');   
        drop fname;
run;

 

to pull the first text before the space assuming it would be the first name, but some observations have middle initial before first name.


Accepted Solutions
Solution
‎03-20-2018 05:14 PM
Super User
Posts: 6,622

Re: Can I use the SCAN function to look for and output the longest text in a character string?

[ Edited ]

Just continue searching.  Assuming there are no more than 5 words in FNAME:

 

data want;

set have;

firstname = scan(fname, 1, ' ');

do k=2 to 5;

   test = scan(fname, k, ' ');

   if length(test) > length(firstname) then firstname=test;

end;

drop test k fname;

run;

View solution in original post


All Replies
Solution
‎03-20-2018 05:14 PM
Super User
Posts: 6,622

Re: Can I use the SCAN function to look for and output the longest text in a character string?

[ Edited ]

Just continue searching.  Assuming there are no more than 5 words in FNAME:

 

data want;

set have;

firstname = scan(fname, 1, ' ');

do k=2 to 5;

   test = scan(fname, k, ' ');

   if length(test) > length(firstname) then firstname=test;

end;

drop test k fname;

run;

Super User
Posts: 13,283

Re: Can I use the SCAN function to look for and output the longest text in a character string?

You may want to add a flag or possibly a note in the log if the longest "word" is something like III (for "the third") or any of your other suspect values like Junior or Senior spelled out in case you have values like "John Junior".

 

You may also want to look at your actual "word" count as some ethnicities may have a single name consisting or two or more "words". If the actual first name is something like "Bo Na Thone" or "Glory Moonshine" then keeping, our using Thone or Moonshine could be very incorrect.

Regular Contributor
Posts: 150

Re: Can I use the SCAN function to look for and output the longest text in a character string?

True, but if you have thousands of names, any idea on how you could efficiently be able to identify those?

Super User
Posts: 13,283

Re: Can I use the SCAN function to look for and output the longest text in a character string?


@Melk wrote:

True, but if you have thousands of names, any idea on how you could efficiently be able to identify those?


The only truly robust method I know of is to change the data collection/entry to enforce first name, last name, middle(other) name, and a field for Jr, Sr, Dr, Ms Mrs or what have you.

 

I have had to clean up such data where names were entered all in one field and identifying multiple word name segments was only one of the headaches. I did have thousands of names which took lots of hours to clean up. I noted the processing time/costs to the clients for such things.

 

I minor bit was searching the name string first for the words like II, III, IV, JR, 2nd, 3rd and moving them to another variable before the next processing. Note that II and JR (or ii) really wants to use something like FINDW or INDEXW because that can appear as part of names though unlikely as stand alone words. TRANWRD could be used to remove specific known words like this.

 

 

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 174 views
  • 3 likes
  • 3 in conversation