Solved: Re: Can I use the SCAN function to look for and output the longest tex...

Melk · Posted 03-20-2018 05:01 PM

I have a character variable with a string of first name and possibly middle initials or Jrs, Sr, etc. I want to scan the string and output the longest text in the string, assuming this will be the first name. I was using the following code:

data new;
   set old;
        firstname= scan(fname,1, ' ');
       drop fname;
run;

to pull the first text before the space assuming it would be the first name, but some observations have middle initial before first name.

Astounding · Posted 03-20-2018 05:07 PM

Just continue searching. Assuming there are no more than 5 words in FNAME:

data want;

set have;

firstname = scan(fname, 1, ' ');

do k=2 to 5;

test = scan(fname, k, ' ');

if length(test) > length(firstname) then firstname=test;

end;

drop test k fname;

run;

View solution in original post

Astounding · Posted 03-20-2018 05:07 PM

Just continue searching. Assuming there are no more than 5 words in FNAME:

data want;

set have;

firstname = scan(fname, 1, ' ');

do k=2 to 5;

test = scan(fname, k, ' ');

if length(test) > length(firstname) then firstname=test;

end;

drop test k fname;

run;

ballardw · Posted 03-20-2018 05:46 PM

You may want to add a flag or possibly a note in the log if the longest "word" is something like III (for "the third") or any of your other suspect values like Junior or Senior spelled out in case you have values like "John Junior".

You may also want to look at your actual "word" count as some ethnicities may have a single name consisting or two or more "words". If the actual first name is something like "Bo Na Thone" or "Glory Moonshine" then keeping, our using Thone or Moonshine could be very incorrect.

Melk · Posted 03-20-2018 11:44 PM

True, but if you have thousands of names, any idea on how you could efficiently be able to identify those?

ballardw · Posted 03-21-2018 10:59 AM

@Melk wrote:

True, but if you have thousands of names, any idea on how you could efficiently be able to identify those?

The only truly robust method I know of is to change the data collection/entry to enforce first name, last name, middle(other) name, and a field for Jr, Sr, Dr, Ms Mrs or what have you.

I have had to clean up such data where names were entered all in one field and identifying multiple word name segments was only one of the headaches. I did have thousands of names which took lots of hours to clean up. I noted the processing time/costs to the clients for such things.

I minor bit was searching the name string first for the words like II, III, IV, JR, 2nd, 3rd and moving them to another variable before the next processing. Note that II and JR (or ii) really wants to use something like FINDW or INDEXW because that can appear as part of names though unlikely as stand alone words. TRANWRD could be used to remove specific known words like this.

Can I use the SCAN function to look for and output the longest text in a character string?

Re: Can I use the SCAN function to look for and output the longest text in a character string?

Re: Can I use the SCAN function to look for and output the longest text in a character string?

Re: Can I use the SCAN function to look for and output the longest text in a character string?

Re: Can I use the SCAN function to look for and output the longest text in a character string?

Re: Can I use the SCAN function to look for and output the longest text in a character string?

Can I use the SCAN function to look for and output the longest text in a character string?

Re: Can I use the SCAN function to look for and output the longest text in a character string?

Re: Can I use the SCAN function to look for and output the longest text in a character string?

Re: Can I use the SCAN function to look for and output the longest text in a character string?

Re: Can I use the SCAN function to look for and output the longest text in a character string?

Re: Can I use the SCAN function to look for and output the longest text in a character string?

Registration is open

SAS Training: Just a Click Away