Solved: Re: Scanning for multiple spaces

camfarrell25 · Posted 08-22-2018 11:40 AM

Hello,

I am trying to use the scan function to distinguish between first and last name, which are separated by a longer space. For example:

HOGUE STAUBIN BENOIT G

The following code

BOR1_LASTNAME = trim(scan(BOR1_FULL_NAME_COR,1," "));

Breaks it down by:

HOGUE

When the result I would want is: HOGUE STAUBIN

Note that there's some instances where the last name is composed (two names) and sometimes single, but the first and last name(s) are always separated by a longer space.

Any insights?

error_prone · Posted 08-22-2018 03:10 PM

Can you post more data as text? I won't open ms office files.

The following regular expression can help solving the problem:

data have;
    text1 = "HOGUE STAUBIN      BENOIT G";
    rx = prxparse('/(\w+ \w*) +.*/');
    
    if prxmatch(rx, trim(text1)) then do;
    	last_name = prxposn(rx, 1, text1);
    end;
run;

View solution in original post

Reeza · Posted 08-22-2018 11:43 AM

Look into the modifiers (m) for the SCAN() function.

https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=lefunctionsref&docsetTarg...

@camfarrell25 wrote:

Hello,

I am trying to use the scan function to distinguish between first and last name, which are separated by a longer space. For example:

HOGUE STAUBIN BENOIT G

The following code

BOR1_LASTNAME = trim(scan(BOR1_FULL_NAME_COR,1," "));

Breaks it down by:

HOGUE

When the result I would want is: HOGUE STAUBIN

Note that there's some instances where the last name is composed (two names) and sometimes single, but the first and last name(s) are always separated by a longer space.

Any insights?

Reeza · Posted 08-22-2018 11:48 AM

Any chance your delimiter is actually a tab? Can you use '09'x to see if it's a tab or verify in a text editor?

camfarrell25 · Posted 08-22-2018 11:49 AM

I tried the following and no luck..

BOR1_LASTNAME = trim(scan(BOR1_FULL_NAME_COR,1,'09'x));

Reeza · Posted 08-22-2018 11:50 AM

Is this from a text file? Can you upload a small sample? Pasting as text may remove issues in the actual data unfortunately.

camfarrell25 · Posted 08-22-2018 11:54 AM

It's not pulled from a text file - see attached. Essentially the logic that would work is to be able to separate the first and last name in two if the space between them exceeds one, and then further breaking it down into first last name, and second last name based on the presence of a single space.

Is there a way?

Reeza · Posted 08-22-2018 12:24 PM

@camfarrell25 wrote:

It's not pulled from a text file - see attached. Essentially the logic that would work is to be able to separate the first and last name in two if the space between them exceeds one, and then further breaking it down into first last name, and second last name based on the presence of a single space.

Is there a way?

I'm assuming you're actually asking how, rather than if it's possible, otherwise the answer is yes.

Use FINDW to find a string of two spaces, then use SUBSTR at that location.

I also used STRIP to remove any leading spaces from the second portion.

data have;
    text1 = "HOGUE STAUBIN      BENOIT G";
    x=findw(text1, "  ");
    p1 = substr(text1, 1, x);
    p2 = strip(substr(text1, x+1));
run;

error_prone · Posted 08-22-2018 03:10 PM

Can you post more data as text? I won't open ms office files.

The following regular expression can help solving the problem:

data have;
    text1 = "HOGUE STAUBIN      BENOIT G";
    rx = prxparse('/(\w+ \w*) +.*/');
    
    if prxmatch(rx, trim(text1)) then do;
    	last_name = prxposn(rx, 1, text1);
    end;
run;

camfarrell25 · Posted 08-22-2018 03:37 PM

Hello,

I'm sorry - find attach text file.

CF

Registration is open

SAS Training: Just a Click Away