Solved: Parse character string with multiple delimiters

SasPerson85 · Posted 09-11-2019 05:17 PM

I have a single observation dataset that contains peoples first and last names. There are multiple delimiters involved as it comes from a free form text field.

Example of a string:

var

Gabriella Cornish Kenny Rooney, Rupert Pickett,Brett Mcnally; Ishmael Khan Kaitlan Wallace, Elyse Marriott, Dollie Parry

In this string there are space, comma, and semicolon delimiters.

I need the output dataset to look like this:

var

Gabriella Cornish

Kenny Rooney

Rupert Pickett

Brett Mcnally

Ishmael Khan

Kaitlan Wallace

Elyse Marriott

Dollie Parry

So basically I need someway (I don't care which why; macro, datastep, sql etc.) that is very dynamic and can handle a variety of delimiters as well as multiple lengths.

Thanks in advance

Tom · Posted 09-12-2019 09:16 AM

@SasPerson85 wrote:
Yup exactly two words to match in each obs.

Then take two words out of the string at a time.

data want ;
  set have ;
  length name $200;
  do i=1 by 2 until (name=' ');
     name=catx(' ',scan(var,i,' ,;'),scan(var,i+1,' ,;'));
     if i=1 or name ne ' ' then output;
  end;
  drop var;
run;

proc print;
run;

Obs          name            i

 1     Gabriella Cornish     1
 2     Kenny Rooney          3
 3     Rupert Pickett        5
 4     Brett Mcnally         7
 5     Ishmael Khan          9
 6     Kaitlan Wallace      11
 7     Elyse Marriott       13
 8     Dollie Parry         15

View solution in original post

Jagadishkatam · Posted 09-11-2019 05:41 PM

Please try below considering that the names are a combination of two words (first & last name or last or first name)

data have;
input var&$200.;
datalines4;
Gabriella Cornish Kenny Rooney, Rupert Pickett,Brett Mcnally; Ishmael Khan Kaitlan Wallace, Elyse Marriott, Dollie Parry
;;;;

data want;
set have;
retain var2;
do i = 1 to countw(var,' ,;')-1;
var2=catx(' ',scan(var,i,' ,;'),scan(var,i+1,' ,;'));
output;
end;
run;

Thanks,
Jag

ballardw · Posted 09-11-2019 06:58 PM

Or sometimes three or more words:

Paul Le Blanc

Eric von Lustbader

Don't forget that some people include things like: Junior, Senior, II, III, IV, and some times Doctor, Esquire, and other fussiness.

I feel your pain. I would tend to split the data into records at the comma and semicolon, unless you have some entries that are using Lastname, Firstname before tackling the space delimited bit.

Something like:

data temp;
  set have;
  length name $ 50.;
  do i= 1 to (countw(var,',;'));
   name= strip(scan(var,i,',;'));
   problem = (countw(name) ne 2);
   output;
  end;
  drop i var;
run;

The ones where problem = 1 may require additional work. Some may yield to @Jagadishkatam's bit of pulling things out 2 at a time. But where to deal with 3 or 5 or more name elements may require manual processing.

Ksharp · Posted 09-12-2019 07:30 AM

Is that exactly two words to match in an obs?

data have;
infile datalines dlm=' ,;';
input (first last) (: $40.) @@;
var=catx(' ',first,last);
datalines4;
Gabriella Cornish Kenny Rooney, Rupert Pickett,Brett Mcnally; Ishmael Khan Kaitlan Wallace, Elyse Marriott, Dollie Parry
;;;;

SasPerson85 · Posted 09-12-2019 09:10 AM

Yup exactly two words to match in each obs.

Tom · Posted 09-12-2019 09:16 AM

@SasPerson85 wrote:
Yup exactly two words to match in each obs.

Then take two words out of the string at a time.

data want ;
  set have ;
  length name $200;
  do i=1 by 2 until (name=' ');
     name=catx(' ',scan(var,i,' ,;'),scan(var,i+1,' ,;'));
     if i=1 or name ne ' ' then output;
  end;
  drop var;
run;

proc print;
run;

Obs          name            i

 1     Gabriella Cornish     1
 2     Kenny Rooney          3
 3     Rupert Pickett        5
 4     Brett Mcnally         7
 5     Ishmael Khan          9
 6     Kaitlan Wallace      11
 7     Elyse Marriott       13
 8     Dollie Parry         15

Parse character string with multiple delimiters

Re: Parse character string with multiple delimiters

Re: Parse character string with multiple delimiters

Re: Parse character string with multiple delimiters

Re: Parse character string with multiple delimiters

Re: Parse character string with multiple delimiters

Re: Parse character string with multiple delimiters

Parse character string with multiple delimiters

Re: Parse character string with multiple delimiters

Re: Parse character string with multiple delimiters

Re: Parse character string with multiple delimiters

Re: Parse character string with multiple delimiters

Re: Parse character string with multiple delimiters

Re: Parse character string with multiple delimiters

Registration is open

SAS Training: Just a Click Away