BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SasPerson85
Calcite | Level 5

I have a single observation dataset that contains peoples first and last names. There are multiple delimiters involved as it comes from a free form text field.

 

Example of a string:

 

var
Gabriella Cornish Kenny Rooney, Rupert Pickett,Brett Mcnally; Ishmael Khan Kaitlan Wallace, Elyse Marriott, Dollie Parry

 

In this string there are space, comma, and semicolon delimiters.

 

I need the output dataset to look like this:

var
Gabriella Cornish
Kenny Rooney
Rupert Pickett
Brett Mcnally
Ishmael Khan
Kaitlan Wallace
Elyse Marriott
Dollie Parry

 

So basically I need someway (I don't care which why; macro, datastep, sql etc.) that is very dynamic and can handle a variety of delimiters as well as multiple lengths.

 

Thanks in advance

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

@SasPerson85 wrote:
Yup exactly two words to match in each obs.

Then take two words out of the string at a time.

data want ;
  set have ;
  length name $200;
  do i=1 by 2 until (name=' ');
     name=catx(' ',scan(var,i,' ,;'),scan(var,i+1,' ,;'));
     if i=1 or name ne ' ' then output;
  end;
  drop var;
run;

proc print;
run;
Obs          name            i

 1     Gabriella Cornish     1
 2     Kenny Rooney          3
 3     Rupert Pickett        5
 4     Brett Mcnally         7
 5     Ishmael Khan          9
 6     Kaitlan Wallace      11
 7     Elyse Marriott       13
 8     Dollie Parry         15


View solution in original post

5 REPLIES 5
Jagadishkatam
Amethyst | Level 16

Please try below considering that the names are a combination of two words (first & last name or last or first name)

 

data have;
input var&$200.;
datalines4;
Gabriella Cornish Kenny Rooney, Rupert Pickett,Brett Mcnally; Ishmael Khan Kaitlan Wallace, Elyse Marriott, Dollie Parry
;;;;

data want;
set have;
retain var2;
do i = 1 to countw(var,' ,;')-1;
var2=catx(' ',scan(var,i,' ,;'),scan(var,i+1,' ,;'));
output;
end;
run;

 

Thanks,
Jag
ballardw
Super User

Or sometimes three or more words:

 

Paul Le Blanc

Eric von Lustbader

 

Don't forget that some people include things like: Junior, Senior, II, III, IV, and some times Doctor, Esquire, and other fussiness.

 

I feel your pain. I would tend to split the data into records at the comma and semicolon, unless you have some entries that are using Lastname, Firstname before tackling the space delimited bit.

 

Something like:

data temp;
  set have;
  length name $ 50.;
  do i= 1 to (countw(var,',;'));
   name= strip(scan(var,i,',;'));
   problem = (countw(name) ne 2);
   output;
  end;
  drop i var;
run;

The ones where problem = 1 may require additional work. Some may yield to @Jagadishkatam's bit of pulling things out 2 at a time. But where to deal with 3 or 5 or more name elements may require manual processing.

Ksharp
Super User

Is that exactly two words to match in an obs?

 

data have;
infile datalines dlm=' ,;';
input (first last) (: $40.) @@;
var=catx(' ',first,last);
datalines4;
Gabriella Cornish Kenny Rooney, Rupert Pickett,Brett Mcnally; Ishmael Khan Kaitlan Wallace, Elyse Marriott, Dollie Parry
;;;;
SasPerson85
Calcite | Level 5
Yup exactly two words to match in each obs.
Tom
Super User Tom
Super User

@SasPerson85 wrote:
Yup exactly two words to match in each obs.

Then take two words out of the string at a time.

data want ;
  set have ;
  length name $200;
  do i=1 by 2 until (name=' ');
     name=catx(' ',scan(var,i,' ,;'),scan(var,i+1,' ,;'));
     if i=1 or name ne ' ' then output;
  end;
  drop var;
run;

proc print;
run;
Obs          name            i

 1     Gabriella Cornish     1
 2     Kenny Rooney          3
 3     Rupert Pickett        5
 4     Brett Mcnally         7
 5     Ishmael Khan          9
 6     Kaitlan Wallace      11
 7     Elyse Marriott       13
 8     Dollie Parry         15


hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 5936 views
  • 0 likes
  • 5 in conversation