I have a single observation dataset that contains peoples first and last names. There are multiple delimiters involved as it comes from a free form text field.
Example of a string:
var |
Gabriella Cornish Kenny Rooney, Rupert Pickett,Brett Mcnally; Ishmael Khan Kaitlan Wallace, Elyse Marriott, Dollie Parry |
In this string there are space, comma, and semicolon delimiters.
I need the output dataset to look like this:
var |
Gabriella Cornish |
Kenny Rooney |
Rupert Pickett |
Brett Mcnally |
Ishmael Khan |
Kaitlan Wallace |
Elyse Marriott |
Dollie Parry |
So basically I need someway (I don't care which why; macro, datastep, sql etc.) that is very dynamic and can handle a variety of delimiters as well as multiple lengths.
Thanks in advance
@SasPerson85 wrote:
Yup exactly two words to match in each obs.
Then take two words out of the string at a time.
data want ;
set have ;
length name $200;
do i=1 by 2 until (name=' ');
name=catx(' ',scan(var,i,' ,;'),scan(var,i+1,' ,;'));
if i=1 or name ne ' ' then output;
end;
drop var;
run;
proc print;
run;
Obs name i 1 Gabriella Cornish 1 2 Kenny Rooney 3 3 Rupert Pickett 5 4 Brett Mcnally 7 5 Ishmael Khan 9 6 Kaitlan Wallace 11 7 Elyse Marriott 13 8 Dollie Parry 15
Please try below considering that the names are a combination of two words (first & last name or last or first name)
data have;
input var&$200.;
datalines4;
Gabriella Cornish Kenny Rooney, Rupert Pickett,Brett Mcnally; Ishmael Khan Kaitlan Wallace, Elyse Marriott, Dollie Parry
;;;;
data want;
set have;
retain var2;
do i = 1 to countw(var,' ,;')-1;
var2=catx(' ',scan(var,i,' ,;'),scan(var,i+1,' ,;'));
output;
end;
run;
Or sometimes three or more words:
Paul Le Blanc
Eric von Lustbader
Don't forget that some people include things like: Junior, Senior, II, III, IV, and some times Doctor, Esquire, and other fussiness.
I feel your pain. I would tend to split the data into records at the comma and semicolon, unless you have some entries that are using Lastname, Firstname before tackling the space delimited bit.
Something like:
data temp; set have; length name $ 50.; do i= 1 to (countw(var,',;')); name= strip(scan(var,i,',;')); problem = (countw(name) ne 2); output; end; drop i var; run;
The ones where problem = 1 may require additional work. Some may yield to @Jagadishkatam's bit of pulling things out 2 at a time. But where to deal with 3 or 5 or more name elements may require manual processing.
Is that exactly two words to match in an obs?
data have;
infile datalines dlm=' ,;';
input (first last) (: $40.) @@;
var=catx(' ',first,last);
datalines4;
Gabriella Cornish Kenny Rooney, Rupert Pickett,Brett Mcnally; Ishmael Khan Kaitlan Wallace, Elyse Marriott, Dollie Parry
;;;;
@SasPerson85 wrote:
Yup exactly two words to match in each obs.
Then take two words out of the string at a time.
data want ;
set have ;
length name $200;
do i=1 by 2 until (name=' ');
name=catx(' ',scan(var,i,' ,;'),scan(var,i+1,' ,;'));
if i=1 or name ne ' ' then output;
end;
drop var;
run;
proc print;
run;
Obs name i 1 Gabriella Cornish 1 2 Kenny Rooney 3 3 Rupert Pickett 5 4 Brett Mcnally 7 5 Ishmael Khan 9 6 Kaitlan Wallace 11 7 Elyse Marriott 13 8 Dollie Parry 15
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.