Hi!
Trying to figure out how to split rows into multiple rows based on words or characters in a variable. Searched but didn't find anything that mixed text and chars as a solution....
fname in data below contains 'and', '+', '&' , 'and not' as the words to split the rows.
data homes;
length fname $100 lname $19 addr $38 city $20 state $2 zip 8;
infile cards dlm="|" ;
input fname $ lname $ addr $ city $ state $ zip;
cards;
Raggedy Anne and Andy|Family|In The Attic|DC|DC|20007
Batman + Robin|Superduders|The Batcave|Gotham|NY|10012
Jack & Jill|Smith|123 Uphill Rd|Alma|CO|80420
Ernie but not Bert|Muppet|Sesame Street|NY|NY|10128
;
run;
want:
fname | lname | addr | city | state | zip |
Raggedy Anne | Family | In The Attic | DC | DC | 20007 |
Andy | Family | In The Attic | DC | DC | 20007 |
Batman | Superduders | The Batcave | Gotham | NY | 10012 |
Robin | Superduders | The Batcave | Gotham | NY | 10012 |
Jack | Smith | 123 Uphill Rd | Alma | CO | 80420 |
Jill | Smith | 123 Uphill Rd | Alma | CO | 80420 |
Ernie | Muppet | Sesame Street | NY | NY | 10128 |
Bert | Muppet | Sesame Street | NY | NY | 10128 |
I've gotten to:
data homes_fixed;
length new_fname $8.;
set Homes;
do i=1 by 1 while (scan(fname,i,'&+') ^='');
new_fname=trim(scan(fname,i,'&+'));
output;
end;
run;
but 'Raggedy Anne and Andy' and 'Ernie but not Bert' doesn't split (don't see how to put text ' and ' into scan function...)
I'm ok with a prxchange/ prxmatch solution as well but trim(scan is a little easier to understand....
Thanks for any help!
One approach might be to replace a string in the phrase with a single character so the scan works.
Consider
data homes; length fname $100 lname $19 addr $38 city $20 state $2 zip 8; infile cards dlm="|" ; input fname $ lname $ addr $ city $ state $ zip; cards; Raggedy Anne and Andy|Family|In The Attic|DC|DC|20007 Batman + Robin|Superduders|The Batcave|Gotham|NY|10012 Jack & Jill|Smith|123 Uphill Rd|Alma|CO|80420 Ernie but not Bert|Muppet|Sesame Street|NY|NY|10128 Fred and not Dorf|Muppet|Sesame Street|NY|NY|10128 ; run; data homes_fixed; length new_fname $8.; set Homes; tempfname = fname; tempfname = tranwrd(tempfname,' and not ','+'); tempfname = tranwrd(tempfname,' but not ','+'); tempfname = tranwrd(tempfname,' and ','+'); do i=1 by 1 while (scan(tempfname,i,'&+') ^=''); new_fname=strip(scan(tempfname,i,'&+')); output; end; run;
Notice that the tranwrd is looking for " and ", spaces as part of the search on both ends. That prevents removing "and" when it is part of a name or other word such as Bertrand.
And that the longer phrases " and not " are searched for before " and ". That means the longer phrase is removed first.
I include " but not " as that was in the data but your requirement was "data below contains 'and', '+', '&' , 'and not' as the word"
One approach might be to replace a string in the phrase with a single character so the scan works.
Consider
data homes; length fname $100 lname $19 addr $38 city $20 state $2 zip 8; infile cards dlm="|" ; input fname $ lname $ addr $ city $ state $ zip; cards; Raggedy Anne and Andy|Family|In The Attic|DC|DC|20007 Batman + Robin|Superduders|The Batcave|Gotham|NY|10012 Jack & Jill|Smith|123 Uphill Rd|Alma|CO|80420 Ernie but not Bert|Muppet|Sesame Street|NY|NY|10128 Fred and not Dorf|Muppet|Sesame Street|NY|NY|10128 ; run; data homes_fixed; length new_fname $8.; set Homes; tempfname = fname; tempfname = tranwrd(tempfname,' and not ','+'); tempfname = tranwrd(tempfname,' but not ','+'); tempfname = tranwrd(tempfname,' and ','+'); do i=1 by 1 while (scan(tempfname,i,'&+') ^=''); new_fname=strip(scan(tempfname,i,'&+')); output; end; run;
Notice that the tranwrd is looking for " and ", spaces as part of the search on both ends. That prevents removing "and" when it is part of a name or other word such as Bertrand.
And that the longer phrases " and not " are searched for before " and ". That means the longer phrase is removed first.
I include " but not " as that was in the data but your requirement was "data below contains 'and', '+', '&' , 'and not' as the word"
That works- thanks!
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.