Hi!
Trying to figure out how to split rows into multiple rows based on words or characters in a variable. Searched but didn't find anything that mixed text and chars as a solution....
fname in data below contains 'and', '+', '&' , 'and not' as the words to split the rows.
data homes;
length fname $100 lname $19 addr $38 city $20 state $2 zip 8;
infile cards dlm="|" ;
input fname $ lname $ addr $ city $ state $ zip;
cards;
Raggedy Anne and Andy|Family|In The Attic|DC|DC|20007
Batman + Robin|Superduders|The Batcave|Gotham|NY|10012
Jack & Jill|Smith|123 Uphill Rd|Alma|CO|80420
Ernie but not Bert|Muppet|Sesame Street|NY|NY|10128
;
run;
want:
| fname | lname | addr | city | state | zip |
| Raggedy Anne | Family | In The Attic | DC | DC | 20007 |
| Andy | Family | In The Attic | DC | DC | 20007 |
| Batman | Superduders | The Batcave | Gotham | NY | 10012 |
| Robin | Superduders | The Batcave | Gotham | NY | 10012 |
| Jack | Smith | 123 Uphill Rd | Alma | CO | 80420 |
| Jill | Smith | 123 Uphill Rd | Alma | CO | 80420 |
| Ernie | Muppet | Sesame Street | NY | NY | 10128 |
| Bert | Muppet | Sesame Street | NY | NY | 10128 |
I've gotten to:
data homes_fixed;
length new_fname $8.;
set Homes;
do i=1 by 1 while (scan(fname,i,'&+') ^='');
new_fname=trim(scan(fname,i,'&+'));
output;
end;
run;
but 'Raggedy Anne and Andy' and 'Ernie but not Bert' doesn't split (don't see how to put text ' and ' into scan function...)
I'm ok with a prxchange/ prxmatch solution as well but trim(scan is a little easier to understand....
Thanks for any help!
One approach might be to replace a string in the phrase with a single character so the scan works.
Consider
data homes;
length fname $100 lname $19 addr $38 city $20 state $2 zip 8;
infile cards dlm="|" ;
input fname $ lname $ addr $ city $ state $ zip;
cards;
Raggedy Anne and Andy|Family|In The Attic|DC|DC|20007
Batman + Robin|Superduders|The Batcave|Gotham|NY|10012
Jack & Jill|Smith|123 Uphill Rd|Alma|CO|80420
Ernie but not Bert|Muppet|Sesame Street|NY|NY|10128
Fred and not Dorf|Muppet|Sesame Street|NY|NY|10128
;
run;
data homes_fixed;
length new_fname $8.;
set Homes;
tempfname = fname;
tempfname = tranwrd(tempfname,' and not ','+');
tempfname = tranwrd(tempfname,' but not ','+');
tempfname = tranwrd(tempfname,' and ','+');
do i=1 by 1 while (scan(tempfname,i,'&+') ^='');
new_fname=strip(scan(tempfname,i,'&+'));
output;
end;
run;
Notice that the tranwrd is looking for " and ", spaces as part of the search on both ends. That prevents removing "and" when it is part of a name or other word such as Bertrand.
And that the longer phrases " and not " are searched for before " and ". That means the longer phrase is removed first.
I include " but not " as that was in the data but your requirement was "data below contains 'and', '+', '&' , 'and not' as the word"
One approach might be to replace a string in the phrase with a single character so the scan works.
Consider
data homes;
length fname $100 lname $19 addr $38 city $20 state $2 zip 8;
infile cards dlm="|" ;
input fname $ lname $ addr $ city $ state $ zip;
cards;
Raggedy Anne and Andy|Family|In The Attic|DC|DC|20007
Batman + Robin|Superduders|The Batcave|Gotham|NY|10012
Jack & Jill|Smith|123 Uphill Rd|Alma|CO|80420
Ernie but not Bert|Muppet|Sesame Street|NY|NY|10128
Fred and not Dorf|Muppet|Sesame Street|NY|NY|10128
;
run;
data homes_fixed;
length new_fname $8.;
set Homes;
tempfname = fname;
tempfname = tranwrd(tempfname,' and not ','+');
tempfname = tranwrd(tempfname,' but not ','+');
tempfname = tranwrd(tempfname,' and ','+');
do i=1 by 1 while (scan(tempfname,i,'&+') ^='');
new_fname=strip(scan(tempfname,i,'&+'));
output;
end;
run;
Notice that the tranwrd is looking for " and ", spaces as part of the search on both ends. That prevents removing "and" when it is part of a name or other word such as Bertrand.
And that the longer phrases " and not " are searched for before " and ". That means the longer phrase is removed first.
I include " but not " as that was in the data but your requirement was "data below contains 'and', '+', '&' , 'and not' as the word"
That works- thanks!
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.