Hi!
Trying to figure out how to split rows into multiple rows based on words or characters in a variable. Searched but didn't find anything that mixed text and chars as a solution....
fname in data below contains 'and', '+', '&' , 'and not' as the words to split the rows.
data homes;
length fname $100 lname $19 addr $38 city $20 state $2 zip 8;
infile cards dlm="|" ;
input fname $ lname $ addr $ city $ state $ zip;
cards;
Raggedy Anne and Andy|Family|In The Attic|DC|DC|20007
Batman + Robin|Superduders|The Batcave|Gotham|NY|10012
Jack & Jill|Smith|123 Uphill Rd|Alma|CO|80420
Ernie but not Bert|Muppet|Sesame Street|NY|NY|10128
;
run;
want:
fname | lname | addr | city | state | zip |
Raggedy Anne | Family | In The Attic | DC | DC | 20007 |
Andy | Family | In The Attic | DC | DC | 20007 |
Batman | Superduders | The Batcave | Gotham | NY | 10012 |
Robin | Superduders | The Batcave | Gotham | NY | 10012 |
Jack | Smith | 123 Uphill Rd | Alma | CO | 80420 |
Jill | Smith | 123 Uphill Rd | Alma | CO | 80420 |
Ernie | Muppet | Sesame Street | NY | NY | 10128 |
Bert | Muppet | Sesame Street | NY | NY | 10128 |
I've gotten to:
data homes_fixed;
length new_fname $8.;
set Homes;
do i=1 by 1 while (scan(fname,i,'&+') ^='');
new_fname=trim(scan(fname,i,'&+'));
output;
end;
run;
but 'Raggedy Anne and Andy' and 'Ernie but not Bert' doesn't split (don't see how to put text ' and ' into scan function...)
I'm ok with a prxchange/ prxmatch solution as well but trim(scan is a little easier to understand....
Thanks for any help!
One approach might be to replace a string in the phrase with a single character so the scan works.
Consider
data homes; length fname $100 lname $19 addr $38 city $20 state $2 zip 8; infile cards dlm="|" ; input fname $ lname $ addr $ city $ state $ zip; cards; Raggedy Anne and Andy|Family|In The Attic|DC|DC|20007 Batman + Robin|Superduders|The Batcave|Gotham|NY|10012 Jack & Jill|Smith|123 Uphill Rd|Alma|CO|80420 Ernie but not Bert|Muppet|Sesame Street|NY|NY|10128 Fred and not Dorf|Muppet|Sesame Street|NY|NY|10128 ; run; data homes_fixed; length new_fname $8.; set Homes; tempfname = fname; tempfname = tranwrd(tempfname,' and not ','+'); tempfname = tranwrd(tempfname,' but not ','+'); tempfname = tranwrd(tempfname,' and ','+'); do i=1 by 1 while (scan(tempfname,i,'&+') ^=''); new_fname=strip(scan(tempfname,i,'&+')); output; end; run;
Notice that the tranwrd is looking for " and ", spaces as part of the search on both ends. That prevents removing "and" when it is part of a name or other word such as Bertrand.
And that the longer phrases " and not " are searched for before " and ". That means the longer phrase is removed first.
I include " but not " as that was in the data but your requirement was "data below contains 'and', '+', '&' , 'and not' as the word"
One approach might be to replace a string in the phrase with a single character so the scan works.
Consider
data homes; length fname $100 lname $19 addr $38 city $20 state $2 zip 8; infile cards dlm="|" ; input fname $ lname $ addr $ city $ state $ zip; cards; Raggedy Anne and Andy|Family|In The Attic|DC|DC|20007 Batman + Robin|Superduders|The Batcave|Gotham|NY|10012 Jack & Jill|Smith|123 Uphill Rd|Alma|CO|80420 Ernie but not Bert|Muppet|Sesame Street|NY|NY|10128 Fred and not Dorf|Muppet|Sesame Street|NY|NY|10128 ; run; data homes_fixed; length new_fname $8.; set Homes; tempfname = fname; tempfname = tranwrd(tempfname,' and not ','+'); tempfname = tranwrd(tempfname,' but not ','+'); tempfname = tranwrd(tempfname,' and ','+'); do i=1 by 1 while (scan(tempfname,i,'&+') ^=''); new_fname=strip(scan(tempfname,i,'&+')); output; end; run;
Notice that the tranwrd is looking for " and ", spaces as part of the search on both ends. That prevents removing "and" when it is part of a name or other word such as Bertrand.
And that the longer phrases " and not " are searched for before " and ". That means the longer phrase is removed first.
I include " but not " as that was in the data but your requirement was "data below contains 'and', '+', '&' , 'and not' as the word"
That works- thanks!
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.