BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
cjinsf
Obsidian | Level 7

Hi!

 

Trying to figure out how to split rows into multiple rows based on words or characters in a variable.  Searched but didn't find anything that mixed text and chars as a solution....

 

fname in data below contains 'and', '+', '&' , 'and not' as the words to split the rows.

 

data homes;
length fname $100 lname $19 addr $38 city $20 state $2 zip 8;
infile cards dlm="|" ;
input fname $ lname $ addr $ city $ state $ zip;
cards;
Raggedy Anne and Andy|Family|In The Attic|DC|DC|20007
Batman + Robin|Superduders|The Batcave|Gotham|NY|10012
Jack & Jill|Smith|123 Uphill Rd|Alma|CO|80420

Ernie but not Bert|Muppet|Sesame Street|NY|NY|10128

;
run;

 

 

want:

fname lname addr city state zip
Raggedy Anne Family In The Attic DC DC 20007
Andy Family In The Attic DC DC 20007
Batman Superduders The Batcave Gotham NY 10012
Robin Superduders The Batcave Gotham NY 10012
Jack Smith 123 Uphill Rd Alma CO 80420
Jill Smith 123 Uphill Rd Alma CO 80420
Ernie Muppet Sesame Street NY NY 10128
Bert  Muppet Sesame Street NY NY 10128


I've gotten to:

 

data homes_fixed;
length new_fname $8.;
set Homes;

do i=1 by 1 while (scan(fname,i,'&+') ^='');
new_fname=trim(scan(fname,i,'&+'));
output;
end;
run;

 

but 'Raggedy Anne and Andy' and 'Ernie but not Bert' doesn't split (don't see how to put text ' and ' into scan function...)

 

I'm ok with a prxchange/ prxmatch solution as well but trim(scan is a little easier to understand....

 

Thanks for any help!

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

One approach might be to replace a string in the phrase with a single character so the scan works.

Consider

data homes;
   length fname $100 lname $19 addr $38 city $20 state $2 zip 8;
   infile cards dlm="|" ;
   input fname $ lname $ addr $ city $ state $ zip;
cards;
Raggedy Anne and Andy|Family|In The Attic|DC|DC|20007
Batman + Robin|Superduders|The Batcave|Gotham|NY|10012
Jack & Jill|Smith|123 Uphill Rd|Alma|CO|80420
Ernie but not Bert|Muppet|Sesame Street|NY|NY|10128
Fred and not Dorf|Muppet|Sesame Street|NY|NY|10128
;
run;

data homes_fixed;
   length new_fname $8.; 
   set Homes;
   tempfname = fname;
   tempfname = tranwrd(tempfname,' and not ','+');
   tempfname = tranwrd(tempfname,' but not ','+');
   tempfname = tranwrd(tempfname,' and ','+');
   do i=1 by 1 while (scan(tempfname,i,'&+') ^=''); 
      new_fname=strip(scan(tempfname,i,'&+')); 
      output; 
   end; 
run;

Notice that the tranwrd is looking for " and ", spaces as part of the search on both ends. That prevents removing "and" when it is part of a name or other word such as Bertrand.

 

And that the longer phrases " and not " are searched for before " and ". That means the longer phrase is removed first.

 

 

I include " but not " as that was in the data but your requirement was "data below contains 'and', '+', '&' , 'and not' as the word"

View solution in original post

2 REPLIES 2
ballardw
Super User

One approach might be to replace a string in the phrase with a single character so the scan works.

Consider

data homes;
   length fname $100 lname $19 addr $38 city $20 state $2 zip 8;
   infile cards dlm="|" ;
   input fname $ lname $ addr $ city $ state $ zip;
cards;
Raggedy Anne and Andy|Family|In The Attic|DC|DC|20007
Batman + Robin|Superduders|The Batcave|Gotham|NY|10012
Jack & Jill|Smith|123 Uphill Rd|Alma|CO|80420
Ernie but not Bert|Muppet|Sesame Street|NY|NY|10128
Fred and not Dorf|Muppet|Sesame Street|NY|NY|10128
;
run;

data homes_fixed;
   length new_fname $8.; 
   set Homes;
   tempfname = fname;
   tempfname = tranwrd(tempfname,' and not ','+');
   tempfname = tranwrd(tempfname,' but not ','+');
   tempfname = tranwrd(tempfname,' and ','+');
   do i=1 by 1 while (scan(tempfname,i,'&+') ^=''); 
      new_fname=strip(scan(tempfname,i,'&+')); 
      output; 
   end; 
run;

Notice that the tranwrd is looking for " and ", spaces as part of the search on both ends. That prevents removing "and" when it is part of a name or other word such as Bertrand.

 

And that the longer phrases " and not " are searched for before " and ". That means the longer phrase is removed first.

 

 

I include " but not " as that was in the data but your requirement was "data below contains 'and', '+', '&' , 'and not' as the word"

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 1258 views
  • 0 likes
  • 2 in conversation