DATA Step, Macro, Functions and more

split one string into multiple rows for each phrase, without a delimiter for each phrase

Accepted Solution Solved
Reply
Contributor
Posts: 22
Accepted Solution

split one string into multiple rows for each phrase, without a delimiter for each phrase

I have a data set that looks something like this:

 

data have;
infile datalines delimiter=','; 
input IDnumber String $50.; 
datalines;           
2477,Cat Dog A blue car
2431,Cat Fish Yelllow Submarine 
;             
run;

I need to extract each data item from the string to one row so that i get this:

 

 

data want;
infile datalines delimiter=','; 
input IDnumber String $50.; 
datalines;           
2477,Cat
2477,Dog
2477,A blue car
2431,Cat
2431,Fish 
2431,Yelllow Submarine 
;             
run;

The text will sometimes be split by a single word, other times several words constitute one phrase, and one phrase can sometimes start with a capital letter followed by a small letter in the next word, but othertimes it has capital letters in both words of one phrase, so I cannot use regexp or the like to do a sort of conditional do loop or array. I basically need to hardcode it so something like this, but cant quite wrap my head around the right syntax:

 

 

data try;
set have;
do ... ;
if index(String,'Cat') > 0 then output Phrase = 'Cat' ;
if index(String,'Dog') > 0 then output Phrase = 'Dog' ;
if index(String,'A blue car') > 0 then output Phrase = 'A blue car' ;
if index(String,'Fish') > 0 then output Phrase = 'Fish' ;
if index(String,'Yellow submarine') > 0 then output Phrase = 'Yellow submarine' ;
;
end;
run;

 


Accepted Solutions
Solution
‎08-04-2017 06:24 AM
Super Contributor
Posts: 345

Re: split one string into multiple rows for each phrase, without a delimiter for each phrase

An interesting issue. You don't need a loop, just and do...end and swap output-statement and assignment.

data try;
set have;

if index(String,'Cat') > 0 then do; Phrase = 'Cat' ; output; end;
...
run;

View solution in original post


All Replies
Super User
Super User
Posts: 7,977

Re: split one string into multiple rows for each phrase, without a delimiter for each phrase

Sorry, I don't see logically how I would know that "A blue car" is a separate from another string.  I mean its simple enough to split by word:

data want;
  length word $200;
  set have;
  do i=1 to countw(string," ");
    word=scan(string,i," ");
    output;
  end;
run;

But I don't see anything logical about how to get Yellow Submarine or A blue car?

Solution
‎08-04-2017 06:24 AM
Super Contributor
Posts: 345

Re: split one string into multiple rows for each phrase, without a delimiter for each phrase

An interesting issue. You don't need a loop, just and do...end and swap output-statement and assignment.

data try;
set have;

if index(String,'Cat') > 0 then do; Phrase = 'Cat' ; output; end;
...
run;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 2 replies
  • 140 views
  • 0 likes
  • 3 in conversation