Splitting a string into multiple rows where delimiter is a phrase, not a single character

Reply
Contributor
Posts: 24

Splitting a string into multiple rows where delimiter is a phrase, not a single character

[ Edited ]

I have a data set from a html where all the data is stored in one sting:

data have;
text = '<li><ahref="/Stavanger/A._Buddes_Vei/"title="A.BuddesVeiiStavanger">A.BuddesVei</a></li><li><ahref="/Stavanger/A.b.c.gata/"title="A.b.c.gataiStavanger">A.b.c.gata</a></li><li><ahref="/Stavanger/Abelstrappa/"title="AbelstrappaiStavanger">Abelstrappa</a></li>';
run;

 

I want to split this by each occurence of the string '</li><li>'. I want to try something like this, but where I can choose this "phrase" as the delimiter, not each individual character. The code below does not work, 

 

data try (keep=new);
 set have ; 
 do i=1 by 1 while(scan(text,i,'</a></li>') ^=' '); 
new=scan(text,i,'</a></li>'); 
output; 
end; 
run;

What I want is this: 

data want;
input text $250.;
datalines;
<li><ahref="/Stavanger/A._Buddes_Vei/"title="A.BuddesVeiiStavanger">A.BuddesVei</a></li> 
<li><ahref="/Stavanger/A.b.c.gata/"title="A.b.c.gataiStavanger">A.b.c.gata</a></li> 
<li><ahref="/Stavanger/Abelstrappa/"title="AbelstrappaiStavanger">Abelstrappa</a></li> 
run;

 

Super User
Super User
Posts: 8,634

Re: Splitting a string into multiple rows where delimiter is a phrase, not a single character

Something like:

%let dlm=</li>;

data have;
 text='<li><ahref="/Stavanger/A._Buddes_Vei/"title="A.BuddesVeiiStavanger">A.BuddesVei</a></li><li><ahref="/Stavanger/A.b.c.gata/"title="A.b.c.gataiStavanger">A.b.c.gata</a></li><li><ahref="/Stavanger/Abelstrappa/"title="AbelstrappaiStavanger">Abelstrappa</a></li>';
run;

data want /*(keep=res)*/;
  set have;
  length res $2000;
  do while(lengthn(text) > 0);
    res=substr(text,1,index(text,"&dlm.")+(lengthn("&dlm.")-1));
    output;
    text=substr(text,index(text,"&dlm.")+lengthn("&dlm."));
  end;
run;
Super User
Posts: 5,995

Re: Splitting a string into multiple rows where delimiter is a phrase, not a single character

If you are reading from the html file directly, you can use the DLMSTR= option on the INFILE statement.

 

data want;

infile datalines dlmstr='</a></li>';

length text $ 250;

do _n_=1 to 1000 until (text=' ');

   input text @;

   if text > ' ' then output;

end;

datalines;

.........use the wide data line here ...............................

;

 

Note, however, that the delimiting string would NOT become part of the value of TEXT.

PROC Star
Posts: 441

Re: Splitting a string into multiple rows where delimiter is a phrase, not a single character

[ Edited ]

one more way

data want(keep = new);
	set have;
length val $350;
	IF _N_=1 THEN
		PATTERN=PRXPARSE ("/(.+?\<\/a\>\<\/li\>)/");
	retain pattern;
 start = 1;
 stop = length(text);
call prxnext(pattern,start,stop,text,position,length);
do while (position > 0);
new= substr(text,position,length);
call prxnext(pattern,start,stop,text,position,length);
output;
end;

run;
Super User
Posts: 10,320

Re: Splitting a string into multiple rows where delimiter is a phrase, not a single character

How about this one ?

 

 

data have;
text = '<li><ahref="/Stavanger/A._Buddes_Vei/"title="A.BuddesVeiiStavanger">A.BuddesVei</a></li><li><ahref="/Stavanger/A.b.c.gata/"title="A.b.c.gataiStavanger">A.b.c.gata</a></li><li><ahref="/Stavanger/Abelstrappa/"title="AbelstrappaiStavanger">Abelstrappa</a></li>';
run;

data temp;
 set have;
 id+1;
 p=find(text,'</li>');
 do while(p ne 0);
  output;
  p=find(text,'</li>',p+1);
 end;
run;
data want;
 set temp;
 by id;
 lag=lag(p);
 dif=dif(p);
 if first.id then do;start=1;length=p+4;end;
  else do;start=lag+5;length=dif;end;
 str=substr(text,start,length);
keep text str;
run;
Ask a Question
Discussion stats
  • 4 replies
  • 111 views
  • 0 likes
  • 5 in conversation