I have a data set from a html where all the data is stored in one sting:
data have;
text = '<li><ahref="/Stavanger/A._Buddes_Vei/"title="A.BuddesVeiiStavanger">A.BuddesVei</a></li><li><ahref="/Stavanger/A.b.c.gata/"title="A.b.c.gataiStavanger">A.b.c.gata</a></li><li><ahref="/Stavanger/Abelstrappa/"title="AbelstrappaiStavanger">Abelstrappa</a></li>';
run;
I want to split this by each occurence of the string '</li><li>'. I want to try something like this, but where I can choose this "phrase" as the delimiter, not each individual character. The code below does not work,
data try (keep=new);
set have ;
do i=1 by 1 while(scan(text,i,'</a></li>') ^=' ');
new=scan(text,i,'</a></li>');
output;
end;
run;
What I want is this:
data want;
input text $250.;
datalines;
<li><ahref="/Stavanger/A._Buddes_Vei/"title="A.BuddesVeiiStavanger">A.BuddesVei</a></li>
<li><ahref="/Stavanger/A.b.c.gata/"title="A.b.c.gataiStavanger">A.b.c.gata</a></li>
<li><ahref="/Stavanger/Abelstrappa/"title="AbelstrappaiStavanger">Abelstrappa</a></li>
run;
Something like:
%let dlm=</li>; data have; text='<li><ahref="/Stavanger/A._Buddes_Vei/"title="A.BuddesVeiiStavanger">A.BuddesVei</a></li><li><ahref="/Stavanger/A.b.c.gata/"title="A.b.c.gataiStavanger">A.b.c.gata</a></li><li><ahref="/Stavanger/Abelstrappa/"title="AbelstrappaiStavanger">Abelstrappa</a></li>'; run; data want /*(keep=res)*/; set have; length res $2000; do while(lengthn(text) > 0); res=substr(text,1,index(text,"&dlm.")+(lengthn("&dlm.")-1)); output; text=substr(text,index(text,"&dlm.")+lengthn("&dlm.")); end; run;
If you are reading from the html file directly, you can use the DLMSTR= option on the INFILE statement.
data want;
infile datalines dlmstr='</a></li>';
length text $ 250;
do _n_=1 to 1000 until (text=' ');
input text @;
if text > ' ' then output;
end;
datalines;
.........use the wide data line here ...............................
;
Note, however, that the delimiting string would NOT become part of the value of TEXT.
one more way
data want(keep = new);
set have;
length val $350;
IF _N_=1 THEN
PATTERN=PRXPARSE ("/(.+?\<\/a\>\<\/li\>)/");
retain pattern;
start = 1;
stop = length(text);
call prxnext(pattern,start,stop,text,position,length);
do while (position > 0);
new= substr(text,position,length);
call prxnext(pattern,start,stop,text,position,length);
output;
end;
run;
How about this one ?
data have;
text = '<li><ahref="/Stavanger/A._Buddes_Vei/"title="A.BuddesVeiiStavanger">A.BuddesVei</a></li><li><ahref="/Stavanger/A.b.c.gata/"title="A.b.c.gataiStavanger">A.b.c.gata</a></li><li><ahref="/Stavanger/Abelstrappa/"title="AbelstrappaiStavanger">Abelstrappa</a></li>';
run;
data temp;
set have;
id+1;
p=find(text,'</li>');
do while(p ne 0);
output;
p=find(text,'</li>',p+1);
end;
run;
data want;
set temp;
by id;
lag=lag(p);
dif=dif(p);
if first.id then do;start=1;length=p+4;end;
else do;start=lag+5;length=dif;end;
str=substr(text,start,length);
keep text str;
run;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.