I have a file with contents like this:
<td VALIGN=BASELINE>
<a href="cleopatra/index.html">Antony and Cleopatra</a>
<br><a href="coriolanus/index.html">Coriolanus</a>
<br><a href="hamlet/index.html">Hamlet</a>
and what is output by this code:
data _null_;
length line1 $30 line2 $100 ;
infile main DELIMITER='href=';
input @;
input line1 line2;
put line1= line2= ;
run;
is
line1=<td VALIGN line2=BASELINE>
line1=<a line2="cl
line1=<b line2=><a
line1=<b line2=><a
What I would like to output is only the lines with index.html in them:
line1="cleopatra/index.html" line2=Antony and Cleopatra
line1="coriolanus/index.html" line2=Coriolanus
line1="hamlet/index.html" line2=Hamlet
Any ideas on how to do this?
When you use delimiter='href=', SAS treats each of the individual characters as a delimiter, not the whole string.
So since "r" is a delimiter, you get <b because the "r" in <br> was used as a delimiter. Use dlmstr='href=' to get the behavior you're looking for.
Alternatively, you may read the whole line into a single variable and use one of the string search functions such as FINDW or INDEXW. With your example lines you'll need to include / and . in the delimiters of the function.
if findw(upcase(string),'INDEX',' /.;:') = 0 then delete;
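To illustrate the DLMSTR= suggestion, here is a minimal sketch using in-stream data. The variable names before and after are made up for this example, and the SCAN calls that split the second field are just one possible way to get line1 and line2. It assumes the href is always written in lowercase, as in the sample lines.

```sas
data _null_;
  length before after $100 line1 $50 line2 $100;
  /* DLMSTR= treats the whole string 'href=' as one delimiter */
  infile datalines dlmstr='href=' truncover;
  input before after;
  /* lines without href= leave AFTER blank, so skip them */
  if after ne ' ' then do;
    line1 = scan(after, 1, '"');   /* text between the quotes */
    line2 = scan(after, 2, '<>');  /* text between > and </a> */
    put line1= line2=;
  end;
datalines4;
<td VALIGN=BASELINE>
<a href="cleopatra/index.html">Antony and Cleopatra</a>
<br><a href="coriolanus/index.html">Coriolanus</a>
<br><a href="hamlet/index.html">Hamlet</a>
;;;;
run;
```

Because blank is no longer a delimiter here, embedded blanks in the link text stay inside the field.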
Cool.
Can I ask how I would read it into a variable?
I tried length line1 $30 line2 $100 ;
Hi:
Here's a sample program that illustrates the logic using DATALINES or "in-stream" data, as an example of the type of parsing that was suggested.
Cynthia
data _null_;
length bigline $100 line1 $50 line2 $100 ;
infile datalines truncover;
** read the whole line;
input @1 bigline $100.;
** then parse the lines with HREF and INDEX.HTML only;
if find(upcase(bigline),'HREF') gt 0 and
find(upcase(bigline),'INDEX.HTML') gt 0 then do;
line1 = scan(bigline,2,'"');
line2 = scan(bigline,-3,'<>');
put _n_= line1= line2=;
output;
end;
return;
datalines4;
<td VALIGN=BASELINE>
<a href="cleopatra/index.html">Antony and Cleopatra</a>
<br><a href="coriolanus/index.html">Coriolanus</a>
<br><a href="hamlet/index.html">Hamlet</a>
;;;;
run;
data _null_;
infile datalines flowover dlm='<>"';
input @'href="' a : $100. @'>' b : $100. ;
put a= b=;
datalines4;
<td VALIGN=BASELINE>
<a href="cleopatra/index.html">Antony and Cleopatra</a>
<br><a href="coriolanus/index.html">Coriolanus</a>
<br><a href="hamlet/index.html">Hamlet</a>
;;;;
run;
Xia Keshan
OK, cool. One last thing:
sometimes I have lines like this, so line2 is blank until the second line is read:
<a href="allswell/index.html">
All's Well That Ends Well</a>
<a href="asyoulikeit/index.html">
As You Like It</a>
so I put in logic like
if trim(line2) EQ "" then do;
oldline1 = line1;
end;
/*put _n_= line1= line2=;*/
if trim(line2) NE "" then do;
if trim(line1) EQ "" then do;
line1 = oldline1;
put '------------line1 blank ' line1= line2= oldline1=;
end;
put 'http://shakespeare.mit.edu/' line1 line2 oldline1= ;
end;
so that nothing is printed until line2 is populated, and line1 is just reassigned to the last line1 (the value of oldline1). But somewhere oldline1 is getting reset. Any ideas?
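You'll need to add a LENGTH statement, length oldline1 $xxx ; (making xxx large enough to hold all the characters expected), and then retain oldline1; to keep the value across records. Without RETAIN, oldline1 is reset to missing at the start of each DATA step iteration.
You will also likely want to reset it to blank when it is no longer needed.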
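A rough sketch of how the RETAIN approach could fit into the earlier parsing step is below. It reuses the variable names from the thread (bigline, line1, line2, oldline1); the pos variable, the $50 length, and the mix of sample lines are assumptions made for illustration, not the original poster's program.

```sas
data _null_;
  length bigline $100 line1 oldline1 $50 line2 $100;
  retain oldline1;                 /* keep the pending href across records */
  infile datalines truncover;
  input @1 bigline $100.;
  pos = find(bigline, 'href=', 'i');
  if pos gt 0 then do;
    line1 = scan(bigline, 2, '"');
    /* link text, if any, follows the first > after href= */
    line2 = scan(substr(bigline, pos), 2, '<>');
    /* text is on the next record: remember the href and wait */
    if line2 = ' ' then oldline1 = line1;
  end;
  else if oldline1 ne ' ' then do;
    line1 = oldline1;
    line2 = scan(bigline, 1, '<');
    oldline1 = ' ';                /* reset once it has been used */
  end;
  if line1 ne ' ' and line2 ne ' ' then put line1= line2=;
datalines4;
<a href="allswell/index.html">
All's Well That Ends Well</a>
<a href="hamlet/index.html">Hamlet</a>
;;;;
run;
```

Only oldline1 is retained; line1 and line2 are still reset at the top of each iteration, which is what keeps a stale title from being paired with the wrong link.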
OK. Treat it as a stream file.
data x;
infile 'c:\temp\sample.txt' recfm=n dlm='<>"';
input x : $100. @@ ;
if lag(x) = 'a href=' or lag2(x)='a href=';
run;
Xia Keshan