I have a file that has contents like:
<td VALIGN=BASELINE>
<a href="cleopatra/index.html">Antony and Cleopatra</a>
<br><a href="coriolanus/index.html">Coriolanus</a>
<br><a href="hamlet/index.html">Hamlet</a>
and what is output by this code:
data _null_;
length line1 $30 line2 $100 ;
infile main DELIMITER='href=';
input @;
input line1 line2;
put line1= line2= ;
run;
is
line1=<td VALIGN line2=BASELINE>
line1=<a line2="cl
line1=<b line2=><a
line1=<b line2=><a
What I would like to have output is only the lines with index.html in them:
line1="cleopatra/index.html" line2=Antony and Cleopatra
line1="coriolanus/index.html" line2=Coriolanus
line1="hamlet/index.html" line2=Hamlet
Any ideas on how to do this?
Hi:
Here's a sample program that illustrates the logic using DATALINES ("in-stream" data), as an example of the type of parsing that was suggested.
Cynthia
data _null_;
length bigline $100 line1 $50 line2 $100 ;
infile datalines truncover;
** read the whole line;
input @1 bigline $100.;
** then parse the lines with HREF and INDEX.HTML only;
if find(upcase(bigline),'HREF') gt 0 and
find(upcase(bigline),'INDEX.HTML') gt 0 then do;
line1 = scan(bigline,2,'"');
line2 = scan(bigline,-3,'<>');
put _n_= line1= line2=;
output;
end;
return;
datalines4;
<td VALIGN=BASELINE>
<a href="cleopatra/index.html">Antony and Cleopatra</a>
<br><a href="coriolanus/index.html">Coriolanus</a>
<br><a href="hamlet/index.html">Hamlet</a>
;;;;
run;
When you use delimiter='href', SAS treats each of the individual characters (h, r, e, f) as a delimiter, not the whole string.
So since "r" is a delimiter, you get <b from <br>, because the "r" was used as a delimiter. Use dlmstr='href' to have the whole string treated as a single delimiter.
You may also read the whole line into a single variable and use one of the string search functions such as FINDW or INDEXW. With your example lines you'll need to include / and . in the delimiters of the function:
if findw(upcase(string),'INDEX',' /.;:') = 0 then delete;
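To make the DLM= versus DLMSTR= point above concrete, here is a minimal sketch, assuming in-stream data and two illustrative variable names (a and b, not from the original post). With DLMSTR='href=' the split should happen only where the full string href= occurs, so a holds the text before it and b everything after it. It does not produce the final output asked for in the question; it only shows the delimiter behavior.

```sas
data _null_;
   length a b $100;
   /* DLMSTR= treats the whole string 'href=' as one delimiter;      */
   /* DLM='href=' would instead split on each of h, r, e, f and =.   */
   infile datalines dlmstr='href=' truncover;
   input a b;
   put a= b=;
datalines4;
<a href="cleopatra/index.html">Antony and Cleopatra</a>
;;;;
run;
```

Swapping dlmstr= for dlm= in the INFILE statement and rerunning is a quick way to compare the two behaviors in the log.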
Cool.
Can I ask how I would read the whole line into a variable?
I tried: length line1 $30 line2 $100 ;
data _null_;
infile datalines flowover dlm='<>"';
input @'href="' a : $100. @'>' b : $100. ;
put a= b=;
datalines4;
<td VALIGN=BASELINE>
<a href="cleopatra/index.html">Antony and Cleopatra</a>
<br><a href="coriolanus/index.html">Coriolanus</a>
<br><a href="hamlet/index.html">Hamlet</a>
;;;;
run;
Xia Keshan
OK, cool. One last thing:
sometimes I have lines like this, where line2 is blank until the second line is read:
<a href="allswell/index.html">
All's Well That Ends Well</a>
<a href="asyoulikeit/index.html">
As You Like It</a>
So I put in logic like:
if trim(line2) EQ "" then do;
oldline1 = line1;
end;
/*put _n_= line1= line2=;*/
if trim(line2) NE "" then do;
if trim(line1) EQ "" then do;
line1 = oldline1;
put '------------line1 blank ' line1= line2= oldline1=;
end;
put 'http://shakespeare.mit.edu/' line1 line2 oldline1= ;
end;
so that nothing is printed until line2 is populated, and line1 is just reassigned to the last line1 (the value of oldline1). But somewhere oldline1 is getting reset. Any ideas?
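You'll need to add: length oldline1 $xxx ; (making xxx large enough to hold all the characters expected) and then: retain oldline1; to keep the value across records. Without RETAIN, oldline1 is reset to missing at the start of every iteration of the data step.
You will also likely want to reset it to blank when it is no longer needed.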
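The RETAIN suggestion above could look something like the following. This is only a sketch under some assumptions: the link text either follows the href on the same line or sits on the very next line, and the helper variable pos and the way the title is pulled out with INDEX and SCAN are illustrative choices, not the original poster's code.

```sas
data _null_;
   length bigline $100 line1 oldline1 $50 line2 $100;
   retain oldline1 ' ';              /* keep the pending href across records */
   infile datalines truncover;
   input @1 bigline $100.;

   pos = index(bigline, '">');       /* end of the href attribute, if any */
   if find(bigline, 'href=', 'i') > 0 and pos > 0 then do;
      line1 = scan(bigline, 2, '"');
      /* text between the closing > of the tag and the next < (may be blank) */
      line2 = strip(scan(substr(bigline, pos + 2), 1, '<'));
   end;
   else if oldline1 ne ' ' then do;
      /* continuation line: reuse the href saved from the previous record */
      line1 = oldline1;
      line2 = strip(scan(bigline, 1, '<'));
   end;

   if line2 ne ' ' then do;
      put 'http://shakespeare.mit.edu/' line1 line2;
      oldline1 = ' ';                /* reset once the pair has been printed */
   end;
   else if line1 ne ' ' then oldline1 = line1;
datalines4;
<td VALIGN=BASELINE>
<a href="allswell/index.html">
All's Well That Ends Well</a>
<br><a href="hamlet/index.html">Hamlet</a>
;;;;
run;
```

Because line1 and line2 are not retained, they start out blank on every record; only oldline1 carries over, and it is cleared as soon as a complete href/title pair has been written.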
OK. Treat it as a stream file.
data x;
infile 'c:\temp\sample.txt' recfm=n dlm='<>"';
input x : $100. @@ ;
if lag(x) = 'a href=' or lag2(x)='a href=';
run;
Xia Keshan