
reading input from "file"

Frequent Contributor
Posts: 77

reading input from "file"

I have a file with contents like this:

<td VALIGN=BASELINE>

<a href="cleopatra/index.html">Antony and Cleopatra</a>

<br><a href="coriolanus/index.html">Coriolanus</a>

<br><a href="hamlet/index.html">Hamlet</a>

and what is output by this code:

data _null_;

length line1 $30 line2 $100 ;

infile main DELIMITER='href=';

input @;

  input line1 line2;

put line1= line2= ;

run;

is

line1=<td VALIGN line2=BASELINE>

line1=<a line2="cl

line1=<b line2=><a

line1=<b line2=><a

What I would like to output is only the lines with index.html in them:

line1="cleopatra/index.html" line2=Antony and Cleopatra

line1="coriolanus/index.html" line2=Coriolanus

line1="hamlet/index.html" line2=Hamlet

Any ideas on how to do this?


Accepted Solutions
Solution
05-31-2014 02:04 AM
SAS Super FREQ
Posts: 8,740

Re: reading input from "file"

Hi:

Here's a sample program that illustrates the logic using DATALINES or "in-stream" data, as an example of the type of parsing that was suggested.

Cynthia

data _null_;

  length bigline $100 line1 $50 line2 $100 ;

  infile datalines truncover;

  ** read the whole line;

  input @1 bigline $100.;

       

  ** then parse the lines with HREF and INDEX.HTML only;

  if find(upcase(bigline),'HREF') gt 0 and

     find(upcase(bigline),'INDEX.HTML') gt 0 then do;

    line1 = scan(bigline,2,'"');

    line2 = scan(bigline,-3,'<>');

    put _n_= line1= line2=;

    output;

  end;

return;

datalines4;

<td VALIGN=BASELINE>

<a href="cleopatra/index.html">Antony and Cleopatra</a>

<br><a href="coriolanus/index.html">Coriolanus</a>

<br><a href="hamlet/index.html">Hamlet</a>

;;;;

run;
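The question reads the HTML through the fileref MAIN rather than DATALINES. A minimal sketch of the same kind of filtering against an external file might look like the following. It assumes MAIN points at a temporary file that is first filled with the sample lines (the real file's path is not given in the thread, so a FILENAME statement for it would replace FILENAME MAIN TEMP), and it writes the matches to a hypothetical data set PLAYS_FROM_FILE. As a variation on SCAN with -3, the title here is taken as the text after the first "> and up to the next <.

```sas
/* Assumption: MAIN is a temporary file standing in for the real HTML file. */
/* For the real file, use something like: filename main 'your-file.html';  */
filename main temp;

/* Write the sample lines from the question into the temporary file. */
data _null_;
  file main;
  put '<td VALIGN=BASELINE>';
  put '<a href="cleopatra/index.html">Antony and Cleopatra</a>';
  put '<br><a href="coriolanus/index.html">Coriolanus</a>';
  put '<br><a href="hamlet/index.html">Hamlet</a>';
run;

/* Read each record whole, keep only HREF + INDEX.HTML lines, save them. */
data plays_from_file(keep=line1 line2);
  length bigline $200 line1 $50 line2 $100;
  infile main truncover;
  input @1 bigline $char200.;
  if find(upcase(bigline),'HREF') gt 0 and
     find(upcase(bigline),'INDEX.HTML') gt 0 then do;
    /* the URL is the second word when double quotes are the delimiter */
    line1 = scan(bigline, 2, '"');
    /* the title is the text after the first "> up to the next < */
    line2 = scan(substr(bigline, find(bigline, '">') + 2), 1, '<');
    output;
  end;
run;

proc print data=plays_from_file noobs;
run;
```

PROC PRINT is only there to show the result; the data set can be used directly in later steps.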




All Replies
Super User
Posts: 10,476

Re: reading input from "file"

When you use DELIMITER='href', SAS treats each of the individual characters as a delimiter, not the whole string.

So, because "r" is a delimiter, you get <b: the "r" in <br> was used as a delimiter. Use DLMSTR='href' to get the behavior you're looking for.

Alternatively, you may read the whole line into a single variable and use one of the string search functions such as FINDW or INDEXW. With your example lines you'll need to include / and . in the delimiters of the function.

if findw(upcase(string),'INDEX',' /.;:') = 0 then delete;
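Reading the whole line into one variable and filtering with FINDW, as suggested above, might look like this minimal sketch. DATALINES stands in for the file (for the real file, INFILE MAIN TRUNCOVER would replace INFILE DATALINES TRUNCOVER, MAIN being the fileref from the question); the variable STRING, its length of 200, and the data set name INDEX_LINES are illustrative choices.

```sas
/* DATALINES stands in for the file; with the real file use INFILE MAIN TRUNCOVER. */
data index_lines;
  length string $200;
  infile datalines truncover;
  input @1 string $char200.;   /* the whole record goes into one variable */
  /* blank, slash, period, semicolon and colon separate words, so the */
  /* word INDEX is found inside cleopatra/index.html                  */
  if findw(upcase(string), 'INDEX', ' /.;:') = 0 then delete;
datalines4;
<td VALIGN=BASELINE>
<a href="cleopatra/index.html">Antony and Cleopatra</a>
<br><a href="coriolanus/index.html">Coriolanus</a>
<br><a href="hamlet/index.html">Hamlet</a>
;;;;
run;
```

TRUNCOVER keeps INPUT from jumping to the next record when a line is shorter than 200 characters, so each observation holds exactly one line of the file.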

Frequent Contributor
Posts: 77

Re: reading input from "file"

cool

Can I ask how I would read the whole line into one variable?

I tried length line1 $30 line2 $100 ;

Super User
Posts: 9,671

Re: reading input from "file"

data _null_;

  infile datalines flowover  dlm='<>"';

  input @'href="' a : $100. @'>' b : $100. ;

  put a= b=;

datalines4;

<td VALIGN=BASELINE>

<a href="cleopatra/index.html">Antony and Cleopatra</a>

<br><a href="coriolanus/index.html">Coriolanus</a>

<br><a href="hamlet/index.html">Hamlet</a>

;;;;

run;

Xia Keshan


Frequent Contributor
Posts: 77

Re: reading input from "file"

OK, cool. One last thing:

Sometimes I have lines like this, so line2 is blank until the second line is read:

<a href="allswell/index.html">

All's Well That Ends Well</a>

<a href="asyoulikeit/index.html">

As You Like It</a>

So I put in logic like this:

if trim(line2) EQ "" then do;
  oldline1 = line1;
end;

/*put _n_= line1= line2=;*/

if trim(line2) NE "" then do;
  if trim(line1) EQ "" then do;
    line1 = oldline1;
    put '------------line1 blank ' line1= line2= oldline1=;
  end;
  put 'http://shakespeare.mit.edu/' line1 line2 oldline1= ;
end;

So nothing is printed until line2 is populated, and line1 is just reassigned to the last line1 (the value of oldline1).

But somewhere oldline1 is getting reset. Any ideas?

Super User
Posts: 10,476

Re: reading input from "file"

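You'll need to add length oldline1 $xxx; (making xxx large enough to hold all the characters expected) and then retain oldline1; to keep the value across records. Without RETAIN, a variable that is assigned in the DATA step is set back to missing at the start of each iteration, which is why oldline1 appears to be reset.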

And you will likely want to reset it to blank once it is no longer needed.
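Putting the LENGTH and RETAIN advice together, a minimal sketch of the carry-forward logic might look like the following. It assumes DATALINES in place of the file, a hypothetical output data set PLAYS_SPLIT, and that an href without a title on its own line is always followed by a line holding the title and </a>. oldline1 is cleared with CALL MISSING once it has been used.

```sas
data plays_split(keep=line1 line2);
  length bigline $200 line1 $50 line2 $100 oldline1 $50;
  retain oldline1;             /* keep the pending href from one record to the next */
  infile datalines truncover;  /* stands in for INFILE MAIN TRUNCOVER */
  input @1 bigline $char200.;

  if find(upcase(bigline), 'HREF') gt 0 then do;
    /* record with an href: URL in double quotes, title (if any) after "> */
    line1 = scan(bigline, 2, '"');
    line2 = scan(substr(bigline, find(bigline, '">') + 2), 1, '<');
  end;
  else if find(upcase(bigline), '</A>') gt 0 then
    /* record that holds only the title text followed by </a> */
    line2 = scan(bigline, 1, '<');

  if not missing(line1) and missing(line2) then
    oldline1 = line1;          /* href without a title yet: remember it */
  else if not missing(line2) then do;
    if missing(line1) then do;
      line1 = oldline1;        /* title on its own line: reuse the remembered href */
      call missing(oldline1);  /* reset it once it is no longer needed */
    end;
    output;
  end;
datalines4;
<td VALIGN=BASELINE>
<a href="allswell/index.html">
All's Well That Ends Well</a>
<a href="asyoulikeit/index.html">
As You Like It</a>
<br><a href="hamlet/index.html">Hamlet</a>
;;;;
run;
```

Because line1 and line2 are assigned rather than retained, they start out missing on every record, so only oldline1 carries information forward.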

Super User
Posts: 9,671

Re: reading input from "file"

OK. Treat it as a stream file.

data x;

infile 'c:\temp\sample.txt' recfm=n  dlm='<>"';

  input x : $100. @@ ;

  if lag(x) = 'a href=' or lag2(x)='a href=';

run;

Xia Keshan
