
Extract rows from website

Contributor
Posts: 24

Hi all,

I am trying to extract the reply contents from a website. My code is below. However, it gives me only the first row, no matter how many rows there are between <blockquote class="postcontent restore "> and </blockquote>.

Code:

data testpostcontents; 
filename indata url 'http://forums.vwvortex.com/showthread.php?7286873-To-our-readers' lrecl=10000;
infile indata length=len;
input record $varying10000. len;
input @ '<blockquote class="postcontent restore "> ' / _line_ :&$10000. ;

run;

One example of the reply contents:

<blockquote class="postcontent restore ">
Definitely a big part of many enthusiast's lives. <br />
<br />
<br />
I try to explain the bond I have made with people all across the world, to my wife and she just doesn't get it. <br />
<br />
To the new owners of VW Vortex - Good luck! Don't mind the CEL. It will always be there. Also... Ban Jett!<br />
<br />
<br />
To Jamie and George - Best of luck in your future endeavors. Family coming first is the best decision any man can make. Congrats guys!
</blockquote>


Accepted Solution
12-29-2016 07:43 PM
Respected Advisor
Posts: 3,900

Re: Extract rows from website

What works will depend on the site. The code below appears to work with the URL you've posted.


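filename indata url 'http://forums.vwvortex.com/showthread.php?7286873-To-our-readers' lrecl=32767;

data test;
  infile indata length=len;
  input record $varying32767. len;   /* read each HTML line as-is */

  /* _readflg remembers, across DATA step iterations, whether the
     current line is inside a reply (<blockquote ...> ... </blockquote>) */
  retain _readflg 0;
  drop _readflg;
  if find(record,'<blockquote class="postcontent','i') then _readflg=1;
  else if find(record,'</blockquote>','i') then _readflg=0;
  else if _readflg=1 then
    do;
      record=prxchange('s/<[^>]*>//oi',-1,record); /* remove HTML tags one at a time */
      record=compress(record,,'kw');               /* keep writable characters only */
      record=htmldecode(record);                   /* decode entities such as &amp; */
      if not missing(record) then output;
    end;
run;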
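A note on the tag-stripping step: in Perl regular expressions `.*` is greedy, so a pattern such as `<.*>` matches from the first `<` on a line to the last `>`, deleting any text that sits between two tags on the same line. A negated character class such as `<[^>]*>` removes each tag on its own. The sample reply in the question has at most one tag per line, so both patterns give the same result there; the difference shows up on a line like the made-up one in this sketch (the input string is hypothetical test data):

```sas
/* Compare a greedy tag pattern with a tag-by-tag pattern on one */
/* made-up input line (hypothetical test data).                  */
data _null_;
  length line greedy pertag $200;
  line   = 'Good luck! <br /> Ban Jett! <br />';
  greedy = prxchange('s/<.*>//', -1, line);     /* .* runs on to the LAST '>'       */
  pertag = prxchange('s/<[^>]*>//', -1, line);  /* each match stops at the next '>' */
  put greedy= / pertag=;                        /* greedy has lost 'Ban Jett!'      */
run;
```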



Contributor
Posts: 24

Re: Extract rows from website

Thank you, @Patrick. The RETAIN statement works!
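Why RETAIN is the key: the DATA step reads one line per iteration and, by default, resets variables such as _readflg to missing at the start of each iteration. Without RETAIN, the flag set on the <blockquote> line would be lost before the next line is read, so no content line would ever be output. The sketch below checks the flag logic offline instead of against the live page (whose HTML may no longer match its 2016 layout). It assumes a temporary fileref named SAMPLE and an output data set named SAMPLE_REPLY, both hypothetical names, and uses a shortened copy of the sample reply from the question:

```sas
/* Sketch: test the RETAIN flag logic offline. SAMPLE and SAMPLE_REPLY */
/* are hypothetical names chosen for this test.                        */
filename sample temp;

/* Write a shortened copy of the sample reply to a temporary file. */
data _null_;
  file sample;
  put '<blockquote class="postcontent restore ">';
  put "Definitely a big part of many enthusiast's lives. <br />";
  put '<br />';
  put "To the new owners of VW Vortex - Good luck! Don't mind the CEL.<br />";
  put '</blockquote>';
run;

/* Same flag logic as the solution, reading the local file instead. */
data sample_reply;
  infile sample length=len;
  input record $varying32767. len;
  retain _readflg 0;   /* keep the flag's value from one line to the next */
  drop _readflg;
  if find(record,'<blockquote class="postcontent','i') then _readflg=1;
  else if find(record,'</blockquote>','i') then _readflg=0;
  else if _readflg=1 then do;
    record=prxchange('s/<[^>]*>//o',-1,record); /* strip tags one at a time  */
    record=compress(record,,'kw');              /* keep writable characters  */
    record=htmldecode(record);                  /* decode HTML entities      */
    if not missing(record) then output;         /* skip lines that were only <br /> */
  end;
run;

proc print data=sample_reply noobs;
run;
```

Removing the RETAIN statement from the second step and rerunning it is a quick way to see the flag being reset: SAMPLE_REPLY then ends up with no observations.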
