may0423
Obsidian | Level 7

Hi all,

I am trying to get the reply contents from a website. Below is my code. However, it gives me only the first line, no matter how many lines there are between <blockquote class="postcontent restore "> and </blockquote> in the contents.

Code:

data testpostcontents; 
filename indata url 'http://forums.vwvortex.com/showthread.php?7286873-To-our-readers' lrecl=10000;
infile indata length=len;
input record $varying10000. len;
input @ '<blockquote class="postcontent restore "> ' / _line_ :&$10000. ;

run;

One example of the reply contents:

<blockquote class="postcontent restore ">
Definitely a big part of many enthusiast's lives. <br />
<br />
<br />
I try to explain the bond I have made with people all across the world, to my wife and she just doesn't get it. <br />
<br />
To the new owners of VW Vortex - Good luck! Don't mind the CEL. It will always be there. Also... Ban Jett!<br />
<br />
<br />
To Jamie and George - Best of luck in your future endeavors. Family coming first is the best decision any man can make. Congrats guys!
</blockquote>

Accepted Solution
Patrick
Opal | Level 21

What works will depend on the site. The code below appears to work with the URL you posted.


filename indata url 'http://forums.vwvortex.com/showthread.php?7286873-To-our-readers' lrecl=32767;

data test;
  infile indata length=len;
  input record $varying32767. len;

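  /* _readflg remembers, across input lines, whether we are inside a post */
  retain _readflg 0;
  if find(record,'<blockquote class="postcontent','i') then _readflg=1;
  else if find(record,'</blockquote>','i') then _readflg=0;
  else if _readflg=1 then
    do;
      /* remove HTML tags; [^>]* stops each match at the first > */
      record=prxchange('s/<[^>]*>//oi',-1,record);
      /* keep only printable characters (drops CR, tabs and other control chars) */
      record=compress(record,,'kw');
      /* decode HTML character entities */
      record=htmldecode(record);
      if not missing(record) then output;
    end;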
run;
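If the thread URL no longer returns the same HTML, the flag logic can still be tried offline. Below is a minimal sketch, assuming the page lines look like the sample post quoted in the question: it writes that sample to a temporary file and then reads it back with the same LENGTH= / $VARYING. approach. The fileref SAMPLE, the data set TEST_SAMPLE and the POST_ID counter are my own illustrative additions, not part of the solution above. POST_ID goes up by one at each opening blockquote so the lines of one reply can be grouped later. The tag pattern <[^>]*> stops at the first >, so text sitting between two tags on the same line is kept.

```sas
filename sample temp;

/* Write the sample post from the question to a temporary file */
data _null_;
  file sample lrecl=32767;
  put '<blockquote class="postcontent restore ">';
  put "Definitely a big part of many enthusiast's lives. <br />";
  put '<br />';
  put '<br />';
  put "I try to explain the bond I have made with people all across the world, to my wife and she just doesn't get it. <br />";
  put '<br />';
  put "To the new owners of VW Vortex - Good luck! Don't mind the CEL. It will always be there. Also... Ban Jett!<br />";
  put '<br />';
  put '<br />';
  put 'To Jamie and George - Best of luck in your future endeavors. Family coming first is the best decision any man can make. Congrats guys!';
  put '</blockquote>';
run;

/* Same RETAIN-flag idea; POST_ID is an illustrative addition */
data test_sample(drop=_readflg);
  infile sample length=len lrecl=32767;
  input record $varying32767. len;

  retain _readflg 0;
  if find(record,'<blockquote class="postcontent','i') then do;
    _readflg=1;
    post_id+1;                                   /* sum statement: retained counter */
  end;
  else if find(record,'</blockquote>','i') then _readflg=0;
  else if _readflg=1 then do;
    record=prxchange('s/<[^>]*>//oi',-1,record); /* strip tags one at a time */
    record=compress(record,,'kw');               /* keep printable characters */
    record=htmldecode(record);                   /* decode HTML entities */
    if not missing(record) then output;
  end;
run;

proc print data=test_sample;
run;
```

With this sample the result should be four rows, one per line of text in the reply, all with POST_ID=1. The lines that held only <br /> become empty once the tags are stripped and are not output.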

may0423
Obsidian | Level 7

Thank you @Patrick, the retain statement works!
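Why RETAIN is what makes this work: each DATA step iteration reads one input line, and a variable that is not retained is reset to missing at the start of every iteration. So a flag set on the <blockquote> line is gone by the next line unless it is retained. The small sketch below uses hypothetical names (FLAG_DEMO, KEPT, NOTKEPT) and placeholder lines OPEN/CLOSE standing in for the opening and closing tags.

```sas
/* KEPT is retained, NOTKEPT is not; both are set only on OPEN/CLOSE lines */
data flag_demo;
  infile datalines truncover;
  input line $20.;
  retain kept 0;
  if line='OPEN' then do; kept=1; notkept=1; end;
  else if line='CLOSE' then do; kept=0; notkept=0; end;
  datalines;
OPEN
text 1
text 2
CLOSE
;
run;

proc print data=flag_demo;
run;
```

In the printed output KEPT is 1 on both text lines, while NOTKEPT is missing there, which is exactly what happens to _readflg without the RETAIN statement.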

