Hi all,
I am trying to extract the reply contents from a website. My code is below. However, it gives me only the first row, no matter how many rows there are between <blockquote class="postcontent restore "> and </blockquote> in the page source.
Code:
data testpostcontents;
filename indata url 'http://forums.vwvortex.com/showthread.php?7286873-To-our-readers' lrecl=10000;
infile indata length=len;
input record $varying10000. len;
input @ '<blockquote class="postcontent restore "> ' / _line_ :&$10000. ;
run;
One example of the reply contents:
<blockquote class="postcontent restore ">
Definitely a big part of many enthusiast's lives. <br />
<br />
<br />
I try to explain the bond I have made with people all across the world, to my wife and she just doesn't get it. <br />
<br />
To the new owners of VW Vortex - Good luck! Don't mind the CEL. It will always be there. Also... Ban Jett!<br />
<br />
<br />
To Jamie and George - Best of luck in your future endeavors. Family coming first is the best decision any man can make. Congrats guys!
</blockquote>
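What works will depend on the site. The code below appears to work with the URL you've posted. It uses a retained flag to mark the records that fall between the opening and closing blockquote tags.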
filename indata url 'http://forums.vwvortex.com/showthread.php?7286873-To-our-readers' lrecl=32767;
data test;
  infile indata length=len;
  input record $varying32767. len;
  /* _readflg keeps its value across records: 1 while inside a post */
  retain _readflg 0;
  if find(record,'<blockquote class="postcontent','i') then _readflg=1;
  else if find(record,'</blockquote>','i') then _readflg=0;
  else if _readflg=1 then
  do;
    /* strip each HTML tag; [^>]* stops at the tag's own closing bracket */
    record=prxchange('s/<[^>]*>//oi',-1,record);
    /* keep only printable characters, then decode entities such as &amp; */
    record=compress(record,,'kw');
    record=htmldecode(record);
    if not missing(record) then output;
  end;
run;
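To try the retained-flag approach without hitting the website, here is a minimal sketch that reads a few in-line records instead of the URL. The dataset name demo, the 200-character record length, and the sample lines are assumptions made up for illustration; tags are stripped with the per-tag pattern <[^>]*>.

```sas
data demo;
  infile datalines truncover;
  input record $char200.;
  /* flag survives from one record to the next */
  retain _readflg 0;
  if find(record,'<blockquote class="postcontent','i') then _readflg=1;
  else if find(record,'</blockquote>','i') then _readflg=0;
  else if _readflg=1 then
  do;
    /* remove tags one at a time, then trim the leftover blanks */
    record=strip(prxchange('s/<[^>]*>//',-1,record));
    if not missing(record) then output;
  end;
  drop _readflg;
  datalines4;
<div>ignored before the post</div>
<blockquote class="postcontent restore ">
First line. <br />
<br />
Second line.
</blockquote>
<p>ignored after the post</p>
;;;;
run;

proc print data=demo;
run;
```

With this sample input, only the two text lines inside the blockquote should end up in demo; the lines outside it and the bare <br /> line are dropped.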
Thank you @Patrick, the retain statement works!