PROC HTTP for multiple pages

Occasional Contributor
Posts: 7

I am using PROC HTTP to pull the following site: http://app.hpla.doh.dc.gov/weblookup/Search.aspx.

The challenge is that the site splits its results across multiple pages. Is there an easy way to tell PROC HTTP to grab all of them?

PROC Star
Posts: 7,468

I'm not familiar with the site, and friedegg is definitely better at this than I am, but I presume you are submitting something like: __doPostBack('datagrid_results$_ctl44$_ctl0','')

The last part indicates which page you want. There appear to be 40 pages (seemingly the standard number), numbered from ctl0 to ctl39, so you could just submit 40 of those statements, one for each number.
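As a rough illustration of that idea, here is a minimal Python sketch that generates the 40 postback targets. The grid row name `datagrid_results$_ctl44` is copied from the example call above; the function name `pager_targets` and the fixed count of 40 are assumptions for illustration, not something confirmed against the site.

```python
def pager_targets(grid_row="datagrid_results$_ctl44", pages=40):
    """Return one postback target per pager link, _ctl0 .. _ctl{pages-1}.

    Both defaults are assumptions taken from the example call in this
    thread; check the page source for the real names and page count.
    """
    return [f"{grid_row}$_ctl{i}" for i in range(pages)]


targets = pager_targets()
```

Each string in `targets` would be the first argument of one `__doPostBack` call.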

Occasional Contributor
Posts: 7

I want to automate it, though. As they add people, the number of pages will grow.

Also, is it possible for some sites to tell the difference between PROC HTTP and Firefox? The following code seems to get blocked:

filename in  '...';
filename out '...';

/* Write the URL-encoded form body to the request file. */
DATA _NULL_;
  y='__VIEWSTATE=%2FwEPDwUKLTMzOTcyOTAwMGRkvgRbuwnv4KbGV9NO1ykbEZBjrSg%3D&__EVENTVALIDATION=%2FwEWCwLCgKrWCgLkw8LZCALkw87ZCALkw8rZCAK31u2PDQLWqM3QDgL%2B%2FupqAqGPgsIEAojE0dMMArfW1a4MAq3qmaYPyupjM75%2FOoF7dcmGrwJNpMcfXWA%3D&ctl00%24PageContent%24SSN1=&ctl00%24PageContent%24SSN2=&ctl00%24PageContent%24SSN3=&ctl00%24PageContent%24fname=&ctl00%24PageContent%24mname=&ctl00%24PageContent%24lname=a&ctl00%24PageContent%24btnSubmit2=Submit';
  file in lrecl=475;
  put y;
RUN;

/* POST the form body and save the response to the output file. */
PROC HTTP
  in=in
  out=out
  url='http://health.state.tn.us/AbuseRegistry/default.aspx'
  method='POST'
  ct='application/x-www-form-urlencoded';
RUN;
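For comparison, the form body above is hand-encoded, which is easy to get wrong when the token values change. A minimal Python sketch that builds the same kind of body from decoded field values might look like this. The field names are taken from the string above; the function name `build_form_body` and its parameters are hypothetical.

```python
from urllib.parse import urlencode


def build_form_body(viewstate, eventvalidation, last_name):
    """URL-encode the search form fields used in the request above.

    The field names are copied from the hand-built string in this thread;
    the two token values must come from a fresh copy of the search page.
    """
    fields = [
        ("__VIEWSTATE", viewstate),
        ("__EVENTVALIDATION", eventvalidation),
        ("ctl00$PageContent$SSN1", ""),
        ("ctl00$PageContent$SSN2", ""),
        ("ctl00$PageContent$SSN3", ""),
        ("ctl00$PageContent$fname", ""),
        ("ctl00$PageContent$mname", ""),
        ("ctl00$PageContent$lname", last_name),
        ("ctl00$PageContent$btnSubmit2", "Submit"),
    ]
    # urlencode escapes "$", "/", "+" and "=" as %24, %2F, %2B and %3D.
    return urlencode(fields)
```

Passing the decoded token values (for example a string starting with `/wEP...`) gives the same `%2F`, `%3D` and `%24` escapes that appear in the string written by the DATA step.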

Trusted Advisor
Posts: 1,301

Scraping dynamic forms from ASP.NET applications is difficult, and SAS lacks many of the tools you need. Also, once the sites you are scraping replace their encryption keys, your requests will stop working. Given the large number of scrapes you are trying to run, I would recommend an outside tool built specifically for this kind of work.

If you do want to do this in SAS, you need to work out exactly what the JavaScript call __doPostBack is doing. It is probably performing a new POST. Hopefully it still uses the same __VIEWSTATE and __EVENTVALIDATION pieces. The process would be: request the initial search results, gather your data, and check for a JavaScript link to a subsequent page. If one exists, make a new call to the subsequent datagrid location, and so on, in a loop.
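The loop described above could be sketched outside SAS roughly as follows, using only the Python standard library. This is an untested-against-the-site sketch under several assumptions: that `__doPostBack` simply re-posts the form with `__EVENTTARGET` and `__EVENTARGUMENT` set, that the hidden fields of the most recent response are the ones to send back, and that pager links share a common prefix. All function names and the `grid_prefix` parameter are hypothetical.

```python
import re
from html import unescape
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request, urlopen

POSTBACK_RE = re.compile(r"__doPostBack\('([^']*)',\s*'([^']*)'\)")


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_hidden = (attrs.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def hidden_fields(page):
    """Return hidden form fields such as __VIEWSTATE as a dict."""
    parser = HiddenFieldParser()
    parser.feed(page)
    return parser.fields


def postback_targets(page):
    """Return __doPostBack targets in document order, without duplicates."""
    found = []
    # unescape() handles quotes written as &#39; inside href attributes.
    for target, _argument in POSTBACK_RE.findall(unescape(page)):
        if target not in found:
            found.append(target)
    return found


def http_post(url, body):
    """POST a URL-encoded body and return the response text."""
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    request = Request(url, data=body.encode("ascii"), headers=headers)
    with urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(url, first_body, grid_prefix, post=http_post, max_pages=500):
    """Fetch the first results page, then follow each new pager link once."""
    page = post(url, first_body)
    pages = [page]
    seen, pending = set(), []
    while len(pages) < max_pages:
        for target in postback_targets(page):
            if target.startswith(grid_prefix) and target not in seen:
                seen.add(target)
                pending.append(target)
        if not pending:
            break
        # Assumption: the postback re-sends the latest hidden fields.
        fields = hidden_fields(page)
        fields["__EVENTTARGET"] = pending.pop(0)
        fields["__EVENTARGUMENT"] = ""
        page = post(url, urlencode(fields))
        pages.append(page)
    return pages
```

A call such as `crawl(url, first_body, "datagrid_results$")` would return the raw HTML of each page for later parsing. Because the pager usually shows the current page as plain text rather than a link, later pages may link back to page 1, so the first page can be fetched twice; deduplicate the results if that matters.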

Best of luck. If I were you, I would move outside of SAS for these heavy scraping tasks (and you should probably confirm their legality first).
