I am using PROC HTTP to pull the following site: http://app.hpla.doh.dc.gov/weblookup/Search.aspx.
The challenge is that they break up their results across multiple pages. Is there an easy way to tell PROC HTTP just to grab it all?
I'm not familiar with the site, and friedegg is definitely better at this than I am, but I presume you are submitting something like: __doPostBack('datagrid_results$_ctl44$_ctl0','')
The last part indicates which of the (what appear to be standard) 40 pages you want; the pages appear to be numbered from ctl0 to ctl39. You could just submit 40 of those statements, numbered from ctl0 to ctl39.
I want to automate it, though. As they add people, the number of pages will grow.
Also, is it possible for some sites to tell the difference between PROC HTTP and Firefox? The following code seems to get blocked:
filename in '...';
filename out '...';
DATA _NULL_;
y='__VIEWSTATE=%2FwEPDwUKLTMzOTcyOTAwMGRkvgRbuwnv4KbGV9NO1ykbEZBjrSg%3D&__EVENTVALIDATION=%2FwEWCwLCgKrWCgLkw8LZCALkw87ZCALkw8rZCAK31u2PDQLWqM3QDgL%2B%2FupqAqGPgsIEAojE0dMMArfW1a4MAq3qmaYPyupjM75%2FOoF7dcmGrwJNpMcfXWA%3D&ctl00%24PageContent%24SSN1=&ctl00%24PageContent%24SSN2=&ctl00%24PageContent%24SSN3=&ctl00%24PageContent%24fname=&ctl00%24PageContent%24mname=&ctl00%24PageContent%24lname=a&ctl00%24PageContent%24btnSubmit2=Submit';
file in lrecl=475;
put y;
RUN;
PROC HTTP
in=in
out=out
url='http://health.state.tn.us/AbuseRegistry/default.aspx'
method='POST'
ct='application/x-www-form-urlencoded';
RUN;
Scraping dynamic forms from ASP.NET applications is difficult, and SAS does not really have a lot of the tools you need. Also, when the sites you are scraping eventually replace their encryption keys, hardcoded __VIEWSTATE and __EVENTVALIDATION values like the ones in your code will stop working. With the large number of scrapes you are trying to accomplish, I would recommend using an outside tool built specifically for what you are trying to do.
If you did want to do this in SAS, you would need to work out exactly what the JavaScript call __doPostBack is doing. It is probably performing a new POST. Hopefully it still uses the same __VIEWSTATE and __EVENTVALIDATION pieces. The process would be to make a call to the initial search results, gather your data, and check for a JavaScript link to a subsequent page. If one exists, make a new call to the subsequent datagrid location, and so on, in a loop.
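As a rough sketch of one pass through that loop (not something I have run against the real site): in standard ASP.NET WebForms, __doPostBack typically fills two hidden fields, __EVENTTARGET and __EVENTARGUMENT, and resubmits the form, so the next request can be built from the previous response. Everything named here is an assumption for illustration: the filerefs resp, body and page2, the macro variables target and has_next, the three-line stand-in response, and the idea that each hidden input sits on one line with name= before value=. A real form will probably need its other fields added to the body as well.

```sas
/* Hypothetical pager control for the next page; the real name must be */
/* read from the links in the page you actually get back.              */
%let target = datagrid_results$_ctl44$_ctl1;

/* Stand-in for a response already saved by PROC HTTP (made-up values). */
filename resp temp;
data _null_;
  file resp;
  put '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUK+abc=" />';
  put '<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEWCw==" />';
  put '<a href="javascript:__doPostBack(''datagrid_results$_ctl44$_ctl1'','''')">2</a>';
run;

/* Pull the hidden fields out of the response and build the next POST body. */
filename body temp;
data _null_;
  length line vs ev post $32767;
  retain vs ev rx_vs rx_ev found;
  infile resp lrecl=32767 truncover end=last;
  input line $char32767.;
  if _n_ = 1 then do;
    /* Assumes name= comes before value= inside each <input> tag. */
    rx_vs = prxparse('/name="__VIEWSTATE"[^>]*value="([^"]*)"/');
    rx_ev = prxparse('/name="__EVENTVALIDATION"[^>]*value="([^"]*)"/');
    found = 0;
  end;
  if prxmatch(rx_vs, line) then vs = prxposn(rx_vs, 1, line);
  if prxmatch(rx_ev, line) then ev = prxposn(rx_ev, 1, line);
  /* Is there a postback link for the page we want next? */
  if index(line, "__doPostBack('&target'") then found = 1;
  if last then do;
    post = cats('__EVENTTARGET=', urlencode("&target"),
                '&__EVENTARGUMENT=',
                '&__VIEWSTATE=', urlencode(strip(vs)),
                '&__EVENTVALIDATION=', urlencode(strip(ev)));
    file body lrecl=32767;
    put post;
    call symputx('has_next', found);
  end;
run;
%put NOTE: next-page link found = &has_next;

/* Same style of call as in the question, now with the rebuilt body. */
filename page2 temp;
proc http
  in=body
  out=page2
  url='http://app.hpla.doh.dc.gov/weblookup/Search.aspx'
  method='POST'
  ct='application/x-www-form-urlencoded';
run;
```

To cover every page you could wrap the parse and PROC HTTP steps in a macro %DO %UNTIL loop that stops when has_next comes back 0, feeding each response in as the next resp. If the site puts the whole page on one very long line, the 32767-character limit on line will cut the values off and this approach will need rethinking.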
Best of luck. If I were you, I would move outside of SAS to perform these heavy scraping tasks (and you should probably confirm the legality of them first).