12-19-2011 01:21 PM
I am using PROC HTTP to pull the following site: http://app.hpla.doh.dc.gov/weblookup/Search.aspx.
The challenge is that they split their results across multiple pages. Is there an easy way to tell PROC HTTP to just grab them all?
12-19-2011 06:04 PM
I'm not familiar with the site, and friedegg is definitely better at this than I am, but I presume you are submitting something like: __doPostBack('datagrid_results$_ctl44$_ctl0','')
The last part indicates which of the 40 pages (which appears to be the standard number) you want; they appear to be numbered from ctl0 to ctl39. You could just submit 40 of those statements, numbered from ctl0 to ctl39.
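For what it's worth, in ASP.NET Web Forms a __doPostBack(target, argument) call just fills two hidden form fields, __EVENTTARGET and __EVENTARGUMENT, and submits the form, so each of those statements corresponds to one POST body. Here is a rough sketch of generating the 40 bodies, written in Python only because it is easy to check outside SAS. The ctl44 container name and the count of 40 are taken from the example above, the function names are made up, and a real request would also need the page's hidden state fields (such as __VIEWSTATE), which are left out here:

```python
from urllib.parse import urlencode


def pager_targets(count=40, container="datagrid_results$_ctl44"):
    """Return the event targets _ctl0 .. _ctl{count-1} for the pager row.

    Both defaults are assumptions copied from the example in this thread.
    """
    return [f"{container}$_ctl{i}" for i in range(count)]


def postback_body(target, argument=""):
    """URL-encode the two fields that __doPostBack would set."""
    return urlencode({"__EVENTTARGET": target, "__EVENTARGUMENT": argument})


# One form body per results page; the hidden state fields are NOT included.
bodies = [postback_body(t) for t in pager_targets()]
print(bodies[0])
```

Each string in bodies is what would go into the input file for one POST, once the page's own hidden fields are appended.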
12-20-2011 10:51 AM
I want to automate it, though. As they add people, the number of pages will grow.
Also, is it possible for some sites to tell the difference between PROC HTTP and Firefox? The following code seems to get blocked:
filename in '...';
filename out '...';
file in lrecl=475;
12-20-2011 02:11 PM
Scraping dynamic forms from ASP.NET applications is difficult, and SAS does not really have a lot of the tools you need. Also, eventually, when the sites you are scraping replace their encryption keys, nothing will work anymore. With the large number of scrapes you are trying to accomplish, I would recommend using an outside tool built specifically for what you are trying to do.
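To give an idea of what such a tool has to do on every page: an ASP.NET Web Forms page carries its state in hidden inputs such as __VIEWSTATE, and those have to be read from each response and sent back with the next postback. The pager links can be read from the same response, so the page count does not need to be hard-coded. Below is a minimal sketch in Python using only the standard library. The sample HTML is made up for illustration and is not the real site's markup, the helper names are hypothetical, and the regular expression assumes the quotes in the links are not entity-encoded:

```python
import re
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        a = dict(attrs)
        if (a.get("type") or "").lower() == "hidden" and a.get("name"):
            self.fields[a["name"]] = a.get("value") or ""


def hidden_fields(html):
    """Return the hidden form fields that must be echoed back on postback."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


# Matches calls of the form __doPostBack('target','argument').
POSTBACK_RE = re.compile(r"__doPostBack\('([^']*)','([^']*)'\)")


def postback_targets(html):
    """Return unique (target, argument) pairs in order of first appearance."""
    seen = []
    for pair in POSTBACK_RE.findall(html):
        if pair not in seen:
            seen.append(pair)
    return seen


# Made-up sample page for illustration only.
SAMPLE = """
<form>
<input type="hidden" name="__VIEWSTATE" value="abc123" />
<a href="javascript:__doPostBack('datagrid_results$_ctl44$_ctl0','')">1</a>
<a href="javascript:__doPostBack('datagrid_results$_ctl44$_ctl1','')">2</a>
</form>
"""

fields = hidden_fields(SAMPLE)
targets = postback_targets(SAMPLE)
```

A scraper would repeat this for every response: read the hidden fields, pick the next pager target, post both back, and stop when no unseen target is left.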
Best of luck. If I were you, I would move outside of SAS to perform these heavy scraping tasks (and you should probably confirm that they are legal).