Jay1
Calcite | Level 5

I am using PROC HTTP to pull the following site: http://app.hpla.doh.dc.gov/weblookup/Search.aspx.

The challenge is that they break up their results on multiple pages.  Is there an easy way to tell proc http just to grab it all?

3 REPLIES
art297
Opal | Level 21

I'm not familiar with the site, and FriedEgg is definitely better at this than I am, but I presume you are submitting something like: __doPostBack('datagrid_results$_ctl44$_ctl0','')

The last part indicates which page you want. There appear to be 40 pages (which looks like a standard setup), numbered from ctl0 to ctl39. You could just submit 40 of those statements, numbered from ctl0 to ctl39.
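
As a rough illustration of that idea, here is a minimal sketch that generates the 40 postback targets in a DATA step. It relies on the general ASP.NET Web Forms behavior that __doPostBack copies its two arguments into the hidden form fields __EVENTTARGET and __EVENTARGUMENT before submitting the form. The data set name postback_bodies and its variables are made up for this example, and a real request would almost certainly also need the __VIEWSTATE and __EVENTVALIDATION values taken from the results page.

```sas
/* Sketch only: one POST body fragment per pager link, ctl0 through ctl39. */
/* The control name is copied from the __doPostBack call quoted above.     */
/* postback_bodies, target, and body are hypothetical names.               */
data postback_bodies;
  length target $64 body $200;
  do page = 0 to 39;
    target = cats('datagrid_results$_ctl44$_ctl', page);
    /* "$" has to be sent as %24 in a URL-encoded form body */
    body = cats('__EVENTTARGET=', tranwrd(target, '$', '%24'),
                '&__EVENTARGUMENT=');
    output;
  end;
run;
```

Each row of postback_bodies could then be written to the input file of a separate PROC HTTP call, once the state fields from the results page have been appended to it.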

Jay1
Calcite | Level 5

I want to automate it, though. As they add people, the number of pages will grow.

Also, is it possible for some sites to tell the difference between PROC HTTP and Firefox? The following code seems to get blocked:

filename in  '...';
filename out '...';

/* Write the URL-encoded form body to the IN file. */
DATA _NULL_;
  y='__VIEWSTATE=%2FwEPDwUKLTMzOTcyOTAwMGRkvgRbuwnv4KbGV9NO1ykbEZBjrSg%3D&__EVENTVALIDATION=%2FwEWCwLCgKrWCgLkw8LZCALkw87ZCALkw8rZCAK31u2PDQLWqM3QDgL%2B%2FupqAqGPgsIEAojE0dMMArfW1a4MAq3qmaYPyupjM75%2FOoF7dcmGrwJNpMcfXWA%3D&ctl00%24PageContent%24SSN1=&ctl00%24PageContent%24SSN2=&ctl00%24PageContent%24SSN3=&ctl00%24PageContent%24fname=&ctl00%24PageContent%24mname=&ctl00%24PageContent%24lname=a&ctl00%24PageContent%24btnSubmit2=Submit';
  file in lrecl=475;
  put y;
RUN;

/* POST the form body and save the response to the OUT file. */
PROC HTTP
  in=in
  out=out
  url='http://health.state.tn.us/AbuseRegistry/default.aspx'
  method='POST'
  ct='application/x-www-form-urlencoded';
RUN;

FriedEgg
SAS Employee

Scraping dynamic forms from ASP.NET applications is difficult, and SAS does not really have many of the tools you need. Also, when the sites you are scraping eventually replace their encryption keys, nothing will work anymore. Given the large number of scrapes you are trying to accomplish, I would recommend using an outside tool built specifically for this purpose.

If you did want to do this in SAS, you would need to work out exactly what the JavaScript call __doPostBack is doing. It is probably performing a new POST. Hopefully it still uses the same __VIEWSTATE and __EVENTVALIDATION pieces. The process would be to make a call to the initial search results, gather your data, and check for a JavaScript link to a subsequent page. If one exists, make a new call to the subsequent datagrid location, and so on, in a loop.
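
A rough sketch of that loop in SAS is below. It is untested against this site and makes several assumptions: the filerefs IN and OUT are already assigned and OUT holds the first results page; the pager links follow the datagrid_results$_ctl44$_ctlN pattern mentioned earlier in the thread; the hidden fields are rendered as name="__VIEWSTATE" ... value="..." on a single line shorter than 32,767 characters; and the postback goes to the URL you pass in. The macro name scrape_all and its parameters are hypothetical.

```sas
/* Sketch only, not a tested implementation. Assumes IN and OUT filerefs */
/* exist and OUT already contains the first page of search results.      */
%macro scrape_all(url=, max_pages=200);
  %local page has_next;
  %let page = 1;            /* ctl0 is assumed to be the page already fetched */
  %let has_next = 1;
  %do %while (&has_next = 1 and &page <= &max_pages);
    %let has_next = 0;

    /* (Parse OUT into your own results data set here, before it is overwritten.) */

    /* Read the saved page, pick up the state fields, build the next POST body. */
    data _null_;
      length line viewstate validation body $32767 target $64;
      retain viewstate validation ' ' found 0 rx_vs rx_ev rx_next;
      infile out lrecl=32767 truncover end=eof;
      file in lrecl=32767;
      input line $char32767.;
      if _n_ = 1 then do;
        rx_vs   = prxparse('/name="__VIEWSTATE"[^>]*value="([^"]*)"/');
        rx_ev   = prxparse('/name="__EVENTVALIDATION"[^>]*value="([^"]*)"/');
        /* pager link for the next page; \D stops ctl1 from matching ctl10 */
        rx_next = prxparse(cats('/datagrid_results\$_ctl44\$_ctl', &page, '\D/'));
      end;
      if prxmatch(rx_vs, line)   then viewstate  = prxposn(rx_vs, 1, line);
      if prxmatch(rx_ev, line)   then validation = prxposn(rx_ev, 1, line);
      if prxmatch(rx_next, line) then found = 1;
      if eof and found then do;
        /* Base64 values only need +, / and = escaped for a form body. */
        viewstate  = tranwrd(viewstate,  '+', '%2B');
        viewstate  = tranwrd(viewstate,  '/', '%2F');
        viewstate  = tranwrd(viewstate,  '=', '%3D');
        validation = tranwrd(validation, '+', '%2B');
        validation = tranwrd(validation, '/', '%2F');
        validation = tranwrd(validation, '=', '%3D');
        target = cats('datagrid_results%24_ctl44%24_ctl', &page);
        body = cats('__EVENTTARGET=', target, '&__EVENTARGUMENT=',
                    '&__VIEWSTATE=', viewstate,
                    '&__EVENTVALIDATION=', validation);
        put body;
        call symputx('has_next', 1);
      end;
    run;

    /* Request the next page only if a link to it was found. */
    %if &has_next = 1 %then %do;
      proc http
        in=in
        out=out
        url="&url"
        method='POST'
        ct='application/x-www-form-urlencoded';
      run;
      %let page = %eval(&page + 1);
    %end;
  %end;
%mend scrape_all;
```

The max_pages cap is only there so the loop cannot run forever if the "next page" check misfires. The site may also expect the original search fields to be repeated in every postback, in which case they would have to be appended to body as well.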

Best of luck. If I were you, I would move outside of SAS to perform these heavy scraping tasks (and you should probably confirm their legality first).
