Can someone please explain how to map the information from a cURL into something usable in PROC HTTP?
I went to a website (sorry for being vague) and manually selected values and used the Chrome developer tools to capture the cURL.
All the cURL values have this general form: a URL, followed by several header parameters, followed by some parameters in a --data list, followed by --compressed:
curl 'http://www.website.net' -H 'Connection: keep-alive' -H 'Cache-Control: max-age=0' [more -H header parameters] --data 'PARAM1=value1&PARAM2=value2&PARAM3=&PARAM4=value4' --compressed
I've played around a bit with the cURL in a Mac OS terminal window, so I know that it's valid and returns what I'm looking for.
I can't find anything that lays out how to use this information in PROC HTTP, though. Most examples online either use the "GET" method, or are referencing JSON, or are using curl in an X command in SAS. The documentation shows a very bare bones POST example, as well as examples using a proxy server, but nothing has really addressed what I'm looking for.
I tried using FILE and PUT like this:
data _null_;
     file test ;
     if _n_ eq 1 then
          do ;
               put 'param1=value1' @;
               put 'param2=value2' @ ;
               put 'param3=value3' @ ;
/* etc */
          end ;
run ;
and then:
proc http 
   url="https://www.website.net/default.aspx" 
   in='test'
   out=siteout ;
run;
But that just gives me back the URL referenced in PROC HTTP, and not the site that would be created by submitting the parameters specified (which I got from the cURL).
Maybe I'm missing something really fundamental and basic, but I've read everything I could find, and nothing seems to relate to this directly. Also, lots of the examples leap quickly into being inside a macro loop or something, so it's hard to figure out the basics in the midst of the more advanced coding context of macros. Don't get me wrong, I know macros, but I'm just trying to work out the basics of PROC HTTP before I turn to Python for the scraping.
Thanks
Jed
While it does not appear to address your cURL question, @ChrisHemedinger did write a post on using PROC HTTP to scrape web pages. This might get you going in the right direction with PROC HTTP.
I saw this post before, and while his examples of how to parse the resulting data are useful, @ChrisHemedinger's PROC HTTP calls use the GET method, which is more intuitive and I wish I could apply it here.
Thanks.
I have several examples of PROC HTTP that include POST calls. See the whole collection here (you'll have to sift through for POST examples).
Your cURL example didn't explicitly show a POST (although curl's --data option implies one), but you can get your params/data passed with the IN= option.
filename resp temp;
proc http
 method="POST"
 in='PARAM1=value1&PARAM2=value2&PARAM3=&PARAM4=value4'
 url="https://website.net"
 out=resp;
run;
Seems like your site might require a login or maybe relies on cookies to track a session -- PROC HTTP has options for that too.
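For the -H header parameters in the cURL command, here is a minimal sketch. It assumes SAS 9.4M3 or later, where (as far as I know) the HEADERS statement is available. The URL, header values, and parameter string are the placeholders from the original post, not a real endpoint.

```sas
/* Sketch only: placeholder URL and values taken from the cURL outline above */
filename resp temp;

proc http
   url="https://www.website.net/default.aspx"
   method="POST"
   /* single quotes keep SAS from treating &PARAM2 etc. as macro references */
   in='PARAM1=value1&PARAM2=value2&PARAM3=&PARAM4=value4'
   out=resp;
   /* each curl -H 'Name: value' becomes a "Name"="value" pair */
   headers
      "Connection"="keep-alive"
      "Cache-Control"="max-age=0";
run;
```

curl's --compressed flag has no counterpart in this sketch; it only asks the server for a compressed response.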
@ChrisHemedinger, it turns out the piece I was missing was the CT specification.
Here are the steps I took:
1. Went to the website of interest in Chrome
2. Entered the relevant info to return the results I wanted to capture
3. In the Chrome developer tools, I selected the network request for default.aspx (under the Network tab) and copied it as cURL
4. The cURL has the following format (which I outlined in the original post)
curl '[url]' -H 'Connection: keep-alive' (other header stuff) -H 'Content-Type: application/x-www-form-urlencoded' --data (list of parameters, separated by &, the whole string in quotes)
The list of values in the data stream was too long to copy into my EG code node, so I saved it to a flat file and read it in that way.
Then I used PROC HTTP.
proc http 
   url="[url]"
   ct="application/x-www-form-urlencoded"
   out=resp ;
run ;
The part I hadn't been able to connect was that I needed to supply the information in the "content type" part of the header as the CT value in the PROC HTTP. Otherwise, the parameter specs were the same.
I tried many variations before this worked, and it seemed that without the CT value of "application/x-www-form-urlencoded," the results I was getting were just for the launch site, and not the actual data I was after.
The documentation is clear on this: for the CT parameter in PROC HTTP it says "Specify the HTTP content-type to be set in the request headers." The simple POST request example shows that as well: http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a003286808.htm
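Putting the pieces together, here is a hedged sketch of the kind of call described above. The fileref names (body, resp), the URL, and the parameter string are placeholders. The body is written to a temporary file in a DATA step rather than read from my saved flat file, purely to keep the example self-contained.

```sas
filename body temp;
filename resp temp;

/* Write the --data string as one unbroken line; parameters are joined with & */
data _null_;
   file body recfm=n;   /* recfm=n: stream output, no trailing line break */
   put 'PARAM1=value1&PARAM2=value2&PARAM3=&PARAM4=value4';
run;

proc http
   url="https://www.website.net/default.aspx"
   method="POST"
   in=body                                  /* fileref, unquoted */
   ct="application/x-www-form-urlencoded"   /* from the -H 'Content-Type: ...' header */
   out=resp;
run;

/* Peek at the first lines of the response in the log */
data _null_;
   infile resp lrecl=32767 truncover obs=10;
   input line $char200.;
   put line;
run;
```

Note that in=body is unquoted. As far as I understand it, a quoted value such as in='test' is sent as the literal request body rather than being treated as a fileref.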
It was actually reading up on POST vs GET here: https://www.w3schools.com/tags/ref_httpmethods.asp and more about POST here:
https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods/POST that helped connect the dots as well.
There was one post on the forums (that you also referenced in a subsequent blog post) that I am having trouble putting my finger on at the moment, but you looped in the PROC HTTP developer, who I believe posted something about including "boundary" definitions in the code. That made more sense after reading the mozilla dev link.
Anyway, I hope someone else finds this all useful. On to parsing the data....
@jteres One more note for you. Up until SAS 9.4 Maint 2, PROC HTTP default method was POST. Later, it was changed to GET -- appropriate for most operations. If you're running in SAS 9.4m3 or higher, then your code is actually using GET.
If you're really needing a POST operation, best practice is to include the method="POST" option. I always provide the method= option, just so it's obvious to me later...
Chris
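To confirm what a call actually did, one option is to state the method explicitly and write the response status to the log. This sketch assumes SAS 9.4M5 or later, where I believe the SYS_PROCHTTP_STATUS_CODE and SYS_PROCHTTP_STATUS_PHRASE macro variables are set after each call. The URL and body are placeholders.

```sas
filename resp temp;

proc http
   url="https://www.website.net/default.aspx"
   method="POST"   /* explicit, so the SAS version's default does not matter */
   in='PARAM1=value1&PARAM2=value2'
   ct="application/x-www-form-urlencoded"
   out=resp;
run;

/* Write the HTTP status returned by the server to the log */
%put NOTE: HTTP status = &SYS_PROCHTTP_STATUS_CODE &SYS_PROCHTTP_STATUS_PHRASE;
```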
Have you read these?
https://blogs.sas.com/content/sastraining/2019/02/05/webpage-scraping-made-easy-with-proc-http/
https://blogs.sas.com/content/sasdummy/2017/12/04/scrape-web-page-data/
Scraping Dakota Inmate Data
https://support.sas.com/resources/papers/proceedings11/140-2011.pdf
Worst-case scenario, use cURL in your OS to pipe to a txt file, and use SAS to pass the command to the OS and read the result.
Illustrated here (note 2012 so definitely out of date)
https://support.sas.com/resources/papers/proceedings12/121-2012.pdf
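As a rough sketch of the cURL fallback: this version reads curl's output directly through a PIPE fileref instead of going through an intermediate text file. It assumes curl is on the server's path and that the SAS session allows operating-system commands (XCMD). The URL and parameters are placeholders.

```sas
/* Outer single quotes stop SAS from resolving &PARAM2 as a macro variable; */
/* inner double quotes are for the shell                                     */
filename curlout pipe
   'curl -s "https://www.website.net/default.aspx" --data "PARAM1=value1&PARAM2=value2"';

/* Read each line of curl's output into a data set */
data response;
   infile curlout lrecl=32767 truncover;
   input line $char32767.;
run;
```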
I see now that this got posted in "New SAS User," which was a mistake.
These are just top google hits. They aren't relevant to my specific question.
The first example uses this:
proc http url="http://feeds.bbci.co.uk/news/technology/rss.xml" out=source; run;
and that's the full extent of PROC HTTP code in that guide.
The second link is the same blog post that @BillM_SAS pointed to. Again, not useful here because the method in PROC HTTP specified is GET and not POST.
The Dakota Inmate data uses the URL option in a FILENAME statement. I can't inspect further because that URL is no longer live.
Using curl to pipe a text file might very well be the best worst case option, but again, that doesn't answer my question about how to think about PROC HTTP.
I guess I would ask why you are responding to topics that you don't seem to know anything about? It's not helpful, nor is it a constructive use of the forum.
Thanks VDD, but I'm in the mood to reply today.
@jteres wrote:
I see now that this got posted in "New SAS User," which was a mistake.
I'm sorry, my crystal ball is broken so I couldn't possibly know that today. Maybe tomorrow it'll be working.
These are just top google hits. They aren't relevant to my specific question.
No, these were curated, I did search for it and posted the few I thought were relevant to your question.
@jteres wrote:
I guess I would ask why you are responding to topics that you don't seem to know anything about? It's not helpful, nor is it a constructive use of the forum.
Because 99% of the time I can figure it out pretty quickly and I'm usually happy to work with people to help solve their issues when it's something I haven't done before. Then I learn something too. Ergo why I actually asked, "have you read these?" and did not say, "go read these".
Good Bye. I'm not following this post any longer.
To respond to your comment, @VDD, I came to the forum because people are usually helpful and nice.
A response like what I got from Reeza is not helpful. I don't know how else to say it, and I'm certainly not going to pretend it's useful when it's not. If anything, it's a further waste of my time to have to go through and explain why things aren't useful. I wrote in my initial message, "Maybe I'm missing something really fundamental and basic, but I've read everything I could find, and nothing seems to relate to this directly."
The reason I wrote that was to indicate that this is not my first stop.
I apologize if I violated the decorum of the forum. It was not my intention to violate forum decorum, but it's frustrating when I see a response to a question that I've been struggling with only to find that it doesn't actually address the issue I'm citing.
