<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: searching the internet in SAS Data Management</title>
    <link>https://communities.sas.com/t5/SAS-Data-Management/searching-the-internet/m-p/342997#M10214</link>
    <description>&lt;P&gt;Has there been any update on this?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;We are currently using the FILENAME statement to pull driving directions for a series of address sets, subject to daily limits imposed by Google. There are various sites, SUGI papers, etc. that detail this methodology.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;We are looking to use this functionality to retrieve the first page of Google results for a given &amp;amp;x &amp;amp;y &amp;amp;z query combination. I landed on this page and found great information on this. This was very helpful!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Tue, 21 Mar 2017 17:00:54 GMT</pubDate>
    <dc:creator>Data_Detective_23219</dc:creator>
    <dc:date>2017-03-21T17:00:54Z</dc:date>
    <item>
      <title>searching the internet</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/searching-the-internet/m-p/249852#M6671</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Is it possible to use SAS to search the internet?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Suppose I want to google "used cars". Is it possible to get, say, the first 100 links into a SAS file?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you!&lt;/P&gt;</description>
      <pubDate>Sat, 13 Feb 2016 06:05:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/searching-the-internet/m-p/249852#M6671</guid>
      <dc:creator>ilikesas</dc:creator>
      <dc:date>2016-02-13T06:05:47Z</dc:date>
    </item>
    <item>
      <title>Re: searching the internet</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/searching-the-internet/m-p/249853#M6672</link>
      <description>&lt;P&gt;The term is web scraping.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you search lexjansen.com, you'll find a number of papers with sample code.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here's an example:&lt;/P&gt;
&lt;P&gt;&lt;A href="http://support.sas.com/resources/papers/proceedings12/121-2012.pdf" target="_blank"&gt;http://support.sas.com/resources/papers/proceedings12/121-2012.pdf&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;You may also want to check whether there's an API that lets you send a request and get back a JSON response, which is a more structured format.&lt;/P&gt;</description>
      <pubDate>Sat, 13 Feb 2016 06:13:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/searching-the-internet/m-p/249853#M6672</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2016-02-13T06:13:42Z</dc:date>
    </item>
    <item>
      <title>Re: searching the internet</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/searching-the-internet/m-p/249876#M6674</link>
      <description>&lt;P&gt;Here's another presentation that describes how to do exactly what you're discussing.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;A href="http://www.oasus.ca/OASUS_20130612_files/3_Scraping_the_Web_with_SAS/3_Scraping_the_Web_with_SAS.ppsx" target="_blank"&gt;www.oasus.ca/OASUS_20130612_files/3_Scraping_the_Web_with_SAS/3_Scraping_the_Web_with_SAS.ppsx&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 13 Feb 2016 15:05:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/searching-the-internet/m-p/249876#M6674</guid>
      <dc:creator>TomKari</dc:creator>
      <dc:date>2016-02-13T15:05:15Z</dc:date>
    </item>
    <item>
      <title>Re: searching the internet</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/searching-the-internet/m-p/249889#M6679</link>
      <description>&lt;P&gt;Hi Tom,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I actually found your presentation and the examples at the OASUS site.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I ran example 3 and obtained the distinct addresses that were found with Google.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I also tried example 4 to get 1,000 addresses, but it seems to freeze my SAS session. Could that be because the data is too big?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
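&lt;P&gt;If the slowdown comes from the site throttling rapid requests rather than from data volume, one option is to space the requests out. Below is a minimal sketch of that idea, written in Python rather than SAS. The names fetch_paced and fetch are hypothetical, the dummy fetch only stands in for a real page request, and the five-second default pause is an arbitrary example, not a known limit.&lt;/P&gt;
&lt;PRE&gt;
```python
import time


def fetch_paced(starts, fetch, pause_seconds=5.0):
    """Call fetch(start) once per start offset, pausing between calls.

    fetch is a hypothetical stand-in for whatever retrieves one page of
    results; pause_seconds is an arbitrary example value, not a known limit.
    """
    results = []
    for n, start in enumerate(starts):
        if n:  # no pause before the first request
            time.sleep(pause_seconds)
        results.append(fetch(start))
    return results


# Ten pages of ten results each, with a dummy fetch and no pause so it runs instantly.
pages = fetch_paced(range(0, 100, 10), lambda start: f"page starting at {start}", pause_seconds=0)
print(len(pages))  # 10
print(pages[-1])   # page starting at 90
```
&lt;/PRE&gt;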
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you!&lt;/P&gt;</description>
      <pubDate>Sat, 13 Feb 2016 20:22:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/searching-the-internet/m-p/249889#M6679</guid>
      <dc:creator>ilikesas</dc:creator>
      <dc:date>2016-02-13T20:22:20Z</dc:date>
    </item>
    <item>
      <title>Re: searching the internet</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/searching-the-internet/m-p/249892#M6680</link>
      <description>&lt;P&gt;Good stuff! I'm glad you're partway there.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I had similar things happen to me. I don't think it's a volume issue, as by SAS standards this is all fairly low volume.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I tried it again, but changed the macro loop to&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;%do&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; i=&lt;/FONT&gt;&lt;STRONG&gt;&lt;FONT color="#008080" face="Courier New" size="3"&gt;1&lt;/FONT&gt;&lt;/STRONG&gt; &lt;FONT color="#0000ff" face="Courier New" size="3"&gt;%to&lt;/FONT&gt; &lt;STRONG&gt;&lt;FONT color="#008080" face="Courier New" size="3"&gt;5&lt;/FONT&gt;&lt;/STRONG&gt; &lt;FONT color="#0000ff" face="Courier New" size="3"&gt;%by&lt;/FONT&gt; &lt;STRONG&gt;&lt;FONT color="#008080" face="Courier New" size="3"&gt;5&lt;/FONT&gt;&lt;/STRONG&gt;&lt;FONT face="Courier New" size="3"&gt;;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;so that the query runs only once. It ran, but took a couple of minutes. I'm wondering if Google has added a "limiter" to slow things down and prevent people from doing this kind of thing.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;With this mechanism, all I can suggest is to be patient and certainly not to try 1,000 at once.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Keep in mind, it's Google's world. They only let us live in it, sigh!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Best,&lt;/P&gt;&lt;P&gt;&amp;nbsp; Tom&lt;/P&gt;</description>
      <pubDate>Sat, 13 Feb 2016 21:39:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/searching-the-internet/m-p/249892#M6680</guid>
      <dc:creator>TomKari</dc:creator>
      <dc:date>2016-02-13T21:39:36Z</dc:date>
    </item>
    <item>
      <title>Re: searching the internet</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/searching-the-internet/m-p/249900#M6681</link>
      <description>&lt;P&gt;Hi Tom,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks for the reply. I guess Google is actually trying to limit this kind of behavior; maybe it's related to making their advertisements more visible...&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'd like to ask you another small question, if I may. I found an example that goes to an employment website and retrieves the job postings. In this example, the author uses Perl/LWP code. Can this Perl code be run in SAS, or is another program needed?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you!&lt;/P&gt;</description>
      <pubDate>Sat, 13 Feb 2016 23:10:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/searching-the-internet/m-p/249900#M6681</guid>
      <dc:creator>ilikesas</dc:creator>
      <dc:date>2016-02-13T23:10:14Z</dc:date>
    </item>
    <item>
      <title>Re: searching the internet</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/searching-the-internet/m-p/249903#M6682</link>
      <description>&lt;P&gt;No, Perl code can't be run inside SAS. However, if the Perl code is searching or replacing with regular expressions, the SAS PRX functions and routines provide much of the same functionality, with nearly the same syntax.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Another option, depending on your SAS environment (the X command requires the XCMD system option to be enabled), is to run Perl using a SAS "X" command and then read the Perl output into SAS.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Tom&lt;/P&gt;</description>
      <pubDate>Sat, 13 Feb 2016 23:23:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/searching-the-internet/m-p/249903#M6682</guid>
      <dc:creator>TomKari</dc:creator>
      <dc:date>2016-02-13T23:23:48Z</dc:date>
    </item>
    <item>
      <title>Re: searching the internet</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/searching-the-internet/m-p/249904#M6683</link>
      <description>&lt;P&gt;Actually, now that I think about it, that would be pretty funny. Someone announces "a great new search engine", but all it does is pass the searches to Google and list the results.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Sorry I didn't think of this sooner... I might have gotten a lot richer than by writing SAS code!&lt;/P&gt;</description>
      <pubDate>Sat, 13 Feb 2016 23:25:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/searching-the-internet/m-p/249904#M6683</guid>
      <dc:creator>TomKari</dc:creator>
      <dc:date>2016-02-13T23:25:25Z</dc:date>
    </item>
    <item>
      <title>Re: searching the internet</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/searching-the-internet/m-p/249908#M6684</link>
      <description>&lt;P&gt;Hi Tom,&lt;/P&gt;
&lt;P&gt;In your slide for part 4, there is this line of code:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;prxid=prxparse('/(?&amp;lt;=&amp;lt;h3 class="r"&amp;gt;&amp;lt;a href="\/url\?q=)[[:alnum:]-\._~:\/\?#\[\]@!\$''\(\)\*\+,;=]+(?=&amp;amp;amp)/o');&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;From what I understand, it finds the URL. Looking at the code, it searches for "a href", which marks the beginning of the URL, but how does SAS know where the URL ends? Or is this different from an ordinary string search, and SAS is actually looking for the "URL box" in the HTML?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;And if that is the case, does it mean that SAS can look for all the different "boxes" of HTML?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
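&lt;P&gt;A minimal sketch of how a pattern like this decides where the URL ends, written in Python rather than SAS and run against a made-up fragment of a results page (real markup differs and changes often). A regular expression is plain text matching with no notion of HTML elements: the bracketed character class lists the characters allowed in a URL, so the match stops at the first character outside it, and the lookahead then checks what follows without including it in the match. To keep the sketch short, it uses a capturing group in place of the (?&amp;lt;=...) lookbehind and expects a closing double quote after the URL rather than the encoded ampersand (&amp;amp;amp;) that the SAS pattern looks for.&lt;/P&gt;
&lt;PRE&gt;
```python
import re

# Characters accepted as part of the URL, modelled on the bracketed class in the SAS pattern.
URL_CHARS = r"[-\w.~:/?#\[\]@!$'()*+,;=]+"

# The capturing group stands in for the SAS lookbehind; (?=") is a lookahead that
# requires a double quote right after the URL without making it part of the match.
PATTERN = re.compile(r'href="/url\?q=(' + URL_CHARS + r')(?=")')

# Hypothetical fragment of a results page.
line = 'a href="/url?q=https://example.com/used-cars/" class="x"'

match = PATTERN.search(line)
print(match.group(1))  # https://example.com/used-cars/

# The class stops at the space, and the lookahead fails because no quote follows.
print(PATTERN.search('a href="/url?q=https://example.com/ page"'))  # None
```
&lt;/PRE&gt;
&lt;P&gt;On the SAS side, CALL PRXSUBSTR returns the position and length of the match, which SUBSTR can then use to pull out the URL.&lt;/P&gt;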
&lt;P&gt;thanks!&lt;/P&gt;</description>
      <pubDate>Sun, 14 Feb 2016 00:16:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/searching-the-internet/m-p/249908#M6684</guid>
      <dc:creator>ilikesas</dc:creator>
      <dc:date>2016-02-14T00:16:21Z</dc:date>
    </item>
    <item>
      <title>Re: searching the internet</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/searching-the-internet/m-p/249910#M6685</link>
      <description>&lt;P&gt;Google provides APIs through which registered users can retrieve search results.&lt;/P&gt;
&lt;P&gt;This prevents things like&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/15142"&gt;@TomKari﻿&lt;/a&gt;&amp;nbsp;idea.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 14 Feb 2016 00:30:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/searching-the-internet/m-p/249910#M6685</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2016-02-14T00:30:57Z</dc:date>
    </item>
    <item>
      <title>Re: searching the internet</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/searching-the-internet/m-p/249917#M6686</link>
      <description>&lt;P&gt;Thanks, &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13879"&gt;@Reeza&lt;/a&gt;!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can you point at some documentation about this? I looked for it when I did this, a few years ago, but didn't find anything.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Tom&lt;/P&gt;</description>
      <pubDate>Sun, 14 Feb 2016 01:04:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/searching-the-internet/m-p/249917#M6686</guid>
      <dc:creator>TomKari</dc:creator>
      <dc:date>2016-02-14T01:04:31Z</dc:date>
    </item>
    <item>
      <title>Re: searching the internet</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/searching-the-internet/m-p/249927#M6688</link>
      <description>&lt;P&gt;Regular expressions are a complex subject. Here's a slide from my presentation that attempts to describe how the regular expression is composed.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;IMG title="PRX.jpg" alt="PRX.jpg" src="https://communities.sas.com/t5/image/serverpage/image-id/1885i97C1C636FF27B3CA/image-size/original?v=mpbl-1&amp;amp;px=-1" border="0" /&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 14 Feb 2016 02:19:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/searching-the-internet/m-p/249927#M6688</guid>
      <dc:creator>TomKari</dc:creator>
      <dc:date>2016-02-14T02:19:31Z</dc:date>
    </item>
    <item>
      <title>Re: searching the internet</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/searching-the-internet/m-p/249979#M6697</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/15142"&gt;@TomKari﻿&lt;/a&gt;&amp;nbsp;It appears here, under the assumption that you're adding a Search window to your website. It's old...things changed from the last time I attempted this &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;There's a very short section on Keeping a Search Result&lt;/P&gt;
&lt;P&gt;but it looks deprecated now &lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://developers.google.com/web-search/docs/" target="_blank"&gt;https://developers.google.com/web-search/docs/&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The new API appears to return XML, which could also be parsed.&lt;/P&gt;
&lt;P&gt;The old API's documentation doesn't indicate that this is against its usage terms, but I'm not sure about the current one.&lt;/P&gt;</description>
      <pubDate>Sun, 14 Feb 2016 18:13:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/searching-the-internet/m-p/249979#M6697</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2016-02-14T18:13:28Z</dc:date>
    </item>
    <item>
      <title>Re: searching the internet</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/searching-the-internet/m-p/250227#M6712</link>
      <description>&lt;P&gt;Thanks, Reeza&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I did a little digging, and the results are fascinating.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It turns out that:&lt;/P&gt;&lt;P&gt;1) Starting around 2013, Google no longer permits the kind of thing I described in my presentation.&lt;/P&gt;&lt;P&gt;2) Google had provided an alternative, described in the link you passed on.&lt;/P&gt;&lt;P&gt;3) But then they deprecated it and replaced it with an option you have to pay for.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Big surprise!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; Tom&lt;/P&gt;</description>
      <pubDate>Tue, 16 Feb 2016 02:12:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/searching-the-internet/m-p/250227#M6712</guid>
      <dc:creator>TomKari</dc:creator>
      <dc:date>2016-02-16T02:12:14Z</dc:date>
    </item>
    <item>
      <title>Re: searching the internet</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/searching-the-internet/m-p/342997#M10214</link>
      <description>&lt;P&gt;Has there been any update on this?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;We are currently using the FILENAME statement to pull driving directions for a series of address sets, subject to daily limits imposed by Google. There are various sites, SUGI papers, etc. that detail this methodology.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;We are looking to use this functionality to retrieve the first page of Google results for a given &amp;amp;x &amp;amp;y &amp;amp;z query combination. I landed on this page and found great information on this. This was very helpful!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
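&lt;P&gt;A minimal sketch, in Python rather than SAS, of expanding every combination of &amp;amp;x, &amp;amp;y and &amp;amp;z terms into URL-encoded query strings and splitting them into batches that stay under a daily cap. The term lists, the cap of three per day, and the function names build_queries and daily_batches are all hypothetical; the real limit depends on the service's terms of use.&lt;/P&gt;
&lt;PRE&gt;
```python
from itertools import product
from urllib.parse import quote_plus


def build_queries(xs, ys, zs):
    """URL-encode one query string per x/y/z combination (hypothetical helper)."""
    return [quote_plus(" ".join(combo)) for combo in product(xs, ys, zs)]


def daily_batches(items, per_day):
    """Split items into consecutive batches of at most per_day items each."""
    return [items[i:i + per_day] for i in range(0, len(items), per_day)]


queries = build_queries(["used cars"], ["toronto", "ottawa"], ["2016", "2017"])
print(queries[0])  # used+cars+toronto+2016
print([len(batch) for batch in daily_batches(queries, per_day=3)])  # [3, 1]
```
&lt;/PRE&gt;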
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 21 Mar 2017 17:00:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/searching-the-internet/m-p/342997#M10214</guid>
      <dc:creator>Data_Detective_23219</dc:creator>
      <dc:date>2017-03-21T17:00:54Z</dc:date>
    </item>
    <item>
      <title>Re: searching the internet</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/searching-the-internet/m-p/343001#M10215</link>
      <description>&lt;P&gt;The HTML I'm getting is closely related to my query, but my best guess is that what I'm seeing are the ad results rather than the actual search results.&lt;/P&gt;</description>
      <pubDate>Tue, 21 Mar 2017 17:21:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/searching-the-internet/m-p/343001#M10215</guid>
      <dc:creator>Data_Detective_23219</dc:creator>
      <dc:date>2017-03-21T17:21:48Z</dc:date>
    </item>
    <item>
      <title>Re: searching the internet</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/searching-the-internet/m-p/343004#M10216</link>
      <description>&lt;P&gt;I haven't needed to do anything with this since my last post...sorry.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Let us know what you find out!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Tom&lt;/P&gt;</description>
      <pubDate>Tue, 21 Mar 2017 17:31:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/searching-the-internet/m-p/343004#M10216</guid>
      <dc:creator>TomKari</dc:creator>
      <dc:date>2017-03-21T17:31:38Z</dc:date>
    </item>
  </channel>
</rss>

