<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic downloading data from a webpage in SAS Data Management</title>
    <link>https://communities.sas.com/t5/SAS-Data-Management/downloading-data-from-a-webpage/m-p/249924#M6687</link>
    <description>&lt;P&gt;hi,&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Is there a way to download information from a webpage into SAS?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For example, if I go to a YouTube video:&amp;nbsp;&lt;/P&gt;
&lt;H1 class="yt watch-title-container"&gt;&lt;SPAN class="watch-title "&gt;Kitten Meets Computer&lt;/SPAN&gt;&lt;/H1&gt;
&lt;P&gt;&lt;SPAN class="watch-title "&gt;&lt;A href="https://www.youtube.com/watch?v=kQ1L39be1e0" target="_blank"&gt;https://www.youtube.com/watch?v=kQ1L39be1e0&lt;/A&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="watch-title "&gt;Is it possible to download, say, the number of views and the number of likes at any given moment?&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="watch-title "&gt;thank you!&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Sun, 14 Feb 2016 02:05:42 GMT</pubDate>
    <dc:creator>ilikesas</dc:creator>
    <dc:date>2016-02-14T02:05:42Z</dc:date>
    <item>
      <title>downloading data from a webpage</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/downloading-data-from-a-webpage/m-p/249924#M6687</link>
      <description>&lt;P&gt;hi,&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Is there a way to download information from a webpage into SAS?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For example, if I go to a YouTube video:&amp;nbsp;&lt;/P&gt;
&lt;H1 class="yt watch-title-container"&gt;&lt;SPAN class="watch-title "&gt;Kitten Meets Computer&lt;/SPAN&gt;&lt;/H1&gt;
&lt;P&gt;&lt;SPAN class="watch-title "&gt;&lt;A href="https://www.youtube.com/watch?v=kQ1L39be1e0" target="_blank"&gt;https://www.youtube.com/watch?v=kQ1L39be1e0&lt;/A&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="watch-title "&gt;Is it possible to download, say, the number of views and the number of likes at any given moment?&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="watch-title "&gt;thank you!&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 14 Feb 2016 02:05:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/downloading-data-from-a-webpage/m-p/249924#M6687</guid>
      <dc:creator>ilikesas</dc:creator>
      <dc:date>2016-02-14T02:05:42Z</dc:date>
    </item>
    <item>
      <title>Re: downloading data from a webpage</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/downloading-data-from-a-webpage/m-p/249929#M6689</link>
      <description>&lt;P&gt;HTML is text.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Go to the page in question and view the page source. If the elements you want appear in the source, then yes, you can download them, and you can then work out how to parse the file to extract the information of interest.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
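&lt;P&gt;A minimal sketch of the same check done from SAS rather than the browser, assuming the page can be fetched with the FILENAME URL access method. The URL and the search text are placeholders, not taken from the question, and HTTPS pages may need extra configuration depending on the SAS installation.&lt;/P&gt;

```sas
/* Hypothetical URL: replace with the page of interest */
filename src url "http://www.example.com/";

data _null_;
    /* read the page source one line at a time, as plain text */
    infile src lrecl=32767 truncover;
    input;
    /* write to the log only the lines that contain the search text;
       the 'i' modifier makes the search ignore case */
    if find(_infile_, 'example', 'i') then put _n_= _infile_;
run;

filename src clear;
```

&lt;P&gt;If nothing is written to the log, the text is probably not in the downloaded source, for instance because the page builds it with JavaScript after loading.&lt;/P&gt;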
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 14 Feb 2016 03:11:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/downloading-data-from-a-webpage/m-p/249929#M6689</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2016-02-14T03:11:00Z</dc:date>
    </item>
    <item>
      <title>Re: downloading data from a webpage</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/downloading-data-from-a-webpage/m-p/249936#M6690</link>
      <description>&lt;P&gt;Hi Reeza,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;On the web page I right-clicked the view count, chose Inspect, and found that element in the HTML. But when I try to parse it in SAS I have trouble. Is there some documentation on how to parse HTML elements?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you!&lt;/P&gt;</description>
      <pubDate>Sun, 14 Feb 2016 04:03:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/downloading-data-from-a-webpage/m-p/249936#M6690</guid>
      <dc:creator>ilikesas</dc:creator>
      <dc:date>2016-02-14T04:03:01Z</dc:date>
    </item>
    <item>
      <title>Re: downloading data from a webpage</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/downloading-data-from-a-webpage/m-p/249938#M6691</link>
      <description>&lt;P&gt;Not really, but as I mentioned it's a text file, you process it the same way. You can search using FIND/INDEX functions or PRX functions.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also, the solution from Tom Kari, in your other question, had samples of how that was occurring. &amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;SAS isn't a good webscraping tool...it can be used, but there are better tools out there...Import.IO is free, either web or desktop based and has a free version. It can be scripted so that you have results run and available on a regular basis.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
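&lt;P&gt;As a rough illustration of the two approaches, here is a sketch that pulls a number out of one line of text, first with FIND and SCAN and then with the PRX functions. The sample line, the marker text and the variable names are all made up for the example, and the angle brackets of real HTML are left out to keep it short.&lt;/P&gt;

```sas
data found;
    length line rest $200;
    /* Hypothetical line of page source: marker text and value are made up */
    line   = 'span class="view-count" 12,345 views /span';
    marker = 'class="view-count"';

    /* FIND returns the start position of the marker, or 0 if it is absent */
    pos = find(line, trim(marker));
    if pos gt 0 then do;
        /* text that follows the marker, whatever the length of the number */
        rest = substr(line, pos + length(marker));
        /* SCAN picks the first blank-delimited word, here the count;
           the COMMA informat removes the thousands separator */
        views_scan = input(scan(rest, 1, ' '), comma20.);
    end;

    /* Same extraction with a regular expression: capture group 1 is the
       run of digits and commas that follows the marker */
    re = prxparse('/class="view-count"\s*([\d,]+)/');
    if prxmatch(re, line) then
        views_prx = input(prxposn(re, 1, line), comma20.);

    put views_scan= views_prx=;
run;
```

&lt;P&gt;Because neither version depends on a fixed width, the extraction keeps working when the number gains or loses digits.&lt;/P&gt;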
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 14 Feb 2016 04:05:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/downloading-data-from-a-webpage/m-p/249938#M6691</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2016-02-14T04:05:53Z</dc:date>
    </item>
    <item>
      <title>Re: downloading data from a webpage</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/downloading-data-from-a-webpage/m-p/249950#M6692</link>
      <description>Perhaps this could be of interest? &lt;BR /&gt;&lt;BR /&gt;&lt;A href="https://developers.google.com/youtube/2.0/developers_guide_protocol_video_entries" target="_blank"&gt;https://developers.google.com/youtube/2.0/developers_guide_protocol_video_entries&lt;/A&gt;</description>
      <pubDate>Sun, 14 Feb 2016 11:02:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/downloading-data-from-a-webpage/m-p/249950#M6692</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2016-02-14T11:02:18Z</dc:date>
    </item>
    <item>
      <title>Re: downloading data from a webpage</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/downloading-data-from-a-webpage/m-p/249966#M6693</link>
      <description>&lt;P&gt;Hi LinusH,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;thanks for the link!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In the meantime, I tried writing some code to extract the number of views of the YouTube video Kitten Meets Computer:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;/* FILENAME is a global statement, so it is placed before the DATA step */&lt;/P&gt;
&lt;P&gt;filename the_vid url "https://www.youtube.com/watch?v=kQ1L39be1e0";&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;data kitten;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; length view $32767;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; infile the_vid lrecl=32767;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; input;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; view = _infile_; /* one observation per line of page source */&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;/* keep only the lines that contain the view-count marker */&lt;/P&gt;
&lt;P&gt;data kitten2;&lt;/P&gt;
&lt;P&gt;set kitten;&lt;/P&gt;
&lt;P&gt;if index(view,'watch-view-count') = 0 then delete;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;/* take 5 characters starting 18 positions after the start of the marker */&lt;/P&gt;
&lt;P&gt;data kitten3;&lt;/P&gt;
&lt;P&gt;set kitten2;&lt;/P&gt;
&lt;P&gt;place=index(view,'watch-view-count');&lt;/P&gt;
&lt;P&gt;like=substr(view,place+18,5);&lt;/P&gt;
&lt;P&gt;drop place;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In kitten3, the part "&lt;SPAN&gt;like=substr(view,place+&lt;/SPAN&gt;&lt;STRONG&gt;18&lt;/STRONG&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;STRONG&gt;5&lt;/STRONG&gt;&lt;SPAN&gt;);" still needs to be made dynamic, because once the number of views reaches 10,000 the 5 will have to become 6, and so on.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 14 Feb 2016 15:39:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/downloading-data-from-a-webpage/m-p/249966#M6693</guid>
      <dc:creator>ilikesas</dc:creator>
      <dc:date>2016-02-14T15:39:28Z</dc:date>
    </item>
    <item>
      <title>Re: downloading data from a webpage</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/downloading-data-from-a-webpage/m-p/249974#M6695</link>
      <description>Instead of SUBSTR, look at other functions such as SCAN. Also, how did you calculate 18? If you use INDEX or FIND, you can calculate that offset dynamically.</description>
      <pubDate>Sun, 14 Feb 2016 17:16:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/downloading-data-from-a-webpage/m-p/249974#M6695</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2016-02-14T17:16:26Z</dc:date>
    </item>
    <item>
      <title>Re: downloading data from a webpage</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/downloading-data-from-a-webpage/m-p/249978#M6696</link>
      <description>&lt;P&gt;Also, post a screenshot or sample of the text you're trying to parse.&lt;/P&gt;
&lt;P&gt;It's hard to make suggestions otherwise.&lt;/P&gt;</description>
      <pubDate>Sun, 14 Feb 2016 18:03:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/downloading-data-from-a-webpage/m-p/249978#M6696</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2016-02-14T18:03:56Z</dc:date>
    </item>
    <item>
      <title>Re: downloading data from a webpage</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/downloading-data-from-a-webpage/m-p/249983#M6698</link>
      <description>&lt;P&gt;hi,&lt;/P&gt;
&lt;P&gt;I actually counted the number of spaces, I know, not the most efficient way...&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Meanwhile, I tried scraping other web pages. I tried to scrape the friends who liked a post that I uploaded on Facebook, but it didn't work: SAS didn't download any of the HTML elements associated with the sliding window that opens to show the people who liked it.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I also saw an example from the paper that you suggested that scrapes job postings from the Edmonton career website. There the actual scraping is done with Perl and the parsing with SAS, so I guess that for more complicated web scraping SAS is not the best program.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I also tried import.io online, and it seemed a bit too limited because it chooses for itself what to scrape, and sometimes it didn't scrape anything.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you!&lt;/P&gt;</description>
      <pubDate>Sun, 14 Feb 2016 18:45:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/downloading-data-from-a-webpage/m-p/249983#M6698</guid>
      <dc:creator>ilikesas</dc:creator>
      <dc:date>2016-02-14T18:45:32Z</dc:date>
    </item>
    <item>
      <title>Re: downloading data from a webpage</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/downloading-data-from-a-webpage/m-p/249985#M6699</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/12982"&gt;@ilikesas&lt;/a&gt; wrote:&lt;BR /&gt;
&lt;P&gt;hi,&lt;/P&gt;
&lt;P&gt;I actually counted the number of spaces, I know, not the most efficient way...&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Meanwhile, I tried scraping other web pages. I tried to scrape the friends who liked a post that I uploaded on Facebook, but it didn't work: SAS didn't download any of the HTML elements associated with the sliding window that opens to show the people who liked it.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I also tried import.io online, and it seemed a bit too limited because it chooses for itself what to scrape, and sometimes it didn't scrape anything.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you!&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;1. Yes, change that to SCAN or FIND and you'll probably be able to make it dynamic.&lt;/P&gt;
&lt;P&gt;2. For FB you definitely need to go through the API; the same goes for Twitter. Here's a SAS paper that tried it years ago, though its approach most likely won't work today:&amp;nbsp;&lt;A href="http://www.sascommunity.org/wiki/Social_Networking_and_SAS:_Running_PROCs_on_Your_Facebook_Friends" target="_blank"&gt;http://www.sascommunity.org/wiki/Social_Networking_and_SAS:_Running_PROCs_on_Your_Facebook_Friends&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;3. Import.io: the user can select the fields to parse. In general, the downloadable app is better than the web one.&lt;/P&gt;</description>
      <pubDate>Sun, 14 Feb 2016 18:52:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/downloading-data-from-a-webpage/m-p/249985#M6699</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2016-02-14T18:52:02Z</dc:date>
    </item>
    <item>
      <title>Re: downloading data from a webpage</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/downloading-data-from-a-webpage/m-p/250468#M6717</link>
      <description>&lt;P&gt;This is a really nice application for regular expressions. Try this code instead of your data steps for kitten2 and kitten3:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;data kitten2;&lt;BR /&gt;retain re;&lt;BR /&gt;if _n_ = 1&lt;BR /&gt;then re=prxparse('#(&amp;lt;div class="watch-view-count"&amp;gt;)([\d,]+)(&amp;lt;/div&amp;gt;)#');&lt;BR /&gt;set kitten;&lt;BR /&gt;if prxmatch(re, view)&lt;BR /&gt;then do;&lt;BR /&gt;&amp;nbsp; view_count = input(prxposn(re, 2, view), comma20.);&lt;BR /&gt;&amp;nbsp; output;&lt;BR /&gt;end;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The regular expression (assignment to re) has three parts, each in a set of parentheses:&lt;/P&gt;&lt;P&gt;First is the literal "&amp;lt;div class="watch-view-count"&amp;gt;", and third is the literal "&amp;lt;/div&amp;gt;".&lt;/P&gt;&lt;P&gt;Second, represented by "[\d,]+", looks for one or more repetitions of a digit or a comma.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;PRXPOSN returns the string found as the second part, as a character variable, which is converted to a number by the INPUT function; the COMMA20. informat removes the commas.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Tom&lt;/P&gt;</description>
      <pubDate>Tue, 16 Feb 2016 22:50:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/downloading-data-from-a-webpage/m-p/250468#M6717</guid>
      <dc:creator>TomKari</dc:creator>
      <dc:date>2016-02-16T22:50:39Z</dc:date>
    </item>
  </channel>
</rss>

