<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: web crawler macro in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/web-crawler-macro/m-p/137655#M27829</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi:&lt;/P&gt;&lt;P&gt;&amp;nbsp; Is this homework? I notice reference to this school bschool.nus.edu.sg&amp;nbsp; in the program. If this is homework, then perhaps you should ask your professor about the reason the program is not working and/or the correct SAS function to use and/or about looping constructs with SAS programs.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P&gt;&amp;nbsp; The documentation for the INDEX function is fairly clear that it only finds the FIRST occurrence of a string, which you can verify by looking at the documentation (highlighted sentence is mine):&lt;/P&gt;&lt;P&gt;From the documentation &lt;A href="http://support.sas.com/documentation/cdl/en/lefunctionsref/67239/HTML/default/viewer.htm#n0vxokxhv8lr84n10nrbnzp7gnba.htm" title="http://support.sas.com/documentation/cdl/en/lefunctionsref/67239/HTML/default/viewer.htm#n0vxokxhv8lr84n10nrbnzp7gnba.htm"&gt;SAS(R) 9.4 Functions and CALL Routines: Reference, Second Edition&lt;/A&gt;&lt;/P&gt;&lt;H3 class="xis-title"&gt;The Basics&lt;/H3&gt;&lt;DIV class="xis-topicContent"&gt;&lt;A id="n19h50021nlsddn1ovzu16dueaqs"&gt;&lt;/A&gt;&lt;DIV class="xis-paragraph"&gt;&lt;A id="n0ty5ohbdajuzyn15l5z9wp7w2vv"&gt;&lt;/A&gt;The INDEX function searches &lt;SPAN class="xis-userSuppliedValue"&gt;source&lt;/SPAN&gt;, from left to right, for the first occurrence of the string specified in &lt;SPAN class="xis-userSuppliedValue"&gt;excerpt&lt;/SPAN&gt;, and returns the&amp;nbsp; position in &lt;SPAN class="xis-userSuppliedValue"&gt;source&lt;/SPAN&gt; of the string's first character.&amp;nbsp; If the string is not found in &lt;SPAN class="xis-userSuppliedValue"&gt;source&lt;/SPAN&gt;, INDEX returns a value of 0.&lt;SPAN style="text-decoration: underline;"&gt;&lt;STRONG&gt; If there are multiple occurrences of the string, INDEX returns only the position of the first occurrence. 
&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P&gt;&amp;nbsp; What is returned from the INDEX function is the POSITION of the string's first character in the variable you have searched. So, the INDEX function might or might not be the appropriate function for you to use. My suggestion is that instead of trying to make the web crawler program work, you use a simpler program and try to modify the program to correctly locate the word DERIVATIVE and/or the word THE in the following 4 sentences. Once you discover the correct function and/or looping technique to correctly find more than one occurrence of the string in a variable, then you will have found the correct techniques to modify your web crawler program.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P&gt;Cynthia&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;** note how INDEX only returns the position of the FIRST occurrence;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;** of the search string;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;data testit;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;&amp;nbsp; length line $100;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;&amp;nbsp; infile datalines dsd dlm=',';&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;&amp;nbsp; input lnum line $;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;&amp;nbsp; isfound_deriv = index(upcase(line),'DERIVATIVE');&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN 
style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;&amp;nbsp; isfound_the = index(upcase(line),'THE');&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;return;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;datalines;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;1,"Twas brillig and the slithy toves"&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;2,"DERIVATIVE of the XYZ Corp and derivative of the ABC Corp too"&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;3,"Away along the riverrun past Eve and Adam's"&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;4,"Something with derivative in the sentence only once"&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;run;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;ods listing;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;proc print data=testit;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;run;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Sun, 09 Mar 2014 18:50:17 GMT</pubDate>
    <dc:creator>Cynthia_sas</dc:creator>
    <dc:date>2014-03-09T18:50:17Z</dc:date>
    <item>
      <title>web crawler macro</title>
      <link>https://communities.sas.com/t5/SAS-Programming/web-crawler-macro/m-p/137654#M27828</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I am using a web crawler program to find specific keywords ("futures", "forwards", "notional", etc.) in the 10-K reports from the SEC EDGAR database. Once the code finds a keyword, I print the 5 (or 10) lines around it to get the derivative values.&lt;/P&gt;&lt;P&gt;The code works and fetches data, but not all of the required data. It currently matches each keyword only once and returns the lines surrounding that match. For example, if there are 4 or 5 instances of "Notional" in the 10-K, it finds only the first one, returns the lines surrounding it, and then moves on to the next keyword.&lt;/P&gt;&lt;P&gt;Rather than looking at every instance of a keyword, it looks only at the first one it finds and moves on to the next keyword. I hope that explains the problem.&lt;/P&gt;&lt;P&gt;I have attached the SAS code. Can anyone help me with this issue?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Sonik Mandal&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Sun, 09 Mar 2014 17:11:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/web-crawler-macro/m-p/137654#M27828</guid>
      <dc:creator>sonikm24</dc:creator>
      <dc:date>2014-03-09T17:11:50Z</dc:date>
    </item>
    <item>
      <title>Re: web crawler macro</title>
      <link>https://communities.sas.com/t5/SAS-Programming/web-crawler-macro/m-p/137655#M27829</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi:&lt;/P&gt;&lt;P&gt;&amp;nbsp; Is this homework? I notice reference to this school bschool.nus.edu.sg&amp;nbsp; in the program. If this is homework, then perhaps you should ask your professor about the reason the program is not working and/or the correct SAS function to use and/or about looping constructs with SAS programs.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P&gt;&amp;nbsp; The documentation for the INDEX function is fairly clear that it only finds the FIRST occurrence of a string, which you can verify by looking at the documentation (highlighted sentence is mine):&lt;/P&gt;&lt;P&gt;From the documentation &lt;A href="http://support.sas.com/documentation/cdl/en/lefunctionsref/67239/HTML/default/viewer.htm#n0vxokxhv8lr84n10nrbnzp7gnba.htm" title="http://support.sas.com/documentation/cdl/en/lefunctionsref/67239/HTML/default/viewer.htm#n0vxokxhv8lr84n10nrbnzp7gnba.htm"&gt;SAS(R) 9.4 Functions and CALL Routines: Reference, Second Edition&lt;/A&gt;&lt;/P&gt;&lt;H3 class="xis-title"&gt;The Basics&lt;/H3&gt;&lt;DIV class="xis-topicContent"&gt;&lt;A id="n19h50021nlsddn1ovzu16dueaqs"&gt;&lt;/A&gt;&lt;DIV class="xis-paragraph"&gt;&lt;A id="n0ty5ohbdajuzyn15l5z9wp7w2vv"&gt;&lt;/A&gt;The INDEX function searches &lt;SPAN class="xis-userSuppliedValue"&gt;source&lt;/SPAN&gt;, from left to right, for the first occurrence of the string specified in &lt;SPAN class="xis-userSuppliedValue"&gt;excerpt&lt;/SPAN&gt;, and returns the&amp;nbsp; position in &lt;SPAN class="xis-userSuppliedValue"&gt;source&lt;/SPAN&gt; of the string's first character.&amp;nbsp; If the string is not found in &lt;SPAN class="xis-userSuppliedValue"&gt;source&lt;/SPAN&gt;, INDEX returns a value of 0.&lt;SPAN style="text-decoration: underline;"&gt;&lt;STRONG&gt; If there are multiple occurrences of the string, INDEX returns only the position of the first occurrence. 
&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P&gt;&amp;nbsp; What is returned from the INDEX function is the POSITION of the string's first character in the variable you have searched. So, the INDEX function might or might not be the appropriate function for you to use. My suggestion is that instead of trying to make the web crawler program work, you use a simpler program and try to modify the program to correctly locate the word DERIVATIVE and/or the word THE in the following 4 sentences. Once you discover the correct function and/or looping technique to correctly find more than one occurrence of the string in a variable, then you will have found the correct techniques to modify your web crawler program.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P&gt;Cynthia&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;** note how INDEX only returns the position of the FIRST occurrence;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;** of the search string;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;data testit;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;&amp;nbsp; length line $100;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;&amp;nbsp; infile datalines dsd dlm=',';&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;&amp;nbsp; input lnum line $;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;&amp;nbsp; isfound_deriv = index(upcase(line),'DERIVATIVE');&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN 
style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;&amp;nbsp; isfound_the = index(upcase(line),'THE');&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;return;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;datalines;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;1,"Twas brillig and the slithy toves"&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;2,"DERIVATIVE of the XYZ Corp and derivative of the ABC Corp too"&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;3,"Away along the riverrun past Eve and Adam's"&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;4,"Something with derivative in the sentence only once"&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;run;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;ods listing;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;proc print data=testit;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&lt;STRONG&gt;run;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Sun, 09 Mar 2014 18:50:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/web-crawler-macro/m-p/137655#M27829</guid>
      <dc:creator>Cynthia_sas</dc:creator>
      <dc:date>2014-03-09T18:50:17Z</dc:date>
    </item>
    <item>
      <title>Re: web crawler macro</title>
      <link>https://communities.sas.com/t5/SAS-Programming/web-crawler-macro/m-p/137656#M27830</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hello Cynthia,&lt;/P&gt;&lt;P&gt;Thanks for the reply and for sending the example code. No, this is not homework; I am using this program for data collection in my thesis work. The code was originally taken from a paper, and one of its authors is from the school mentioned in the code.&lt;/P&gt;&lt;P&gt;I am now using prxnext() instead of the INDEX function and am trying to integrate it into the macro. If I run into any problems, I will post them in the forum.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Sonik Mandal&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 10 Mar 2014 02:30:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/web-crawler-macro/m-p/137656#M27830</guid>
      <dc:creator>sonikm24</dc:creator>
      <dc:date>2014-03-10T02:30:56Z</dc:date>
    </item>
    <item>
      <title>Re: web crawler macro</title>
      <link>https://communities.sas.com/t5/SAS-Programming/web-crawler-macro/m-p/137657#M27831</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hello Cynthia,&lt;/P&gt;&lt;P&gt;I have used a different function to find multiple instances of the keywords (see the attached code), but I am having a problem outputting the lines surrounding each keyword. I want to output the surrounding lines for every instance of a keyword in the SEC file. For example, if there are 5 instances of "Notional" in the SEC file, I want to output the lines surrounding each of those instances. In the code, I am using the following lines for that purpose:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (0 &amp;lt; countC2 &amp;lt;= 10) then do;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; output;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;However, changing 10 to 15 or 5 does not increase or decrease the number of lines output around the keywords. Please let me know what is wrong with the code. I have attached the code and a sample Excel file.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;&lt;P&gt;Sonik Mandal&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 03 Apr 2014 00:10:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/web-crawler-macro/m-p/137657#M27831</guid>
      <dc:creator>sonikm24</dc:creator>
      <dc:date>2014-04-03T00:10:36Z</dc:date>
    </item>
  </channel>
</rss>

