<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Big Data - Module 2 Chapter 4 - Pig Latin in SAS Academy for Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Big-Data-Module-2-Chapter-4-Pig-Latin/m-p/591036#M465</link>
    <description>Excellent answer.&lt;BR /&gt;&lt;BR /&gt;Thanks very much.&lt;BR /&gt;Odesh.</description>
    <pubDate>Mon, 23 Sep 2019 20:07:04 GMT</pubDate>
    <dc:creator>odesh</dc:creator>
    <dc:date>2019-09-23T20:07:04Z</dc:date>
    <item>
      <title>Big Data - Module 2 Chapter 4 - Pig Latin</title>
      <link>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Big-Data-Module-2-Chapter-4-Pig-Latin/m-p/590654#M456</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;Please refer to the attachment. It is part of the solution to the last exercise in Chapter 4. There are 5 lines of Pig Latin code, and I am not sure that I understand the logic correctly and completely. I have written down what I think each line does. Please tell me where I am correct and where I am not.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;line 1: a = loads the mobydick text file into the DIHPS folder (in Hive?)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;line 2: b = for each row in "a" above (that is, in the mobydick text file), put each word on its own physical line.&lt;/P&gt;&lt;P&gt;Why do we need a FLATTEN statement here? Would the TOKENIZE statement not be enough?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;line 3: c = are we grouping each row in step a by "word"? Can you please give an example to clarify?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;line 4: d = are we counting the number of occurrences of a given word? For example, a word like "whale" might have been used about 4000 times in the book.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;line 5: We store a 2-column table called pig_wordcount in DIHPS back on the SAS server.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;&lt;P&gt;Odesh.&lt;/P&gt;</description>
      <pubDate>Sat, 21 Sep 2019 20:46:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Big-Data-Module-2-Chapter-4-Pig-Latin/m-p/590654#M456</guid>
      <dc:creator>odesh</dc:creator>
      <dc:date>2019-09-21T20:46:27Z</dc:date>
    </item>
    <item>
      <title>Re: Big Data - Module 2 Chapter 4 - Pig Latin</title>
      <link>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Big-Data-Module-2-Chapter-4-Pig-Latin/m-p/590981#M461</link>
      <description>&lt;P&gt;Hi:&lt;/P&gt;
&lt;P&gt;Here's feedback from the instructors:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;line 1: a = " loads mobydick text file into DIHPS folder. ( in Hive ? )&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#FF0000"&gt;– the Pig script load operator loads the data directly from HDFS. Pig has no interaction with Hive.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;line 2: b = "For each row in "a" above ( that is in the mobydick text file ) put words separately on each consecutive physical line.&lt;/P&gt;
&lt;P&gt;Why do we need a flatten statement here ? Would the TOKENIZE statement not be enough ?&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#FF0000"&gt;– the Flatten statement is required. Tokenize would not be enough. The Flatten statement is needed to take the tuples that are on the same line and put each tuple on a separate line. Once each line represents a word we can then group on the words as we wish to in order to get a word count.&lt;BR /&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#FF0000"&gt;Here's a way to create 2 different variables and see the impact of using FLATTEN:&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#FF0000"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="tokenize_flatten_compare.png" style="width: 600px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/32681i117C45941D01EF2E/image-size/large?v=v2&amp;amp;px=999" role="button" title="tokenize_flatten_compare.png" alt="tokenize_flatten_compare.png" /&gt;&lt;/span&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;line 3: c = "Are we grouping each row in Step a by "word". Can you please give an example here to clarify ?&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#FF0000"&gt;– yes, it’s very similar to a SQL GROUP BY statement.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;line 4: d = "Are we counting the number of occurrences of a given word. For example a word like "whale" might have been used about 4000 times in the book.&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#FF0000"&gt;– yes.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;line 5: We store a 2-column table called pig_wordcount in DIHPS back on the SAS server.&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#FF0000"&gt;– yes. However, this result table is small compared with the original file and the temporary tables used to compute the word count. When possible, keep results on the Hadoop cluster and move them back to the SAS server only when they are needed for additional SAS processing.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#000000"&gt;Hope this helps,&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#000000"&gt;Cynthia&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 23 Sep 2019 15:52:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Big-Data-Module-2-Chapter-4-Pig-Latin/m-p/590981#M461</guid>
      <dc:creator>Cynthia_sas</dc:creator>
      <dc:date>2019-09-23T15:52:39Z</dc:date>
    </item>
    <item>
      <title>Re: Big Data - Module 2 Chapter 4 - Pig Latin</title>
      <link>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Big-Data-Module-2-Chapter-4-Pig-Latin/m-p/591002#M462</link>
      <description>Very helpful, but one question.&lt;BR /&gt;&lt;BR /&gt;After the line "Here's a way to create 2 different variables and see the impact of using FLATTEN:"&lt;BR /&gt;&lt;BR /&gt;was there some additional information that was going to be presented at that point?&lt;BR /&gt;&lt;BR /&gt;Odesh.&lt;BR /&gt;</description>
      <pubDate>Mon, 23 Sep 2019 17:27:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Big-Data-Module-2-Chapter-4-Pig-Latin/m-p/591002#M462</guid>
      <dc:creator>odesh</dc:creator>
      <dc:date>2019-09-23T17:27:32Z</dc:date>
    </item>
    <item>
      <title>Re: Big Data - Module 2 Chapter 4 - Pig Latin</title>
      <link>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Big-Data-Module-2-Chapter-4-Pig-Latin/m-p/591014#M464</link>
      <description>&lt;P&gt;Yes, there was a screenshot (I can see it in the original post, but I am re-posting it here):&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="tokenize_flatten_compare.png" style="width: 600px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/32689iB1022E475AC7E7A1/image-size/large?v=v2&amp;amp;px=999" role="button" title="tokenize_flatten_compare.png" alt="tokenize_flatten_compare.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Note the difference between using TOKENIZE alone (wordb1) and using FLATTEN with TOKENIZE (wordb2).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Cynthia&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 23 Sep 2019 17:55:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Big-Data-Module-2-Chapter-4-Pig-Latin/m-p/591014#M464</guid>
      <dc:creator>Cynthia_sas</dc:creator>
      <dc:date>2019-09-23T17:55:13Z</dc:date>
    </item>
    <item>
      <title>Re: Big Data - Module 2 Chapter 4 - Pig Latin</title>
      <link>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Big-Data-Module-2-Chapter-4-Pig-Latin/m-p/591036#M465</link>
      <description>Excellent answer.&lt;BR /&gt;&lt;BR /&gt;Thanks very much.&lt;BR /&gt;Odesh.</description>
      <pubDate>Mon, 23 Sep 2019 20:07:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Big-Data-Module-2-Chapter-4-Pig-Latin/m-p/591036#M465</guid>
      <dc:creator>odesh</dc:creator>
      <dc:date>2019-09-23T20:07:04Z</dc:date>
    </item>
  </channel>
</rss>

