<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Replication Factor for SASHDAT and HDFS in SAS Academy for Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Replication-Factor-for-SASHDAT-and-HDFS/m-p/421127#M36</link>
    <description>&lt;P&gt;When we use the SASHDAT LIBNAME engine, the files are placed on HDFS using path= &amp;lt;HDFS path&amp;gt;, with copies= setting the number of replications.&lt;/P&gt;&lt;P&gt;The replication factor for SASHDAT tables is 2 by default, whereas on HDFS the replication factor is 3 by default.&lt;/P&gt;&lt;P&gt;So if a SAS table is loaded to HDFS through SASHDAT, will it have 2 copies, or 3 copies as the HDFS default suggests? How is that possible?&lt;/P&gt;&lt;P&gt;I'm a bit confused. Can anyone please explain this?&lt;/P&gt;</description>
    <pubDate>Thu, 14 Dec 2017 09:53:16 GMT</pubDate>
    <dc:creator>akpattnaik</dc:creator>
    <dc:date>2017-12-14T09:53:16Z</dc:date>
    <item>
      <title>Replication Factor for SASHDAT and HDFS</title>
      <link>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Replication-Factor-for-SASHDAT-and-HDFS/m-p/421127#M36</link>
      <description>&lt;P&gt;When we use the SASHDAT LIBNAME engine, the files are placed on HDFS using path= &amp;lt;HDFS path&amp;gt;, with copies= setting the number of replications.&lt;/P&gt;&lt;P&gt;The replication factor for SASHDAT tables is 2 by default, whereas on HDFS the replication factor is 3 by default.&lt;/P&gt;&lt;P&gt;So if a SAS table is loaded to HDFS through SASHDAT, will it have 2 copies, or 3 copies as the HDFS default suggests? How is that possible?&lt;/P&gt;&lt;P&gt;I'm a bit confused. Can anyone please explain this?&lt;/P&gt;</description>
      <pubDate>Thu, 14 Dec 2017 09:53:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Replication-Factor-for-SASHDAT-and-HDFS/m-p/421127#M36</guid>
      <dc:creator>akpattnaik</dc:creator>
      <dc:date>2017-12-14T09:53:16Z</dc:date>
    </item>
    <item>
      <title>Re: Replication Factor for SASHDAT and HDFS</title>
      <link>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Replication-Factor-for-SASHDAT-and-HDFS/m-p/421130#M37</link>
      <description>&lt;P&gt;I don't do this often, so this is only a guess, but I would expect the SAS default to override the HDFS default. So no, I don't think there will be three copies. Have you checked in the file system?&lt;/P&gt;</description>
      <pubDate>Thu, 14 Dec 2017 10:07:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Replication-Factor-for-SASHDAT-and-HDFS/m-p/421130#M37</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2017-12-14T10:07:45Z</dc:date>
    </item>
    <item>
      <title>Re: Replication Factor for SASHDAT and HDFS</title>
      <link>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Replication-Factor-for-SASHDAT-and-HDFS/m-p/421255#M38</link>
      <description>&lt;P&gt;I tested as LinusH suggested. He is correct. HDFS has a default replication factor and the SASHDAT engine overrides that when it creates files in HDFS. The LIBNAME engine for SASHDAT has a default value for copies= even if you don't specify it on the LIBNAME statement. This is what I found in the doc:&lt;/P&gt;
&lt;H4 class="xis-argument"&gt;COPIES=&lt;SPAN class="xis-userSuppliedValue"&gt;n&lt;/SPAN&gt;&lt;/H4&gt;
&lt;DIV class="xis-argumentDescription"&gt;
&lt;P class="xis-paraSimpleFirst"&gt;specifies the number of replications to make for the data set (beyond the original blocks). The default value is 2 when the INNAMEONLY option is specified and otherwise is 1. Replicated blocks are used to provide fault tolerance. If a machine in the cluster becomes unavailable, then the blocks needed for the SASHDAT file can be retrieved from replications on other machines. If you specify COPIES=0, then the original blocks are distributed, but no replications are made and there is no fault tolerance for the data.&lt;/P&gt;
&lt;P class="xis-paraSimpleFirst"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="xis-paraSimpleFirst"&gt;Here is the link to that part of the documentation: &lt;A title="LASR Analytics Server Reference Guide" href="http://support.sas.com/documentation/cdl/en/inmsref/70021/HTML/default/viewer.htm#p0kn1b8a7yt44fn1qwp8w1b8l92w.htm" target="_blank"&gt;http://support.sas.com/documentation/cdl/en/inmsref/70021/HTML/default/viewer.htm#p0kn1b8a7yt44fn1qwp8w1b8l92w.htm&lt;/A&gt;&lt;/P&gt;
&lt;P class="xis-paraSimpleFirst"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="xis-paraSimpleFirst"&gt;also, here is how I discovered with HDFS commands how to determine the replication factor for HDFS files:&lt;/P&gt;
&lt;P class="xis-paraSimpleFirst"&gt;&lt;A title="How to determine HDFS replication" href="https://www.systutorials.com/qa/1297/how-to-check-the-replication-factor-of-a-file-in-hdfs" target="_blank"&gt;https://www.systutorials.com/qa/1297/how-to-check-the-replication-factor-of-a-file-in-hdfs&lt;/A&gt;&lt;/P&gt;
&lt;P class="xis-paraSimpleFirst"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="xis-paraSimpleFirst"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="xis-paraSimpleFirst"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="xis-paraSimpleFirst"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="xis-paraSimpleFirst"&gt;&amp;nbsp;&lt;/P&gt;
&lt;/DIV&gt;</description>
      <pubDate>Thu, 14 Dec 2017 17:21:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Academy-for-Data-Science/Replication-Factor-for-SASHDAT-and-HDFS/m-p/421255#M38</guid>
      <dc:creator>DavidGhan</dc:creator>
      <dc:date>2017-12-14T17:21:01Z</dc:date>
    </item>
  </channel>
</rss>

