<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Hadoop out of memory error in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Hadoop-out-of-memory-error/m-p/439013#M282362</link>
    <description>&lt;P&gt;Thanks, I appreciate your tips. I will remove the ORDER BY. I'm currently trying a colleague's suggestion to use GROUP BY instead of DISTINCT. As for creating a table, that would help, but I think I would first need to create what is referred to as a Kerberos ticket in my environment.&lt;/P&gt;</description>
    <pubDate>Wed, 21 Feb 2018 18:13:26 GMT</pubDate>
    <dc:creator>brulard</dc:creator>
    <dc:date>2018-02-21T18:13:26Z</dc:date>
    <item>
      <title>Hadoop out of memory error</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hadoop-out-of-memory-error/m-p/438975#M282360</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm fairly new to querying in Hadoop.&lt;/P&gt;&lt;P&gt;When I run a query using SQL pass-through, I get an out-of-memory error. Does anyone know of an option I could use to get around this, or have a suggestion other than breaking my query into separate parts?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;There was no error when I limited the query time frame to 4 years (from date, to date).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;However, when I broaden it to 7 years, I get the out-of-memory error.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Below is the part of the code that scans through hundreds of millions of records; it is resource intensive, and I believe it causes the error.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;LEFT JOIN
	
		(SELECT 
		CRNT_BAL_AMT,eff_from_dt ,eff_TO_dt,ACCT_ID ,PRD_CD
		FROM TRANS_HIST  
		Where  eff_to_dt&amp;gt;='2010-12-31' and eff_to_dt &amp;lt; '9999-12-31'
		
		  ) g ON
		
		A.ACCT_ID=g.ACCT_ID AND date_sub(A.txn_dt,1)=g.EFF_TO_DT
	
	LEFT JOIN
	
		(SELECT 
		CRNT_BAL_AMT,eff_from_dt ,eff_TO_dt,ACCT_ID ,PRD_CD
		FROM TRANS_HIST   
		Where  eff_to_dt&amp;gt;='2011-01-01' and eff_to_dt &amp;lt; '9999-12-31'
		
		  ) h ON
		
		A.ACCT_ID=h.ACCT_ID AND date_sub(g.eff_from_dt,1)=h.EFF_TO_DT
	
			 ORDER BY  a.ACCT_ID)
			;

DISCONNECT FROM hadcon;
quit;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Error message:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;ERROR: Prepare error: Error while processing statement: FAILED: Execution Error, return code 2 from 
       org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 2, vertexId=vertex_1517179256891_507352_1_09, 
       diagnostics=[Task failed, taskId=task_1517179256891_507352_1_09_000003, diagnostics=[TaskAttempt 0 failed, info=[Error: 
       Failure while running task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
	at 
       org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Thanks in advance&lt;/P&gt;</description>
      <pubDate>Wed, 21 Feb 2018 16:09:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hadoop-out-of-memory-error/m-p/438975#M282360</guid>
      <dc:creator>brulard</dc:creator>
      <dc:date>2018-02-21T16:09:20Z</dc:date>
    </item>
    <item>
      <title>Re: Hadoop out of memory error</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hadoop-out-of-memory-error/m-p/439012#M282361</link>
      <description>&lt;P&gt;A Hadoop or Hive forum could probably answer this better.&amp;nbsp; One thing that does not seem essential to me is&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE class=" language-sas"&gt;&lt;CODE class="  language-sas"&gt;ORDER &lt;SPAN class="token statement"&gt;BY&lt;/SPAN&gt;  a&lt;SPAN class="token punctuation"&gt;.&lt;/SPAN&gt;ACCT_ID&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If possible, try removing the ORDER BY clause; sorting is a very resource-intensive process.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You can also break the query into 2 or 3 tables (if you have access to create tables) and build indexes on the joining columns. Use date_sub to create a new column in that table:&lt;/P&gt;
&lt;PRE class=" language-sas"&gt;&lt;CODE class="  language-sas"&gt;date_sub&lt;SPAN class="token punctuation"&gt;(&lt;/SPAN&gt;A&lt;SPAN class="token punctuation"&gt;.&lt;/SPAN&gt;txn_dt&lt;SPAN class="token punctuation"&gt;,&lt;/SPAN&gt;&lt;SPAN class="token number"&gt;1&lt;/SPAN&gt;&lt;SPAN class="token punctuation"&gt;) as col&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Then build an index on columns such as acct_id and col.&lt;/P&gt;
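&lt;P&gt;A minimal HiveQL sketch of the staging idea above, under stated assumptions: txn_base stands in for whatever table the alias A refers to in the question, and txn_stage and prev_dt are made-up names. Only TRANS_HIST, ACCT_ID, txn_dt, eff_to_dt, CRNT_BAL_AMT and date_sub come from the original query. Index creation is left out because whether Hive supports it depends on the version. From SAS, the CREATE TABLE statement would go inside an EXECUTE ( ... ) BY hadcon statement.&lt;/P&gt;

```sql
-- Sketch only. txn_base, txn_stage and prev_dt are hypothetical names;
-- TRANS_HIST, ACCT_ID, txn_dt, eff_to_dt and CRNT_BAL_AMT come from the question.

-- Step 1: materialize the transaction side once, with the join key
-- (the day before txn_dt) precomputed as a plain column.
CREATE TABLE txn_stage AS
SELECT ACCT_ID,
       txn_dt,
       date_sub(txn_dt, 1) AS prev_dt
FROM txn_base;

-- Step 2: join on the stored column instead of calling date_sub
-- inside the join condition.
SELECT s.ACCT_ID,
       s.txn_dt,
       g.CRNT_BAL_AMT
FROM txn_stage s
LEFT JOIN TRANS_HIST g
  ON s.ACCT_ID = g.ACCT_ID
 AND s.prev_dt = g.eff_to_dt;
```

&lt;P&gt;Whether this is enough to avoid the heap error is not certain. It mainly splits the work into smaller steps whose intermediate results are stored in tables.&lt;/P&gt;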
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 21 Feb 2018 18:09:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hadoop-out-of-memory-error/m-p/439012#M282361</guid>
      <dc:creator>kiranv_</dc:creator>
      <dc:date>2018-02-21T18:09:24Z</dc:date>
    </item>
    <item>
      <title>Re: Hadoop out of memory error</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hadoop-out-of-memory-error/m-p/439013#M282362</link>
      <description>&lt;P&gt;Thanks, I appreciate your tips. I will remove the ORDER BY. I'm currently trying a colleague's suggestion to use GROUP BY instead of DISTINCT. As for creating a table, that would help, but I think I would first need to create what is referred to as a Kerberos ticket in my environment.&lt;/P&gt;</description>
      <pubDate>Wed, 21 Feb 2018 18:13:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hadoop-out-of-memory-error/m-p/439013#M282362</guid>
      <dc:creator>brulard</dc:creator>
      <dc:date>2018-02-21T18:13:26Z</dc:date>
    </item>
    <item>
      <title>Re: Hadoop out of memory error</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hadoop-out-of-memory-error/m-p/439160#M282363</link>
      <description>&lt;P&gt;This is something your Hadoop administrators should take a look at with you. The container provisioned for your task on the Hadoop compute cluster didn't have enough memory allocated to it to complete the task. You'll need to tweak the configuration inside the Hadoop cluster to fit the workload that query will generate.&lt;/P&gt;</description>
      <pubDate>Thu, 22 Feb 2018 03:37:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hadoop-out-of-memory-error/m-p/439160#M282363</guid>
      <dc:creator>SimonDawson</dc:creator>
      <dc:date>2018-02-22T03:37:22Z</dc:date>
    </item>
    <item>
      <title>Re: Hadoop out of memory error</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hadoop-out-of-memory-error/m-p/439271#M282364</link>
      <description>Thanks. I appreciate your feedback and will open a ticket with our Hadoop administrators. (I did manage to produce my desired output, but I had to split my query into two time frames.)</description>
      <pubDate>Thu, 22 Feb 2018 13:53:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hadoop-out-of-memory-error/m-p/439271#M282364</guid>
      <dc:creator>brulard</dc:creator>
      <dc:date>2018-02-22T13:53:55Z</dc:date>
    </item>
  </channel>
</rss>

