<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Advice on retrieving from a large dataset in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Advice-on-retrieving-from-a-large-dataset/m-p/448850#M283486</link>
    <description>&lt;P&gt;&amp;gt; &lt;SPAN&gt;If both your source tables are in Oracle then definitely use SQL as this will push all the processing to Oracle.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Yes this. Keep all the subsetting in the source system and only transfer the result. Any other way is wasteful.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As for transferring, ask your vendor what they expect.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Tue, 27 Mar 2018 02:26:49 GMT</pubDate>
    <dc:creator>ChrisNZ</dc:creator>
    <dc:date>2018-03-27T02:26:49Z</dc:date>
    <item>
      <title>Advice on retrieving from a large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Advice-on-retrieving-from-a-large-dataset/m-p/448645#M283480</link>
      <description>&lt;P&gt;Good Morning,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am working on pulling data for our vendor and it requires me&amp;nbsp;to retrieve claims data &amp;nbsp;from a very huge dataset. I am pulling claims data from&amp;nbsp;a dataset with 2.5B observations.&amp;nbsp; I am retrieving all claims records associated with a set of unique claims_case_numbers with approx. 11.5M observations. The resulting dataset is about 52M obs with 47 variables&amp;nbsp;and runtime is 1.5 hrs. I would like to know if my coding approach&amp;nbsp;&amp;nbsp;below&amp;nbsp; utilizing Hash object is&amp;nbsp;already efficient for this task or if it can still be improved. Appreciate any suggestions.&lt;/P&gt;&lt;P&gt;Also, any recommendations on how to get the dataset results&amp;nbsp;with &amp;nbsp;approx&amp;nbsp;13GB in size to our vendor&amp;nbsp;. My initial thought is exporting in csv with zip. Thank you as always.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000080" face="Courier New" size="3"&gt;&lt;STRONG&gt;data&lt;/STRONG&gt;&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; vendor_claims;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;length&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;&amp;nbsp;clm_case_key &lt;/FONT&gt;&lt;STRONG&gt;&lt;FONT color="#008080" face="Courier New" size="3"&gt;8.&lt;/FONT&gt;&lt;/STRONG&gt;&lt;FONT face="Courier New" size="3"&gt;;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;if&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; _N_ =&lt;/FONT&gt;&lt;STRONG&gt;&lt;FONT color="#008080" face="Courier New" size="3"&gt;1&lt;/FONT&gt;&lt;/STRONG&gt; &lt;FONT color="#0000ff" face="Courier New" size="3"&gt;then&lt;/FONT&gt; &lt;FONT color="#0000ff" face="Courier New" size="3"&gt;do&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;DECLARE&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; HASH H(dataset:&lt;/FONT&gt;&lt;FONT color="#800080" face="Courier New" size="3"&gt;'CLMS_CASES_LKUP'&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;);&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="Courier New" size="3"&gt;H.DEFINEKEY (&lt;/FONT&gt;&lt;FONT color="#800080" face="Courier New" size="3"&gt;'clm_case_key'&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;);&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;H.DEFINEDONE();&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;end&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;SET&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;&amp;nbsp;CLAIMS_MASTER (&lt;/FONT&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;keep&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;=&amp;amp;ON_FILE_HEADER &lt;/FONT&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;where&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;=((serv_from_date ge &lt;/FONT&gt;&lt;STRONG&gt;&lt;FONT color="#008080" face="Courier New" size="3"&gt;'1Jan2016'd&lt;/FONT&gt;&lt;/STRONG&gt;&lt;FONT face="Courier New" size="3"&gt; AND serv_from_Date le &lt;/FONT&gt;&lt;STRONG&gt;&lt;FONT color="#008080" face="Courier New" size="3"&gt;'31Dec2017'd&lt;/FONT&gt;&lt;/STRONG&gt;&lt;FONT face="Courier New" size="3"&gt;)&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="Courier New" size="3"&gt;and prod_cd not in (&lt;/FONT&gt;&lt;FONT color="#800080" face="Courier New" size="3"&gt;'PROD_COM','PROD_MDR'&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;)&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="Courier New" size="3"&gt;and payment_cd = &lt;/FONT&gt;&lt;FONT color="#800080" face="Courier New" size="3"&gt;'C'&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;));&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;if&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; H.find() = &lt;/FONT&gt;&lt;STRONG&gt;&lt;FONT color="#008080" face="Courier New" size="3"&gt;0&lt;/FONT&gt;&lt;/STRONG&gt; &lt;FONT color="#0000ff" face="Courier New" size="3"&gt;then&lt;/FONT&gt; &lt;FONT color="#0000ff" face="Courier New" size="3"&gt;output&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000080" face="Courier New" size="3"&gt;&lt;STRONG&gt;run&lt;/STRONG&gt;&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 26 Mar 2018 14:30:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Advice-on-retrieving-from-a-large-dataset/m-p/448645#M283480</guid>
      <dc:creator>lmtamina</dc:creator>
      <dc:date>2018-03-26T14:30:29Z</dc:date>
    </item>
    <item>
      <title>Re: Advice on retrieving from a large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Advice-on-retrieving-from-a-large-dataset/m-p/448691#M283481</link>
      <description>&lt;P&gt;A SAS dataset is ODBC compliant. If they are willing to install the reader, I would leave it in sas7bdat format. If not, CSV is probably as good as it gets but I would use tabs vs commas as delimiter.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Try doing it with SQL vs hash. SQL will optimize your code.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You would be best to send this to 1 of 2 other places:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;- SAS consulting forum on Google (declining size but really good SAS people)&lt;/P&gt;
&lt;P&gt;- SAS-L. Venerable, best coders including Paul Dorfman who is the expert on hash object. Roger D will also weigh in who is damn good on volume issues.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 26 Mar 2018 16:08:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Advice-on-retrieving-from-a-large-dataset/m-p/448691#M283481</guid>
      <dc:creator>AlanC</dc:creator>
      <dc:date>2018-03-26T16:08:12Z</dc:date>
    </item>
    <item>
      <title>Re: Advice on retrieving from a large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Advice-on-retrieving-from-a-large-dataset/m-p/448709#M283482</link>
      <description>Just for my understanding, where is your two original data sets stored  (DBMS, local/remote) and where do you need to store/analyze the result?</description>
      <pubDate>Mon, 26 Mar 2018 16:45:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Advice-on-retrieving-from-a-large-dataset/m-p/448709#M283482</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2018-03-26T16:45:17Z</dc:date>
    </item>
    <item>
      <title>Re: Advice on retrieving from a large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Advice-on-retrieving-from-a-large-dataset/m-p/448752#M283483</link>
      <description>&lt;LI-SPOILER&gt;&amp;nbsp;&lt;/LI-SPOILER&gt;&lt;P&gt;Thank you . I will&amp;nbsp;try to &amp;nbsp;post to the other forums.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 26 Mar 2018 17:42:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Advice-on-retrieving-from-a-large-dataset/m-p/448752#M283483</guid>
      <dc:creator>lmtamina</dc:creator>
      <dc:date>2018-03-26T17:42:07Z</dc:date>
    </item>
    <item>
      <title>Re: Advice on retrieving from a large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Advice-on-retrieving-from-a-large-dataset/m-p/448753#M283484</link>
      <description>The results will be sent to our vendor via FTP site. And I am accessing the 2 database from DBMS(Oracle) and via client setup. We are using SAS enterprise guide. Hope this is helpful.</description>
      <pubDate>Mon, 26 Mar 2018 17:42:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Advice-on-retrieving-from-a-large-dataset/m-p/448753#M283484</guid>
      <dc:creator>lmtamina</dc:creator>
      <dc:date>2018-03-26T17:42:53Z</dc:date>
    </item>
    <item>
      <title>Re: Advice on retrieving from a large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Advice-on-retrieving-from-a-large-dataset/m-p/448809#M283485</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/187028"&gt;@lmtamina&lt;/a&gt;&lt;/P&gt;
&lt;P&gt;If both your source tables are in Oracle then definitely use SQL as this will push all the processing to Oracle. If you want to further speed up the process then use explicit pass-through SQL (=formulated in the Oracle SQL flavour)&amp;nbsp;with Oracle&amp;nbsp;hints to parallelize processing.&lt;/P&gt;
&lt;P&gt;...But: I guess the bottleneck is transferring the data from Oracle to SAS over the network so once you rewrite your code so that the data volume gets already reduced on the Oracle side end-to-end elapsed time should become quite a bit shorter.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Once you've got your data in SAS: If your vendor uses SAS as well then I'd first try and zip the SAS dataset. If that makes it small enough for an FTP transfer then good, else I'd split it up into chunks and zip these chunks in separate archives.&lt;/P&gt;</description>
      <pubDate>Mon, 26 Mar 2018 20:45:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Advice-on-retrieving-from-a-large-dataset/m-p/448809#M283485</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2018-03-26T20:45:29Z</dc:date>
    </item>
    <item>
      <title>Re: Advice on retrieving from a large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Advice-on-retrieving-from-a-large-dataset/m-p/448850#M283486</link>
      <description>&lt;P&gt;&amp;gt; &lt;SPAN&gt;If both your source tables are in Oracle then definitely use SQL as this will push all the processing to Oracle.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Yes this. Keep all the subsetting in the source system and only transfer the result. Any other way is wasteful.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As for transferring, ask your vendor what they expect.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 27 Mar 2018 02:26:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Advice-on-retrieving-from-a-large-dataset/m-p/448850#M283486</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2018-03-27T02:26:49Z</dc:date>
    </item>
  </channel>
</rss>

