<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Data Processing with PROC SQL for Multiple Billion Rows in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Data-Processing-with-PROC-SQL-for-Multiple-Billion-Rows/m-p/895748#M353910</link>
    <description>&lt;P&gt;Where did the SAS datasets get created? You will get that message if the datasets were created in a SAS installation that is not identical to your one. For example if it runs on a different OS.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'd suggest that you try using the DATA step SET statement to stack your tables as that will likely be faster than SQL. Also should filter the DATA1 to DATA60 tables if possible rather than reading all data then filtering afterwards:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
  set data1 - data60;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Mon, 25 Sep 2023 19:05:40 GMT</pubDate>
    <dc:creator>SASKiwi</dc:creator>
    <dc:date>2023-09-25T19:05:40Z</dc:date>
    <item>
      <title>Data Processing with PROC SQL for Multiple Billion Rows</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Data-Processing-with-PROC-SQL-for-Multiple-Billion-Rows/m-p/895744#M353909</link>
      <description>&lt;P&gt;I have 60 sas7bdat datasets, around 100 million rows each and 60-70 columns, which contains quite similar columns and would like to process them altogether to filter and create a smaller dataset (containing all the columns) containing rows with specific information with the following code:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;PROC SQL;&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; CREATE TABLE bigtable AS&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; SELECT * FROM DATA1&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; OUTER UNION CORR&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; SELECT * FROM DATA2&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; .......&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; OUTER UNION CORR&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; SELECT * FROM DATA60;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; CREATE TABLE filtered AS&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; SELECT * FROM bigtable&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; WHERE &amp;lt;filtering based on 6 columns in the 'bigtable' with AND &amp;amp; OR&amp;gt;;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;lt;and then save 'filtered' in CSV&amp;gt;;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Two questions relating this process:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;DATA1, ..., DATA60 are sas7bdat datasets, but I get the following messages: "&lt;SPAN class=""&gt;Data file is in a format that is native to another host". This may cause processing time to be much slower - datasets are in sas7bdat and in latin1 western (ISO) - any solutions, and why the data is not native formatted even when they're sas7bdat datasets?&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN class=""&gt;Is there a much faster way to process data via PROC SQL, to vertically combine and filter billions of rows?&lt;/SPAN&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&lt;SPAN class=""&gt;Thank you in advance!&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 25 Sep 2023 18:54:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Data-Processing-with-PROC-SQL-for-Multiple-Billion-Rows/m-p/895744#M353909</guid>
      <dc:creator>sas_novice2</dc:creator>
      <dc:date>2023-09-25T18:54:50Z</dc:date>
    </item>
    <item>
      <title>Re: Data Processing with PROC SQL for Multiple Billion Rows</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Data-Processing-with-PROC-SQL-for-Multiple-Billion-Rows/m-p/895748#M353910</link>
      <description>&lt;P&gt;Where did the SAS datasets get created? You will get that message if the datasets were created in a SAS installation that is not identical to your one. For example if it runs on a different OS.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'd suggest that you try using the DATA step SET statement to stack your tables as that will likely be faster than SQL. Also should filter the DATA1 to DATA60 tables if possible rather than reading all data then filtering afterwards:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
  set data1 - data60;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 25 Sep 2023 19:05:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Data-Processing-with-PROC-SQL-for-Multiple-Billion-Rows/m-p/895748#M353910</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2023-09-25T19:05:40Z</dc:date>
    </item>
    <item>
      <title>Re: Data Processing with PROC SQL for Multiple Billion Rows</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Data-Processing-with-PROC-SQL-for-Multiple-Billion-Rows/m-p/895751#M353911</link>
      <description>&lt;P&gt;Thank you for the response! Wouldn't filtering the dataset one by one for all 60 of them result in at least the same time spent than filtering them altogether? Filtering all the datasets together in a single step would require the filter to only traverse once, as opposed to traverse 60 times; although the same amount of rows explored - but have to re-type the filtering code for 60 times?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can I use that DATA step in conjunction with PROC SQL? So after your code, using PROC SQL to filter the rows?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 25 Sep 2023 19:08:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Data-Processing-with-PROC-SQL-for-Multiple-Billion-Rows/m-p/895751#M353911</guid>
      <dc:creator>sas_novice2</dc:creator>
      <dc:date>2023-09-25T19:08:50Z</dc:date>
    </item>
    <item>
      <title>Re: Data Processing with PROC SQL for Multiple Billion Rows</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Data-Processing-with-PROC-SQL-for-Multiple-Billion-Rows/m-p/895786#M353919</link>
      <description>&lt;P&gt;By not filtering as you are reading your input datasets you are reading more data than necessary so your program will be slower. If you add a WHERE clause in my example code it applies equally to all datasets:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
  set data1 - data60;
  where &amp;lt; your selection logic &amp;gt;;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 25 Sep 2023 20:13:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Data-Processing-with-PROC-SQL-for-Multiple-Billion-Rows/m-p/895786#M353919</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2023-09-25T20:13:37Z</dc:date>
    </item>
    <item>
      <title>Re: Data Processing with PROC SQL for Multiple Billion Rows</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Data-Processing-with-PROC-SQL-for-Multiple-Billion-Rows/m-p/895793#M353921</link>
      <description>&lt;P&gt;There is nothing in your posted code that requires using SQL.&lt;/P&gt;
&lt;P&gt;Although I have no idea what you mean by this phrase "&lt;SPAN&gt;filtering based on 6 columns in the 'bigtable'&amp;nbsp;"&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;Are you trying to imply that some of the 6 variables do not exist in all of the original datasets?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;One method that can be faster is to use PROC APPEND.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc append data=data1 base=BIGTABLE ;
      WHERE &amp;lt;filtering based on 6 VARIABLES in the SMALL DATASET with AND &amp;amp; OR&amp;gt;;
run;

proc append data=data2 base=BIGTABLE ;
      WHERE &amp;lt;filtering based on 6 VARIABLES in the SMALL DATASET with AND &amp;amp; OR&amp;gt;;
run;
 ...

&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 25 Sep 2023 20:37:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Data-Processing-with-PROC-SQL-for-Multiple-Billion-Rows/m-p/895793#M353921</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2023-09-25T20:37:26Z</dc:date>
    </item>
  </channel>
</rss>

