<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Efficient Code to Insert Data into a current table in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Efficient-Code-to-Insert-Data-into-a-current-table/m-p/438419#M109313</link>
    <description>&lt;P&gt;A couple of ideas to try ...&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;First, create the set of IDs more quickly:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;data work.temp;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;set libpermanent_Table (firstobs=30000000 keep=Transaction_ID);&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;run;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Your original data step read in all 90 variables, then dropped 89 of them.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Second (since I'm not really familiar with the speed of SQL's INSERT, so this would be an experiment), create the subset of Table_01 separately from adding it to the permanent data set:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;proc sql;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;create table subset as&amp;nbsp;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;select * from work.Table01&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;where Transaction_ID not in&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;select Transaction_ID from work.temp;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;quit;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Then appending might be much faster:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;proc append data=subset base=lib.Permanent_Table;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;run;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The variable definitions should match 100% to do this, however.&amp;nbsp; And it should be an acceptable solution to add the new observations to the end of the existing data set.&lt;/P&gt;</description>
    <pubDate>Mon, 19 Feb 2018 15:21:27 GMT</pubDate>
    <dc:creator>Astounding</dc:creator>
    <dc:date>2018-02-19T15:21:27Z</dc:date>
    <item>
      <title>Efficient Code to Insert Data into a current table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficient-Code-to-Insert-Data-into-a-current-table/m-p/438414#M109311</link>
      <description>&lt;P&gt;I currently use the below code to insert data from work into a permanent table.&amp;nbsp; I was wondering if there is a more efficient way of doing this?&lt;/P&gt;&lt;P&gt;The process is:&lt;/P&gt;&lt;P&gt;Import CSV&lt;/P&gt;&lt;P&gt;Check to see if transaction ID is in Permanent SAS table if not insert&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;Data Work.temp;
Set Lib.Permanent_Table  (firstobs=30000000);
Keep Transaction_ID;
Run;



Proc sql;
INSERT INTO Lib.Permanent_Table 
SELECT * FROM WORK.Table01
WHERE 'Transaction_Id'n NOT IN
(SELECT 'Transaction_Id'n FROM work.temp)
;
drop table work.temp
;
QUIT;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The Permanent table has 41,995,622 observations and I am inserting another 150k every day with 90 columns.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000080" face="Courier New" size="3"&gt;&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000080" face="Courier New" size="3"&gt;&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 19 Feb 2018 15:07:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficient-Code-to-Insert-Data-into-a-current-table/m-p/438414#M109311</guid>
      <dc:creator>DG1984</dc:creator>
      <dc:date>2018-02-19T15:07:44Z</dc:date>
    </item>
    <item>
      <title>Re: Efficient Code to Insert Data into a current table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficient-Code-to-Insert-Data-into-a-current-table/m-p/438416#M109312</link>
      <description>&lt;P&gt;Take a look at the &lt;A href="http://documentation.sas.com/?docsetId=lestmtsref&amp;amp;docsetTarget=p18w3br45er2qun1r8sfmm4grjyr.htm&amp;amp;docsetVersion=9.4&amp;amp;locale=da#p10xmf9ngxsy12n1hm91amyhuth0" target="_self"&gt;Update Statement&lt;/A&gt; in the data step.&lt;/P&gt;</description>
      <pubDate>Mon, 19 Feb 2018 15:19:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficient-Code-to-Insert-Data-into-a-current-table/m-p/438416#M109312</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2018-02-19T15:19:22Z</dc:date>
    </item>
    <item>
      <title>Re: Efficient Code to Insert Data into a current table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficient-Code-to-Insert-Data-into-a-current-table/m-p/438419#M109313</link>
      <description>&lt;P&gt;A couple of ideas to try ...&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;First, create the set of IDs more quickly:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;data work.temp;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;set libpermanent_Table (firstobs=30000000 keep=Transaction_ID);&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;run;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Your original data step read in all 90 variables, then dropped 89 of them.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Second (since I'm not really familiar with the speed of SQL's INSERT, so this would be an experiment), create the subset of Table_01 separately from adding it to the permanent data set:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;proc sql;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;create table subset as&amp;nbsp;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;select * from work.Table01&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;where Transaction_ID not in&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;select Transaction_ID from work.temp;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;quit;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Then appending might be much faster:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;proc append data=subset base=lib.Permanent_Table;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;run;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The variable definitions should match 100% to do this, however.&amp;nbsp; And it should be an acceptable solution to add the new observations to the end of the existing data set.&lt;/P&gt;</description>
      <pubDate>Mon, 19 Feb 2018 15:21:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficient-Code-to-Insert-Data-into-a-current-table/m-p/438419#M109313</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2018-02-19T15:21:27Z</dc:date>
    </item>
    <item>
      <title>Re: Efficient Code to Insert Data into a current table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficient-Code-to-Insert-Data-into-a-current-table/m-p/438421#M109315</link>
      <description>&lt;P&gt;To think of it the other way round:&lt;/P&gt;
&lt;PRE&gt;proc sql;
  create table data_to_add as 
  select * 
  from   csv_data
  where transaction_id not in (select distinct transaction_id from permanent_table);
run;

proc append base=permanent_table data=data_to_add;
run;
&lt;/PRE&gt;
&lt;P&gt;Avoid this:&amp;nbsp;&lt;/P&gt;
&lt;PRE class=" language-sas"&gt;&lt;CODE class="  language-sas"&gt;&lt;SPAN class="token string"&gt;'Transaction_Id'&lt;/SPAN&gt;&lt;SPAN class="token function"&gt;n&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If you are using SAS, use SAS naming conventions for your variables.&amp;nbsp; The only time '&amp;nbsp; 'n named literals do anything other than make your code more messy is when you have to deal with Excel or other poor datasources.&lt;/P&gt;</description>
      <pubDate>Mon, 19 Feb 2018 15:28:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficient-Code-to-Insert-Data-into-a-current-table/m-p/438421#M109315</guid>
      <dc:creator>RW9</dc:creator>
      <dc:date>2018-02-19T15:28:36Z</dc:date>
    </item>
    <item>
      <title>Re: Efficient Code to Insert Data into a current table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficient-Code-to-Insert-Data-into-a-current-table/m-p/438441#M109320</link>
      <description>&lt;P&gt;Ninety columns?&amp;nbsp; 150K new observations daily?&amp;nbsp; I'd suggest maintaining&amp;nbsp;a second dataset of just the current transaction_id's (call it lib.id_list)&amp;nbsp;.&amp;nbsp; Then you could&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Read the new transactions and see if they are already in lib.id_list, using a hash object from lib.id_list.&lt;/LI&gt;
&lt;LI&gt;If&amp;nbsp;not then
&lt;OL&gt;
&lt;LI&gt;put the transaction&amp;nbsp;in a data set (view) to be appended&amp;nbsp;&amp;nbsp; (data_to_append)&lt;/LI&gt;
&lt;LI&gt;record the new transaction_id to be appended to&amp;nbsp;the id_list&amp;nbsp;(new_ids)&lt;/LI&gt;
&lt;/OL&gt;
&lt;/LI&gt;
&lt;LI&gt;append 2.1 and 2.2 to their respective master data set files.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data new_ids (keep=transaction_id)
     data_to_append   /view=data_to_append;

  set new_transactions;
  if _n_=1 then do;    
    declare hash h (dataset:'lib.id_list');
      h.definekey('transaction_id');
      h.definedone();
  end;
  if h.find() ^=0;
run;

proc append base=lib.permanenttable append=data_to_append;
run;
proc append base=lib.id_list data=new_ids;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Edited additional note.&amp;nbsp; If your original &amp;nbsp;41,995,622 observations all have unique transaction_id's, then I would modify the program above to establish as many "buckets" as possible in hash object h.&amp;nbsp; So change the DECLARE statement to&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; declare hash h (dataset:'lib.id_list',hashexp:16);&lt;/P&gt;
&lt;P&gt;which tells SAS to make 2**16=64K buckets, each with about 641 transaction_id's.&amp;nbsp; The h.find() method will surely be faster than with the default hashexp:8 (2**8 buckets each with about 164,000 transaction_id's).&amp;nbsp; hashexp:16 is the maximum supported value.&lt;/P&gt;</description>
      <pubDate>Mon, 19 Feb 2018 18:17:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficient-Code-to-Insert-Data-into-a-current-table/m-p/438441#M109320</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2018-02-19T18:17:48Z</dc:date>
    </item>
    <item>
      <title>Re: Efficient Code to Insert Data into a current table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficient-Code-to-Insert-Data-into-a-current-table/m-p/438474#M109330</link>
      <description>&lt;P&gt;First, make sure you have an index defined on Transaction_Id in your permanent table.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Then try:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;Proc sql;

create table Table02 as
select * from Table01
where Transaction_Id not in (select Transaction_Id from lib.Permanent_Table);

INSERT INTO Lib.Permanent_Table 
SELECT * FROM WORK.Table02;

drop table work.table02;
QUIT;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;(untested)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 19 Feb 2018 19:43:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficient-Code-to-Insert-Data-into-a-current-table/m-p/438474#M109330</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2018-02-19T19:43:49Z</dc:date>
    </item>
    <item>
      <title>Re: Efficient Code to Insert Data into a current table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficient-Code-to-Insert-Data-into-a-current-table/m-p/438510#M109349</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/462"&gt;@PGStats&lt;/a&gt;:&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Your code will certainly be superior to the original OP code.&amp;nbsp; But an index on 41M unique transaction id's (I assume they are unique)&amp;nbsp;will yield a binary search tree&amp;nbsp;with depth about 25.&amp;nbsp; I would be interested in comparing its performance to the hash solution I suggested.&amp;nbsp; This has long been one of the established advantages of hash search vs binary search, and I presume it would apply in this case.&lt;/P&gt;</description>
      <pubDate>Mon, 19 Feb 2018 22:53:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficient-Code-to-Insert-Data-into-a-current-table/m-p/438510#M109349</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2018-02-19T22:53:46Z</dc:date>
    </item>
    <item>
      <title>Re: Efficient Code to Insert Data into a current table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficient-Code-to-Insert-Data-into-a-current-table/m-p/438551#M109358</link>
      <description>With such size of a table and inserting to it which causing an index update, I would definitely move it to a SPDE library.</description>
      <pubDate>Tue, 20 Feb 2018 07:15:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficient-Code-to-Insert-Data-into-a-current-table/m-p/438551#M109358</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2018-02-20T07:15:26Z</dc:date>
    </item>
    <item>
      <title>Re: Efficient Code to Insert Data into a current table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficient-Code-to-Insert-Data-into-a-current-table/m-p/438552#M109359</link>
      <description>Will definitely give this a try and compare with a few others I have been given!!&lt;BR /&gt;&lt;BR /&gt;Thanks&lt;BR /&gt;&lt;BR /&gt;All for help!</description>
      <pubDate>Tue, 20 Feb 2018 07:26:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficient-Code-to-Insert-Data-into-a-current-table/m-p/438552#M109359</guid>
      <dc:creator>DG1984</dc:creator>
      <dc:date>2018-02-20T07:26:46Z</dc:date>
    </item>
  </channel>
</rss>

