<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Efficiently delete rows based on conditions in an indexed table in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Efficiently-delete-rows-based-on-conditions-in-an-indexed-table/m-p/460910#M284574</link>
    <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/31461"&gt;@mkeintz&lt;/a&gt;&amp;nbsp;I answered my own question.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If the table widens (add a variable with length $200) or if the number of updates reduces (replace .9 with .99), MODIFY becomes faster.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As usual... it depends...&amp;nbsp; &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;MODIFY BY is never even in the race though.&amp;nbsp; Only MODIFY KEY= should be used, if an index is present.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 09 May 2018 04:22:55 GMT</pubDate>
    <dc:creator>ChrisNZ</dc:creator>
    <dc:date>2018-05-09T04:22:55Z</dc:date>
    <item>
      <title>Efficiently delete rows based on conditions in an indexed table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficiently-delete-rows-based-on-conditions-in-an-indexed-table/m-p/459252#M284563</link>
      <description>&lt;P&gt;Hello,&lt;BR /&gt;&lt;BR /&gt;I have a data set with about 5 million observations. This table contains an index with several columns, including one column for the year and one column for the country. I want to delete all observations of a certain country for a certain year from this table.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;if country = 'NZ' &amp;amp; year = 2018 then delete;&lt;/PRE&gt;&lt;P&gt;Is there an efficient way to do this and make use of the indexed data? So far I have tried the following two solutions, which have a too long runtime of about 30 minutes (both).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;proc sql;
    delete from mydata
    where country = 'NZ' &amp;amp; year = 2018;
quit;


data mydata;
   modify mydata;
   if country = 'NZ' &amp;amp; year = 2018 then remove;
  run;&lt;/PRE&gt;&lt;P&gt;In case it matters: the table is sorted by the country and year columns.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I want to avoid writing a data set where the index is simply recreated. But perhaps it is the most efficient solution?&lt;BR /&gt;&lt;BR /&gt;Thanks for your help.&lt;/P&gt;</description>
      <pubDate>Wed, 02 May 2018 11:22:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficiently-delete-rows-based-on-conditions-in-an-indexed-table/m-p/459252#M284563</guid>
      <dc:creator>MarioS_</dc:creator>
      <dc:date>2018-05-02T11:22:38Z</dc:date>
    </item>
    <item>
      <title>Re: Efficiently delete rows based on conditions in an indexed table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficiently-delete-rows-based-on-conditions-in-an-indexed-table/m-p/459279#M284564</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I would suggest to create a new data set when there is a data manipulation. I always prefer to retain the original data and keep sub sets of data into new temp tables.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;
&lt;P&gt;Vishnu&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 02 May 2018 12:43:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficiently-delete-rows-based-on-conditions-in-an-indexed-table/m-p/459279#M284564</guid>
      <dc:creator>Vish33</dc:creator>
      <dc:date>2018-05-02T12:43:47Z</dc:date>
    </item>
    <item>
      <title>Re: Efficiently delete rows based on conditions in an indexed table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficiently-delete-rows-based-on-conditions-in-an-indexed-table/m-p/459351#M284565</link>
      <description>&lt;P&gt;Just let us face it, Base SAS isn't that good when it comes to deletes in place, and with index updates in parallel.&lt;/P&gt;
&lt;P&gt;So, often it's more efficient to replace the table.&lt;/P&gt;
&lt;P&gt;SAS/SPDS has a fast path delete, and I assume it performs fairly good compare to Base/SPDE engines. But I guess you don't have that?&lt;/P&gt;</description>
      <pubDate>Wed, 02 May 2018 15:04:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficiently-delete-rows-based-on-conditions-in-an-indexed-table/m-p/459351#M284565</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2018-05-02T15:04:01Z</dc:date>
    </item>
    <item>
      <title>Re: Efficiently delete rows based on conditions in an indexed table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficiently-delete-rows-based-on-conditions-in-an-indexed-table/m-p/459534#M284566</link>
      <description>&lt;P&gt;I suggest you compare a DELETE with indexes versus a DELETE without indexes. I don't think IF statements use indexes so I don't think it will be faster with indexes, probably slower as the indexes have to be updated for deletions as well. I suspect an IF with DELETE and no indexes is quicker.&lt;/P&gt;</description>
      <pubDate>Thu, 03 May 2018 03:44:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficiently-delete-rows-based-on-conditions-in-an-indexed-table/m-p/459534#M284566</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2018-05-03T03:44:51Z</dc:date>
    </item>
    <item>
      <title>Re: Efficiently delete rows based on conditions in an indexed table co</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficiently-delete-rows-based-on-conditions-in-an-indexed-table/m-p/459569#M284567</link>
      <description>&lt;P&gt;The MODIFY statement is usually inefficient.&lt;/P&gt;
&lt;P&gt;Try this:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data LOOKUP;
  retain COUNTRY 'NZ' YEAR 2018 ;
run;
data MAIN;
  update MAIN LOOKUP(in=DEL);
  by COUNTRY YEAR;
  if DEL then delete;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 03 May 2018 08:49:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficiently-delete-rows-based-on-conditions-in-an-indexed-table/m-p/459569#M284567</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2018-05-03T08:49:19Z</dc:date>
    </item>
    <item>
      <title>Re: Efficiently delete rows based on conditions in an indexed table co</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficiently-delete-rows-based-on-conditions-in-an-indexed-table/m-p/459573#M284568</link>
      <description>&lt;P&gt;Or this:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data LOOKUP;
  set MAIN(obs=0);
run;
data MAIN;
  update MAIN LOOKUP;
  by COUNTRY YEAR;
  if COUNTRY = 'NZ' &amp;amp; YEAR = 2018 then delete;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 03 May 2018 08:58:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficiently-delete-rows-based-on-conditions-in-an-indexed-table/m-p/459573#M284568</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2018-05-03T08:58:58Z</dc:date>
    </item>
    <item>
      <title>Re: Efficiently delete rows based on conditions in an indexed table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficiently-delete-rows-based-on-conditions-in-an-indexed-table/m-p/459929#M284569</link>
      <description>&lt;P&gt;Is it&amp;nbsp; a composite index, as here? Then just use a MODIFY, followed by a WHERE followed by REMOVE. The index will be used, presumably saving a good deal of input operations:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have (index=(na=(name age)));
  set sashelp.class;
  do i=1 to 100*_n_;output;end;
run;

options msglevel=I;
data have;
  modify have;
  where name='John' and age=12;
  remove;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If you just have&amp;nbsp; two simple indexes, choose one of them (the more&amp;nbsp; discriminating one) for the WHERE statement, followed by an&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&lt;STRONG&gt;IF... THEN REMOVE&lt;/STRONG&gt;&lt;/FONT&gt; statement conditioned on the other variable.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The MGLEVEL=I option tells SAS to report (among other things) when it is using the data set index.&lt;/P&gt;</description>
      <pubDate>Fri, 04 May 2018 04:35:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficiently-delete-rows-based-on-conditions-in-an-indexed-table/m-p/459929#M284569</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2018-05-04T04:35:42Z</dc:date>
    </item>
    <item>
      <title>Re: Efficiently delete rows based on conditions in an indexed table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficiently-delete-rows-based-on-conditions-in-an-indexed-table/m-p/460327#M284570</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/31461"&gt;@mkeintz&lt;/a&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;UPDATE can be much faster than MODIFY, depending on the proportion of the table that is deleted.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If we delete 10% of the table, the MODIFY step takes 3.8s on my machine:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;data HAVE (sortedby=I index=(I));
&amp;nbsp; do I=1 to 5e6;output;end;
run;
data HAVE;
&amp;nbsp; modify HAVE;
&amp;nbsp; where I&amp;lt;5e5;
&amp;nbsp; remove;
run;&lt;/PRE&gt;
&lt;P&gt;while the UPDATE step takes 0.8s:&lt;/P&gt;
&lt;PRE&gt;data HAVE (sortedby=I index=(I));
&amp;nbsp; do I=1 to 5e6;output;end;
run;
data LOOKUP;
&amp;nbsp; set HAVE(obs=0);
run;
data HAVE;
&amp;nbsp; update HAVE LOOKUP;
&amp;nbsp; by I;
&amp;nbsp; if I&amp;lt;5e5 then delete;
run;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If the proportion of the table to delete is small (or if the key is not unique), MODIFY becomes preferable *provided an index is available*.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 06 May 2018 22:15:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficiently-delete-rows-based-on-conditions-in-an-indexed-table/m-p/460327#M284570</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2018-05-06T22:15:51Z</dc:date>
    </item>
    <item>
      <title>Re: Efficiently delete rows based on conditions in an indexed table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficiently-delete-rows-based-on-conditions-in-an-indexed-table/m-p/460449#M284571</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/16961"&gt;@ChrisNZ&lt;/a&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I agree that proportion to be deleted impacts the performance of the MODIFY technique I suggested.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But when comparing it to the UPDATE technique, the width of the records in dataset HAVE is just as important, and at some point becomes more important in the MODIFY vs UPDATE contrast.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For instance, I added a single $200 byte satellite variable to dataset have and got 20.2 seconds for the UPDATE and 4.9 for the MODIFY.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data HAVE (sortedby=I index=(I));
  length x $200; retain x ' ';
  do I=1 to 5e6;output;end;
run;
data lookup ; set have (obs=0); run;
data HAVE (sortedby=I index=(i));
  update HAVE lookup;
  by I;
  if I&amp;lt;5e5 then delete;
run;

data HAVE (sortedby=I index=(I));
  length x $200; retain x ' ';
  do I=1 to 5e6;output;end;
run;
data HAVE;
  modify HAVE;
  where I&amp;lt;5e5;
  remove;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also, while I think that cases where an index is available and delete-proportion is small, then MODIFY with WHERE is the way to go, I don't agree that presence of an index is required to choose MODIFY over UPDATE.&amp;nbsp; The program below, using MODIFY without benefit of the index against the same dataset as above, took&amp;nbsp;6 seconds on my machine.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data HAVE (sortedby=I index=(I));
  length x $200; retain x ' ';
  do I=1 to 5e6;output;end;
run;
data have;
  modify have;
  if I&amp;lt;5e5 then remove;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Proportion matters, but so does (horizontal) size.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Edited addition:&amp;nbsp;&amp;nbsp; The MODIFY statement preserves the index in have.&amp;nbsp;&amp;nbsp; And you could improve the performance of the UPDATE test I ran&amp;nbsp;by dropping the&amp;nbsp;index creation, but I suspect the OP would want to preserve the index.&lt;/P&gt;</description>
      <pubDate>Mon, 07 May 2018 14:57:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficiently-delete-rows-based-on-conditions-in-an-indexed-table/m-p/460449#M284571</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2018-05-07T14:57:31Z</dc:date>
    </item>
    <item>
      <title>Re: Efficiently delete rows based on conditions in an indexed table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficiently-delete-rows-based-on-conditions-in-an-indexed-table/m-p/460585#M284572</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/31461"&gt;@mkeintz&lt;/a&gt;&amp;nbsp;I can't disagree with your conclusion!&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Yes indeed, index&amp;nbsp;performance does not deteriorate with width, only&amp;nbsp;with height, whereas a join BY&amp;nbsp;is&amp;nbsp;very much impacted by width.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Very interesting. Thank you for your input.&lt;/P&gt;</description>
      <pubDate>Mon, 07 May 2018 22:35:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficiently-delete-rows-based-on-conditions-in-an-indexed-table/m-p/460585#M284572</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2018-05-07T22:35:44Z</dc:date>
    </item>
    <item>
      <title>Re: Efficiently delete rows based on conditions in an indexed table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficiently-delete-rows-based-on-conditions-in-an-indexed-table/m-p/460902#M284573</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/31461"&gt;@mkeintz&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The reason I ruled out using MODIFY for doing deletions is that in my experience, UPDATE is faster when a look up table is used, rather than a static value like we have here.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Do you have a different experience?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;data MAIN1(index=(I)) MAIN2 UPD; 
  do I=1 to 1e7; output MAIN1 MAIN2; if ranuni(1)&amp;gt;.9 then output UPD; end; 
run;

* hours? ;
data MAIN; 
  modify MAIN UPD; by I; if _IORC_=0 then remove; else output; 
run;

* 5 seconds;
data MAIN1;   
   set UPD; modify MAIN1 key=I; if _IORC_= %sysrc(_sok) then remove; else _ERROR_=0; 
run;

* 2 seconds;
data MAIN2;  
  update MAIN2 UPD(in=UPD); by I; if UPD then delete; 
run;
&lt;/PRE&gt;</description>
      <pubDate>Wed, 09 May 2018 02:37:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficiently-delete-rows-based-on-conditions-in-an-indexed-table/m-p/460902#M284573</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2018-05-09T02:37:35Z</dc:date>
    </item>
    <item>
      <title>Re: Efficiently delete rows based on conditions in an indexed table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficiently-delete-rows-based-on-conditions-in-an-indexed-table/m-p/460910#M284574</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/31461"&gt;@mkeintz&lt;/a&gt;&amp;nbsp;I answered my own question.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If the table widens (add a variable with length $200) or if the number of updates reduces (replace .9 with .99), MODIFY becomes faster.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As usual... it depends...&amp;nbsp; &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;MODIFY BY is never even in the race though.&amp;nbsp; Only MODIFY KEY= should be used, if an index is present.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 09 May 2018 04:22:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficiently-delete-rows-based-on-conditions-in-an-indexed-table/m-p/460910#M284574</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2018-05-09T04:22:55Z</dc:date>
    </item>
    <item>
      <title>Re: Efficiently delete rows based on conditions in an indexed table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficiently-delete-rows-based-on-conditions-in-an-indexed-table/m-p/461026#M284575</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/16961"&gt;@ChrisNZ&lt;/a&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Yes MODIFY with BY is exceedingly slow, and I don't see why that has to be the case.&amp;nbsp; It eliminates a natural option in using table-driven record deletion/update while avoiding rewriting the complete original data set.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But you don't have to abandon modify to preserve performance while using table driven deletes.&amp;nbsp;&amp;nbsp;You just have to abandon use of BY.&amp;nbsp; I would suggest a hash table, as in:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data HAVE 
     deletes (keep=I);
  length x $200; retain x ' ';
  do I=1 to 5e6;
    output have; 
    if mod(i,100)=0 then output deletes;
  end;
run;

data have;
  if _n_=1 then do;
    declare hash h (dataset:'deletes');
      h.definekey('i');
      h.definedone();
  end;
  modify have ;
  if h.find()=0 then remove;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The advantage, once again, is that this minimizes disk write activity, which becomes more relevant with wide records.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Caveat:&amp;nbsp;&amp;nbsp;The above removes ALL records that match those in DELETES.&amp;nbsp; If the MODIFY with BY approach was fast enough to be used, it would only remove the first record matching each one in DELETES, as&amp;nbsp;does your UPDATES example.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;To exactly replicate the remove-only-first property, remove&amp;nbsp;an item from the hash object every time the corresponding record is removed from the dataset:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
  if _n_=1 then do;
    declare hash h (dataset:'deletes');
      h.definekey('i');
      h.definedone();
  end;
  modify have ;
  if h.find()=0 then do; 
    remove; 
    h.remove(); 
  end;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 09 May 2018 14:02:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficiently-delete-rows-based-on-conditions-in-an-indexed-table/m-p/461026#M284575</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2018-05-09T14:02:06Z</dc:date>
    </item>
    <item>
      <title>Re: Efficiently delete rows based on conditions in an indexed table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficiently-delete-rows-based-on-conditions-in-an-indexed-table/m-p/461254#M284576</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/31461"&gt;@mkeintz&lt;/a&gt;&amp;nbsp;Hash tables have never proven useful to me for this.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Even if you consider the time to build an index, and if you use the check() method, hash is still slower.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have1 have2 
     deletes (keep=I);
  length x $200; retain x ' ';
  do I=1 to 5e6;
    output have1 have2; 
    if mod(i,100)=0 then output deletes;
  end;
run;
 
data have1;
  if _n_=1 then do;
    declare hash h (dataset:'deletes');
      h.definekey('i');
      h.definedone();
  end;
  modify have1 ;
  if h.check()=0 then remove;
run;

proc sql; 
  create index I on have2;
data have2; 
  set deletes; 
  modify have2 key=I;  
  if _IORC_= %sysrc(_sok) then remove; else _ERROR_=0; 
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Hash:&amp;nbsp; real time 4.3 seconds&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Index+key=: 1.4 + 0.7 = 2.3&amp;nbsp;&lt;SPAN&gt;seconds&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 10 May 2018 05:44:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficiently-delete-rows-based-on-conditions-in-an-indexed-table/m-p/461254#M284576</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2018-05-10T05:44:58Z</dc:date>
    </item>
    <item>
      <title>Re: Efficiently delete rows based on conditions in an indexed table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficiently-delete-rows-based-on-conditions-in-an-indexed-table/m-p/461368#M284577</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/16961"&gt;@ChrisNZ&lt;/a&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Your example does not demonstrate a hash table vs index contrast.&amp;nbsp;&amp;nbsp; What&amp;nbsp;it does show is (when deleting uniformly distributed 1% of the cases), that&amp;nbsp;a MODIFY sequentially accessing the entire HAVE is slower than table-driven direct access MODIFY.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Having said that, I agree that hashing the deletes table&amp;nbsp;can be slower than sequentially reading it as a driver for modify.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Your example is&amp;nbsp;HAVE2 below, while the&amp;nbsp;proper hash analog is&amp;nbsp;HAVE3, both of which are faster than my non-indexed original example (HAVE1):&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;SET DELETES (in have2) is faster (0.9 seconds) than&lt;/LI&gt;
&lt;LI&gt;DECLARE HASH H (datasets:'deletes') in have3 as 2.4 seconds:&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data HAVE1 HAVE2 HAVE3
     deletes (keep=I);
  length x $200; retain x ' ';
  do I=1 to 5e6;
    output have1 have2 have3;
    if mod(i,100)=0 then output deletes;
  end;
run; 

data have1;
  if _n_=1 then do;
    declare hash h (dataset:'deletes');
      h.definekey('i');
      h.definedone();
  end;
  modify have1 ;
  if h.find()=0 then remove;
run;

proc sql; 
  create index I on have2;
data have2; 
  set deletes; 
  modify have2 key=I;  
  if _IORC_= %sysrc(_sok) then remove; else _ERROR_=0; 
run;

proc sql; 
  create index I on have3;
data have3;
  declare hash h (dataset:'deletes');
    h.definekey('i');
    h.definedone();
  declare hiter hi ('h');
  do rc=hi.first() while (rc=0);
    modify have3 key=I;  
    if _IORC_= %sysrc(_sok) then remove; else _ERROR_=0; 
    rc=hi.next();
  end;
  stop;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Editted&amp;nbsp;correction.&amp;nbsp; This code only deletes one record, as noted by &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/16961"&gt;@ChrisNZ&lt;/a&gt;, because it doesn't iterate.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;do rc=hi.first() while (rc=0);
   modify have3 key=I;
   if _IORC_=%sysrc(_sok) then remove; else _ERROR_=0;
   rc=hi.next();
end;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It should have a "by 0" in the DO statement to permit iteration and deletion of all the records:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;do rc=hi.first() by 0 while (rc=0);
   modify have3 key=I;
   if _IORC_=%sysrc(_sok) then remove; else _ERROR_=0;
   rc=hi.next();
end;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 11 May 2018 12:40:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficiently-delete-rows-based-on-conditions-in-an-indexed-table/m-p/461368#M284577</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2018-05-11T12:40:17Z</dc:date>
    </item>
    <item>
      <title>Re: Efficiently delete rows based on conditions in an indexed table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficiently-delete-rows-based-on-conditions-in-an-indexed-table/m-p/461463#M284578</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/31461"&gt;@mkeintz&lt;/a&gt;&lt;/P&gt;
&lt;P&gt;Your code for HAVE3 only deletes one record.&lt;/P&gt;
&lt;P&gt;To delete all targeted records, the hash table iterator is 10 times slower than the data step's implicit iterator.&lt;/P&gt;
&lt;PRE&gt;&lt;FONT size="2"&gt;data HAVE1 HAVE2 HAVE3
     deletes (keep=I);
  length x $200; retain x ' ';
  do I=1 to 5e6;
    output have1 have2 have3;
    if mod(i,100)=0 then output deletes;
  end;
run; 

data have1;
  if _n_=1 then do;
    declare hash h (dataset:'deletes');
      h.definekey('i');
      h.definedone();
  end;
  modify have1 ;
  if h.find()=0 then remove;
run;

proc sql; 
  create index I on have2;
data have2; 
  set deletes; 
  modify have2 key=I;  
  if _IORC_= %sysrc(_sok) then remove; else _ERROR_=0; 
run;

proc sql; 
  create index I on have3;
data have3;
  declare hash h (dataset:'deletes');
    h.definekey('i');
    h.definedone();
  declare hiter hi ('h');
  rc=hi.first() ;    
  do while (rc=0);    
    modify have3 key=I;  
    if _IORC_= %sysrc(_sok) then remove; else _ERROR_=0; 
    rc=hi.next();                   
  end;
  stop;
run;&lt;/FONT&gt;&lt;/PRE&gt;
&lt;P&gt;1- 4.7 s&lt;/P&gt;
&lt;P&gt;2- 1.6 + 0.8 s&lt;/P&gt;
&lt;P&gt;3- 1.6 + 7.3 s&lt;/P&gt;</description>
      <pubDate>Fri, 11 May 2018 03:38:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficiently-delete-rows-based-on-conditions-in-an-indexed-table/m-p/461463#M284578</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2018-05-11T03:38:25Z</dc:date>
    </item>
  </channel>
</rss>

