07-06-2015 07:08 AM
I am trying to delete records in the quickest possible way.
I have a delete with a simple WHERE in which I delete about 50% of the table.
I have tested two ways:
1) data ventas_tot; set ventas_tot; if base = 'N' then delete; run;
2) proc sql; delete from ventas_tot where base = 'N'; quit;
Way 1) is the faster one, but is there any other way to make the delete as fast as possible?
I don't have any index on the table.
07-06-2015 07:27 AM
With SAS tables your option 1) doesn't actually delete records from the table; it re-creates the table with only the records where your "delete" condition isn't met. You could also write this as:
data ventas_tot; set ventas_tot(where=(base ne 'N')); run;
Your option 2) is an actual delete. The catch here: it doesn't really physically delete the records but only marks them as deleted (a logical delete). This process is not only slower, it also doesn't reduce the physical storage space of your table. So option 1) is "better". For example:
proc sql; delete from test where name='Alfred'; quit;
proc contents data=test; run;
Data Set Name    WORK.TEST    Observations            18
Member Type      DATA         Variables               5
Engine           V9           Indexes                 0
Created                       Observation Length      40
Last Modified                 Deleted Observations    1
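For context, the demo table above looks like it was built from sashelp.class (19 rows, 5 variables, one student named Alfred); that setup is my assumption, but with it the example can be reproduced end to end, and a plain rewrite afterwards drops the flagged row physically:

data test;
  set sashelp.class;                        /* assumed setup: 19 rows, 5 variables            */
run;
proc sql;
  delete from test where name='Alfred';     /* flags 1 row as deleted (logical delete)        */
quit;
proc contents data=test;                    /* shows Observations 18, Deleted Observations 1  */
run;
data test;
  set test;                                 /* rewriting the table removes the flagged row    */
run;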
07-06-2015 07:30 AM
A few brief tests on SAS 9.3 show that your first way seems to be the quickest; test code below. TBH I was quite surprised, I would have thought the SQL would have been quicker, or even the WHERE clause on the read-in, but no, slower.
data test1 test2 test3;                                 /* build three identical test tables   */
  do j=1 to 1000000;
    i=ifc(mod(j,2),"a","b");                            /* assumed roughly 50/50 "a"/"b" split */
    output;
  end;
run;
data test1; set test1; if i="b" then delete; run;       /* 1) subsetting IF                    */
data test2; set test2(where=(i ne "b")); run;           /* 2) WHERE= on the read-in            */
proc sql; delete from test3 where i="b"; quit;          /* 3) PROC SQL DELETE                  */
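If you want to reproduce the timing, one option (my addition, not part of the original test) is to switch on FULLSTIMER and/or wrap a step with a simple DATETIME() check:

options fullstimer;                                     /* detailed real/CPU time per step in the log */
%let t0=%sysfunc(datetime());
proc sql; delete from test3 where i="b"; quit;
%put NOTE: PROC SQL DELETE took %sysevalf(%sysfunc(datetime())-&t0) seconds;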
07-06-2015 07:55 AM
Try this one, which uses direct (random) access: the REMOVE statement inside a DATA step with MODIFY deletes observations in place.
if sex='F' then remove;
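A complete, runnable sketch of that approach; only the REMOVE line comes from the post above, the table name and copying sashelp.class first are my assumptions:

data class;
  set sashelp.class;            /* assumed setup so there is something to modify            */
run;
data class;
  modify class;                 /* MODIFY processes the data set in place, no full rewrite  */
  if sex='F' then remove;       /* REMOVE deletes the current observation                   */
run;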
07-06-2015 09:59 AM
Works, but probably not very efficient when you have a 50% deletion rate.
Also, I'm not surprised that PROC SQL DELETE is slower than a table scan/rewrite. I don't know in detail what DELETE does, but I have a feeling it updates in place, at least logically, to set the delete flag. A rewrite using WHERE does not have to consider marking records for deletion, AND does not even have to read the observations to be deleted into the PDV, and of course does not have to write them back.
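One way to see the difference (my addition; it assumes the TEST1/TEST3 tables from the benchmark above are still in WORK) is to compare physical and deleted row counts via DICTIONARY.TABLES:

proc sql;
  select memname, nobs, delobs, npage      /* physical rows, deleted rows, data set pages */
  from dictionary.tables
  where libname='WORK' and memname in ('TEST1','TEST3');
quit;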
07-06-2015 09:49 AM
Your #1 is already the best solution. Depending on the layout and size of your dataset, compressing it may help (trade CPU cycles for reduced I/O).
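A minimal sketch of combining the rewrite with compression; COMPRESS=YES and COMPRESS=BINARY are standard data set options, and which one pays off depends on the data:

data ventas_tot(compress=yes);              /* compress=binary can work better for wide, mostly numeric tables */
  set ventas_tot(where=(base ne 'N'));
run;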