<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: The most efficient way of deleting observations from huge table. in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/The-most-efficient-way-of-deleting-observations-from-huge-table/m-p/191857#M36166</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Yep it should be converted that way into the latest SAS version you are using now. I guessing that real old lost and forgotten one is made by an old SAS version.&amp;nbsp; &lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Wed, 21 Jan 2015 15:57:04 GMT</pubDate>
    <dc:creator>jakarman</dc:creator>
    <dc:date>2015-01-21T15:57:04Z</dc:date>
    <item>
      <title>The most efficient way of deleting observations from huge table.</title>
      <link>https://communities.sas.com/t5/SAS-Programming/The-most-efficient-way-of-deleting-observations-from-huge-table/m-p/191854#M36163</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;Have general question.&lt;/P&gt;&lt;P&gt;I need to delete ~70% of observations from huge tables,millions of obs., the bigger has ~100 gigs, rest ~30 gig.&lt;/P&gt;&lt;P&gt;Table contains snapshots of data, by separate dates. "Date" column has needed index.Table also conatins a few diff. indexes by diff columns.&lt;/P&gt;&lt;P&gt;So most obvious variant - delete by index column, smth like:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG style="color: navy; font-size: 10pt; background: white; font-family: 'Courier New';"&gt;proc&lt;/STRONG&gt; &lt;STRONG style="color: navy; font-size: 10pt; background: white; font-family: 'Courier New';"&gt;sql&lt;/STRONG&gt;&lt;SPAN style="font-family: 'Courier New'; background: white; color: black; font-size: 10pt;"&gt;;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;SPAN style="font-family: 'Courier New'; background: white; color: blue; font-size: 10pt;"&gt;delete&lt;/SPAN&gt;&lt;SPAN style="font-family: 'Courier New'; background: white; color: black; font-size: 10pt;"&gt; * &lt;/SPAN&gt;&lt;SPAN style="font-family: 'Courier New'; background: white; color: blue; font-size: 10pt;"&gt;from&lt;/SPAN&gt;&lt;SPAN style="font-family: 'Courier New'; background: white; color: black; font-size: 10pt;"&gt; tabname&lt;/SPAN&gt;&amp;nbsp;&amp;nbsp; &lt;SPAN style="font-family: 'Courier New'; background: white; color: blue; font-size: 10pt;"&gt;where&lt;/SPAN&gt;&lt;SPAN style="font-family: 'Courier New'; background: white; color: black; font-size: 10pt;"&gt; dato&amp;lt;&amp;amp;cutDate;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG style="color: navy; font-size: 10pt; background: white; font-family: 'Courier New';"&gt;quit&lt;/STRONG&gt;&lt;SPAN style="font-family: 'Courier New'; background: white; color: black; font-size: 10pt;"&gt;;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'Courier New'; background: white; color: black; font-size: 10pt;"&gt;Also looks like efficient variant should be:&lt;/SPAN&gt;&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;&lt;SPAN style="font-family: 'Courier New'; background: white; color: black; font-size: 10pt;"&gt;Select needed records(~30%), put them to separate new table in same library.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN style="font-family: 'Courier New'; background: white; color: black; font-size: 10pt;"&gt;Drop old huge table.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN style="font-family: 'Courier New'; background: white; color: black; font-size: 10pt;"&gt;Rename table created on first point to needed table.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN style="font-family: 'Courier New'; background: white; color: black; font-size: 10pt;"&gt;Recreate indexes.&lt;/SPAN&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'Courier New'; background: white; color: black; font-size: 10pt;"&gt;So maybe someone have experience with &lt;/SPAN&gt;similar tasks, maybe there are some efficient procedure that doesn't change tables bufsize and delete records more efficient then simple delete?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 21 Jan 2015 14:43:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/The-most-efficient-way-of-deleting-observations-from-huge-table/m-p/191854#M36163</guid>
      <dc:creator>Yura2301</dc:creator>
      <dc:date>2015-01-21T14:43:45Z</dc:date>
    </item>
    <item>
      <title>Re: The most efficient way of deleting observations from huge table.</title>
      <link>https://communities.sas.com/t5/SAS-Programming/The-most-efficient-way-of-deleting-observations-from-huge-table/m-p/191855#M36164</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;All depends on the physical location of that data. With 100Gb I would not expect a classic SAS-dataset on a single spindle location.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Suppose you have the data locates on a dedicated external RDBMS and you need first to validate &amp;amp;cutdate. Than a view defined in that external dbms could be a good fit.&lt;/P&gt;&lt;P&gt;When going to explore that heavily you can materialize that data in a new table inside that external RDBMS.&lt;/P&gt;&lt;P&gt;Suppose you want that 30% needing it local in your SAS environment than you can copy that (create new table).&lt;/P&gt;&lt;P&gt;In this approach while proceeding your work you are getting smaller data and are able increase analytic processing.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Dropping tables is deleting all of the data, it set you loose on going back in some steps.&lt;/P&gt;&lt;P&gt;Deleting rows can time consuming because of locking - random IO access.&lt;/P&gt;&lt;P&gt;Creating a new table (sas-dataset) using Proc SQL is able to take advantage of threading. Disadvantage can be random IO (most time consuming)&lt;/P&gt;&lt;P&gt;Creating a new table (sas-dataset) using a datastep can take advantage of minimizing IO-time. But is working sequentially one thread.&lt;/P&gt;&lt;P&gt;With spinning harddisks (no SSD available) some trade off is normally about 20% of the data needed.&amp;nbsp; Above that the sas datastep is faster below the proc sql.&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 21 Jan 2015 15:24:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/The-most-efficient-way-of-deleting-observations-from-huge-table/m-p/191855#M36164</guid>
      <dc:creator>jakarman</dc:creator>
      <dc:date>2015-01-21T15:24:44Z</dc:date>
    </item>
    <item>
      <title>Re: The most efficient way of deleting observations from huge table.</title>
      <link>https://communities.sas.com/t5/SAS-Programming/The-most-efficient-way-of-deleting-observations-from-huge-table/m-p/191856#M36165</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Jaap,&lt;/P&gt;&lt;P&gt;Actually that 100 gig. table isn't external RDBMS table, it's real old, lost and forgotten SAS table that I need to cut as quick as possible:).&lt;/P&gt;&lt;P&gt;Please corrrect me if I understood your idea incorrect:&lt;/P&gt;&lt;P&gt;So you advices to create new smaller(30%) table using data step, instead of proc sql as I wrote in my post higher?&lt;/P&gt;&lt;P&gt;So final code will looks like:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;1.&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;data table_cutted;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; set table(where =(dato&amp;gt;&amp;amp;cutDate));&lt;/P&gt;&lt;P&gt;run;&lt;BR /&gt;&lt;STRONG&gt;2.&lt;/STRONG&gt;Rename:&lt;/P&gt;&lt;P&gt;&lt;SPAN style="color: #008000; font-size: 10pt; font-family: Courier New;"&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;proc datasets lib=...;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;change table_cutted=table;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;quit;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;3.&lt;/STRONG&gt;Recreate indexes constrants&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 21 Jan 2015 15:46:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/The-most-efficient-way-of-deleting-observations-from-huge-table/m-p/191856#M36165</guid>
      <dc:creator>Yura2301</dc:creator>
      <dc:date>2015-01-21T15:46:39Z</dc:date>
    </item>
    <item>
      <title>Re: The most efficient way of deleting observations from huge table.</title>
      <link>https://communities.sas.com/t5/SAS-Programming/The-most-efficient-way-of-deleting-observations-from-huge-table/m-p/191857#M36166</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Yep it should be converted that way into the latest SAS version you are using now. I guessing that real old lost and forgotten one is made by an old SAS version.&amp;nbsp; &lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 21 Jan 2015 15:57:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/The-most-efficient-way-of-deleting-observations-from-huge-table/m-p/191857#M36166</guid>
      <dc:creator>jakarman</dc:creator>
      <dc:date>2015-01-21T15:57:04Z</dc:date>
    </item>
    <item>
      <title>Re: The most efficient way of deleting observations from huge table.</title>
      <link>https://communities.sas.com/t5/SAS-Programming/The-most-efficient-way-of-deleting-observations-from-huge-table/m-p/191858#M36167</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Jaap,&lt;/P&gt;&lt;P&gt;Just found one more impoovment that speed-up deleteing in 4!!! times in my case.&lt;/P&gt;&lt;P&gt;So generally I used same flow as we discuss higher, but before first step I reset size of buffer to maximal in SAS session:&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'Courier New'; background: white; color: blue; font-size: 10pt;"&gt;options&lt;/SPAN&gt;&lt;SPAN style="font-family: 'Courier New'; background: white; color: black; font-size: 10pt;"&gt; bufno=&lt;/SPAN&gt;&lt;STRONG style="color: teal; font-size: 10pt; background: white; font-family: 'Courier New';"&gt;1&lt;/STRONG&gt;&lt;SPAN style="font-family: 'Courier New'; background: white; color: black; font-size: 10pt;"&gt; bufsize=max;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;And that options speed-up quering those 30% of data in 4 times, but obviously results table become bigger in size due maxbuff option.&lt;/P&gt;&lt;P&gt;So after that I roll back previous buffer size on created data set(from max to usual 4096), and that operation takes only 2 minutes, quick, and the table size was decreased and become absolutely the same if I wouldn't use maxbuff option.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;For now I don't see minuses of such step, so probably will use &lt;STRONG&gt;bufsize=max &lt;/STRONG&gt;options in similar cases.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 22 Jan 2015 11:51:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/The-most-efficient-way-of-deleting-observations-from-huge-table/m-p/191858#M36167</guid>
      <dc:creator>Yura2301</dc:creator>
      <dc:date>2015-01-22T11:51:39Z</dc:date>
    </item>
    <item>
      <title>Re: The most efficient way of deleting observations from huge table.</title>
      <link>https://communities.sas.com/t5/SAS-Programming/The-most-efficient-way-of-deleting-observations-from-huge-table/m-p/191859#M36168</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Makes sense as increasing the bufsize will minimize IO. The limits of the bufsize are dependent of the OS&amp;nbsp; and SAS version.&lt;/P&gt;&lt;P&gt;The OS is often have already some caching as the same with the spinning hard-disk. (A SSD is not not using a spining device).&lt;/P&gt;&lt;P&gt;As of SAS 9.3 latest version you may be can use SASAlignIofiles to improve the process between SAS and the IO system&lt;/P&gt;&lt;P&gt;&lt;A href="http://support.sas.com/documentation/cdl/en/hostunx/67464/HTML/default/viewer.htm#n0gd56c6invdzan1b4shj0g7h726.htm" title="http://support.sas.com/documentation/cdl/en/hostunx/67464/HTML/default/viewer.htm#n0gd56c6invdzan1b4shj0g7h726.htm"&gt;SAS(R) 9.4 Companion for UNIX Environments, Fourth Edition&lt;/A&gt; The max bufsize is 2Gb&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;A bufzise of 4K is really small only advicable for small datasets. That setting is for minimizing spilled space in the last buffer/block.&lt;/P&gt;&lt;P&gt;Even in the old mainframe days I preferred the 27K setting (half track) for this speed reason.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;There is an disadvantage with that big bufsize. That is at the moment you are going to use random access (indexed).&lt;/P&gt;&lt;P&gt;Hitting random a buffer somewhere will need to load/reload those pages of 2Gb into memory even if your&amp;nbsp; record is only 1Mb. (that is a big record) &lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 22 Jan 2015 12:15:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/The-most-efficient-way-of-deleting-observations-from-huge-table/m-p/191859#M36168</guid>
      <dc:creator>jakarman</dc:creator>
      <dc:date>2015-01-22T12:15:28Z</dc:date>
    </item>
    <item>
      <title>Re: The most efficient way of deleting observations from huge table.</title>
      <link>https://communities.sas.com/t5/SAS-Programming/The-most-efficient-way-of-deleting-observations-from-huge-table/m-p/191860#M36169</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi again Jaap,&lt;/P&gt;&lt;P&gt;Regarding your last descriptions about disadvantages:- I actually roll back previous bufsize afterwords, and for my surprize on 6 gigabytes table such operation takes just a few minutes.&lt;/P&gt;&lt;P&gt;But you are right regarding minuses of big bufsize, thanks for good explanations.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 22 Jan 2015 12:47:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/The-most-efficient-way-of-deleting-observations-from-huge-table/m-p/191860#M36169</guid>
      <dc:creator>Yura2301</dc:creator>
      <dc:date>2015-01-22T12:47:18Z</dc:date>
    </item>
  </channel>
</rss>

