<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Deleting rows in a SAS large datasets in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Deleting-rows-in-a-SAS-large-datasets/m-p/400139#M97002</link>
    <description>&lt;P&gt;If you have a large dataset that is expensive to rewrite, you may want to find ways to rewrite (and re-index) infrequently,&amp;nbsp;say every 10th update, or every month for a dataset that is updated daily.&amp;nbsp; If so, then I'd consider the&amp;nbsp; &lt;EM&gt;&lt;STRONG&gt;proc sql ... delete&lt;/STRONG&gt;&lt;/EM&gt; option or the &lt;EM&gt;&lt;STRONG&gt;data ...modify...remove&lt;/STRONG&gt;&lt;/EM&gt; option, because they don't rewrite the dataset, they just mark observations (not variables)&amp;nbsp; to be deleted.&amp;nbsp; Then just occasionally use the &lt;EM&gt;&lt;STRONG&gt;data ... where&lt;/STRONG&gt;&lt;/EM&gt; approach to physically delete the removed observations.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;However, when using &lt;EM&gt;&lt;STRONG&gt;proc sql ...&amp;nbsp; delete&lt;/STRONG&gt;&lt;/EM&gt; or &lt;STRONG&gt;data ...modify...remove&lt;/STRONG&gt;, remember that the &lt;EM&gt;&lt;STRONG&gt;dataset attribute NOBS is not updated&lt;/STRONG&gt;&lt;/EM&gt;.&amp;nbsp;&amp;nbsp;In the example below,&amp;nbsp; NOBS=699&amp;nbsp;in both the DATA _NULL_ step preceding the modify and the one following modify,&amp;nbsp;but NLOBS (number of&amp;nbsp;logical obs)&amp;nbsp;goes from 699 to 487.&amp;nbsp; This is confirmed by&amp;nbsp;the&amp;nbsp;trailing PROC MEANS&amp;nbsp;reporting only 487 observations (and nmiss=0):&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
  set sashelp.stocks;
run;

data _null_;
  set have nobs=nrecs ;
  dsid=open('have');
  nlobs=attrn(dsid,'NLOBS');
  rc=close(dsid);
  put nrecs= nlobs= rc=;
  stop;
run;

data have;
  modify have;
  where high&amp;lt;60;
  remove;
run;

data _null_;
  set have nobs=nrecs ;
  dsid=open('have');
  nlobs=attrn(dsid,'NLOBS');
  rc=close(dsid);
  put nrecs= nlobs=;
  stop;
run;

proc means data=have n nmiss;
  var high;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Unfortunately the SET statement does not have an NLOBS= option, just the NOBS option.&amp;nbsp; That's why you see the open, attrn, and close functions in the DATA _NULL_ steps.&lt;/P&gt;</description>
    <pubDate>Sun, 01 Oct 2017 16:00:33 GMT</pubDate>
    <dc:creator>mkeintz</dc:creator>
    <dc:date>2017-10-01T16:00:33Z</dc:date>
    <item>
      <title>Deleting rows in a SAS large datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Deleting-rows-in-a-SAS-large-datasets/m-p/400107#M96982</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I would like to know which is the quickest way to delete rows in a large SAS dataset.&lt;BR /&gt;I have tested the ones of the above code. The quickest is the first one, but I want&lt;BR /&gt;to know if that is a general rule or it depends on the delete you are doing.&lt;/P&gt;
&lt;P&gt;Any other and better way to delete will be apprecciated.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data ventas;
set ventas;
if cod_centro in ('C1', 'C2') then delete;
run;

data ventas;
set ventas;
where cod_centro not in ('C1', 'C2') ;
run;

data ventas;
 modify ventas;
 if cod_centro in ('C1', 'C2') then remove;
run;

proc sql;
delete from ventas where cod_centro in ('C1', 'C2');
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Any advice will be greatly appreciated&lt;/P&gt;</description>
      <pubDate>Sun, 01 Oct 2017 09:32:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Deleting-rows-in-a-SAS-large-datasets/m-p/400107#M96982</guid>
      <dc:creator>juanvg1972</dc:creator>
      <dc:date>2017-10-01T09:32:35Z</dc:date>
    </item>
    <item>
      <title>Re: Deleting rows in a SAS large datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Deleting-rows-in-a-SAS-large-datasets/m-p/400111#M96986</link>
      <description>&lt;P&gt;There are a lot of factors to be taken into account here - not least of which is whether or not the column you're deleting on is indexed or is the sort column. Therefore I don't think there's a general rule of thumb you can apply in all cases.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;There is another option you can try as well&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data ventas;
	set ventas(where=(cod_centro not in('C1', 'C2'));
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 01 Oct 2017 10:12:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Deleting-rows-in-a-SAS-large-datasets/m-p/400111#M96986</guid>
      <dc:creator>ChrisBrooks</dc:creator>
      <dc:date>2017-10-01T10:12:05Z</dc:date>
    </item>
    <item>
      <title>Re: Deleting rows in a SAS large datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Deleting-rows-in-a-SAS-large-datasets/m-p/400112#M96987</link>
      <description>&lt;P&gt;It looks like the WHERE approach (your second example) is best, according to section B in this document:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="http://www.phusewiki.org/docs/2010/2010%20PAPERS/CC03%20Paper.pdf" target="_blank"&gt;http://www.phusewiki.org/docs/2010/2010%20PAPERS/CC03%20Paper.pdf&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 01 Oct 2017 10:14:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Deleting-rows-in-a-SAS-large-datasets/m-p/400112#M96987</guid>
      <dc:creator>Norman21</dc:creator>
      <dc:date>2017-10-01T10:14:22Z</dc:date>
    </item>
    <item>
      <title>Re: Deleting rows in a SAS large datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Deleting-rows-in-a-SAS-large-datasets/m-p/400130#M97000</link>
      <description>First rule out IF, it works only on data in the PDV.&lt;BR /&gt;WHERE on the other hand filter out in the input buffer. &lt;BR /&gt;Also, I had some bad experience with DELETE FROM, especially with SPDE data.&lt;BR /&gt;</description>
      <pubDate>Sun, 01 Oct 2017 13:12:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Deleting-rows-in-a-SAS-large-datasets/m-p/400130#M97000</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2017-10-01T13:12:03Z</dc:date>
    </item>
    <item>
      <title>Re: Deleting rows in a SAS large datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Deleting-rows-in-a-SAS-large-datasets/m-p/400139#M97002</link>
      <description>&lt;P&gt;If you have a large dataset that is expensive to rewrite, you may want to find ways to rewrite (and re-index) infrequently,&amp;nbsp;say every 10th update, or every month for a dataset that is updated daily.&amp;nbsp; If so, then I'd consider the&amp;nbsp; &lt;EM&gt;&lt;STRONG&gt;proc sql ... delete&lt;/STRONG&gt;&lt;/EM&gt; option or the &lt;EM&gt;&lt;STRONG&gt;data ...modify...remove&lt;/STRONG&gt;&lt;/EM&gt; option, because they don't rewrite the dataset, they just mark observations (not variables)&amp;nbsp; to be deleted.&amp;nbsp; Then just occasionally use the &lt;EM&gt;&lt;STRONG&gt;data ... where&lt;/STRONG&gt;&lt;/EM&gt; approach to physically delete the removed observations.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;However, when using &lt;EM&gt;&lt;STRONG&gt;proc sql ...&amp;nbsp; delete&lt;/STRONG&gt;&lt;/EM&gt; or &lt;STRONG&gt;data ...modify...remove&lt;/STRONG&gt;, remember that the &lt;EM&gt;&lt;STRONG&gt;dataset attribute NOBS is not updated&lt;/STRONG&gt;&lt;/EM&gt;.&amp;nbsp;&amp;nbsp;In the example below,&amp;nbsp; NOBS=699&amp;nbsp;in both the DATA _NULL_ step preceding the modify and the one following modify,&amp;nbsp;but NLOBS (number of&amp;nbsp;logical obs)&amp;nbsp;goes from 699 to 487.&amp;nbsp; This is confirmed by&amp;nbsp;the&amp;nbsp;trailing PROC MEANS&amp;nbsp;reporting only 487 observations (and nmiss=0):&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
  set sashelp.stocks;
run;

data _null_;
  set have nobs=nrecs ;
  dsid=open('have');
  nlobs=attrn(dsid,'NLOBS');
  rc=close(dsid);
  put nrecs= nlobs= rc=;
  stop;
run;

data have;
  modify have;
  where high&amp;lt;60;
  remove;
run;

data _null_;
  set have nobs=nrecs ;
  dsid=open('have');
  nlobs=attrn(dsid,'NLOBS');
  rc=close(dsid);
  put nrecs= nlobs=;
  stop;
run;

proc means data=have n nmiss;
  var high;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Unfortunately the SET statement does not have an NLOBS= option, just the NOBS option.&amp;nbsp; That's why you see the open, attrn, and close functions in the DATA _NULL_ steps.&lt;/P&gt;</description>
      <pubDate>Sun, 01 Oct 2017 16:00:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Deleting-rows-in-a-SAS-large-datasets/m-p/400139#M97002</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2017-10-01T16:00:33Z</dc:date>
    </item>
  </channel>
</rss>

