BookmarkSubscribeRSS Feed
juanvg1972
Pyrite | Level 9

Hi,

 

I would like to know which is the quickest way to delete rows in a large SAS dataset.
I have tested the ones of the above code. The quickest is the first one, but I want
to know if that is a general rule or it depends on the delete you are doing.

Any other and better way to delete will be apprecciated.

 

 

data ventas;
set ventas;
if cod_centro in ('C1', 'C2') then delete;
run;

data ventas;
set ventas;
where cod_centro not in ('C1', 'C2') ;
run;

data ventas;
 modify ventas;
 if cod_centro in ('C1', 'C2') then remove;
run;

proc sql;
delete from ventas where cod_centro in ('C1', 'C2');
quit;

Any advice will be greatly appreciated

4 REPLIES 4
ChrisBrooks
Ammonite | Level 13

There are a lot of factors to be taken into account here - not least of which is whether or not the column you're deleting on is indexed or is the sort column. Therefore I don't think there's a general rule of thumb you can apply in all cases.

 

There is another option you can try as well

 

data ventas;
	set ventas(where=(cod_centro not in('C1', 'C2'));
run;

 

Norman21
Lapis Lazuli | Level 10

It looks like the WHERE approach (your second example) is best, according to section B in this document:

 

http://www.phusewiki.org/docs/2010/2010%20PAPERS/CC03%20Paper.pdf

 

 

Norman.
SAS 9.4 (TS1M6) X64_10PRO WIN 10.0.17763 Workstation

LinusH
Tourmaline | Level 20
First rule out IF, it works only on data in the PDV.
WHERE on the other hand filter out in the input buffer.
Also, I had some bad experience with DELETE FROM, especially with SPDE data.
Data never sleeps
mkeintz
PROC Star

If you have a large dataset that is expensive to rewrite, you may want to find ways to rewrite (and re-index) infrequently, say every 10th update, or every month for a dataset that is updated daily.  If so, then I'd consider the  proc sql ... delete option or the data ...modify...remove option, because they don't rewrite the dataset, they just mark observations (not variables)  to be deleted.  Then just occasionally use the data ... where approach to physically delete the removed observations.

 

However, when using proc sql ...  delete or data ...modify...remove, remember that the dataset attribute NOBS is not updated.  In the example below,  NOBS=699 in both the DATA _NULL_ step preceding the modify and the one following modify, but NLOBS (number of logical obs) goes from 699 to 487.  This is confirmed by the trailing PROC MEANS reporting only 487 observations (and nmiss=0):

 

data have;
  set sashelp.stocks;
run;

data _null_;
  set have nobs=nrecs ;
  dsid=open('have');
  nlobs=attrn(dsid,'NLOBS');
  rc=close(dsid);
  put nrecs= nlobs= rc=;
  stop;
run;

data have;
  modify have;
  where high<60;
  remove;
run;

data _null_;
  set have nobs=nrecs ;
  dsid=open('have');
  nlobs=attrn(dsid,'NLOBS');
  rc=close(dsid);
  put nrecs= nlobs=;
  stop;
run;

proc means data=have n nmiss;
  var high;
run;

 

 

Unfortunately the SET statement does not have an NLOBS= option, just the NOBS option.  That's why you see the open, attrn, and close functions in the DATA _NULL_ steps.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 4942 views
  • 5 likes
  • 5 in conversation