DATA Step, Macro, Functions and more

Delete obs efficiently

Reply
Contributor
Posts: 47

Delete obs efficiently

I create a dataset named T with an index date,

data t(index=(date));

  do i=1 to 1000000;

  date=intnx('day','01jan2010'd,i);

  x=1;y=2222222;z=33333;

  output;

  end;

  format date yymmdd10.;

run;

The following two pieces of codes are to delete obs efficiently using indexes, but after submitting, I find the size of physicial file of table T don't change(perhaps called logically delete).

The reason why I use the following is that they do not create a new copy of the data set and delete indexes, save time of re-creating, but if logically delete, the table size will become bigger an bigger, and the IO time will increase rapidly.

How to balance?

data t;

  modify t;

  if date<='01feb3050'd then remove t;

run;

proc sql;

  delete from t where date<='01feb3050'd;

quit;

Super User
Posts: 5,441

Re: Delete obs efficiently

Yes, the deletes as logical.

If you wish to save space, use the REUSE= data set option (only valid with COMPRESS=YES for some reason...?).

Be aware that the table can be fragmented and less efficient for querying.

Data never sleeps
Ask a Question
Discussion stats
  • 1 reply
  • 139 views
  • 0 likes
  • 2 in conversation