- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have millions of rows of data in a table, how do I delete the data without deleting the tables
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
There are several methods, I don't off the top of my head know which will be best resourcewise:
proc sql; delete from HAVE where <condition>; quit; data want; set have; if <condition>; run; data want; set have; where <condition>; run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
There are several methods, I don't off the top of my head know which will be best resourcewise:
proc sql; delete from HAVE where <condition>; quit; data want; set have; if <condition>; run; data want; set have; where <condition>; run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Overwriting it would be the fastest.
data TABLE;
set TABLE;
stop;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Assuming we're talking about tables stored as SAS files (SAS tables as opposed to database tables).
You can use PROC SQL / DELETE or less commonly use a SAS Datastep Modify / Remove.
The issue with SAS tables:
The records don't get physically removed but get just flagged as deleted. The deleted records still require all the storage space.
You can see this when using Proc Contents.
data have;
do i=1 to 3;output;end;
run;
proc sql;
delete from have where i>1
;
quit;
proc contents data=have;
run;quit;
If your permanent SAS dataset gets later on Inserts or Updates then the space of the deleted observations gets re-used as far as I understand things.
IF your SAS dataset is also compressed then make sure that it has been created with option REUSE=YES or you'll never get rid of the deleted observations in regards of space usage.
Given all these challenges it's actually often more efficient to just re-write the whole SAS table. ...But of course if you're only deleting a few records in a big table or you have also indices and/or constraints defined on the table then that's probably not what you want to do.
....and for all of the above reasons if it comes to production implementations and bigger data volumes then my preference is to store and maintain data in a database as there the DBMS deals with all this stuff.