SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Deleting rows in SAS DIS

Accepted Solution Solved
Reply
Contributor
Posts: 37
Accepted Solution

Deleting rows in SAS DIS

[ Edited ]

Hi all,

 

What do you think is the best practice to delete rows from a SAS table in SAS DIS? I need to delete historical data but with delete statement we don't save any space.

 

Does it work to create a job where we extract (extract transformation) the data we want to keep and then load it to the same table (table loader transformation)?

 

Any suggestions are welcome.

 

 

Thanks,

 

May15.


Accepted Solutions
Solution
‎02-13-2018 10:41 AM
SAS Employee
Posts: 272

Re: Deleting rows in SAS DIS

[ Edited ]

Hi @May15

 

If you are talking about deleting observations from a SAS data set this might help. 

 

SAS uses lazy deletes. This means that the observation in the file is marked as deleted but it remains in the file. This is why the delete statement has no affect on the size of the SAS data set. If your file is really small you may not see a reduction in size, at all. The only way that I know of to reduce the size of the file is to make a copy of it using a SAS tool (PROC COPY, DATA step, PROC SQL, etc). 

 

For example

data small;
   x=1;
run;

This code results in a file that is 128K. Deleting this single row, then copying the SAS data set via a SAS tool, will not change the size of the file because it is so small. 

 

Using DI Studio you could:

  1. Create a new file containing the data you want (smaller than the original)
  2. Delete the original SAS data set (if you want to play it safe this file can be renamed instead of deleted)
  3. Rename the new data set to the old name

Compressed SAS datasets behave a little differently. Here's a SAS Note with more details: http://support.sas.com/kb/32/042.html

 

Good luck,

Jeff

 

View solution in original post


All Replies
Super User
Posts: 5,851

Re: Deleting rows in SAS DIS

From DIS perspective your suggestion would probably work, just your exact need to result in a table rather than a view (otherwise you are trying to write to a tabel atbthe same time reading from it).
If it's a good idea depends on your constraints (disk, up, processing time).
Data never sleeps
Contributor
Posts: 37

Re: Deleting rows in SAS DIS

Hi LinusH, 

 

Thanks for your reply.

 

I've tried but it doesn't work. I've got the following error:

 

ERROR: Attempt to append a view to itself.

 

However, it results in a table, not in a view. I don't know if I understood the restriction you've said.

 

 

Thanks,

 

May 15

Super User
Posts: 5,851

Re: Deleting rows in SAS DIS

I don't know either. Show a view of your job an the code generated for the extract transformation.
Data never sleeps
Contributor
Posts: 37

Re: Deleting rows in SAS DIS

Hi LinusH,

 

I've found the issue... The extract transformation had the option "Create as view" checked. I unchecked this and now it works.

 

By the way, using this approach, we don't lose any constraints and indexes of the data, right?

 

 

Thanks,

 

May15

Solution
‎02-13-2018 10:41 AM
SAS Employee
Posts: 272

Re: Deleting rows in SAS DIS

[ Edited ]

Hi @May15

 

If you are talking about deleting observations from a SAS data set this might help. 

 

SAS uses lazy deletes. This means that the observation in the file is marked as deleted but it remains in the file. This is why the delete statement has no affect on the size of the SAS data set. If your file is really small you may not see a reduction in size, at all. The only way that I know of to reduce the size of the file is to make a copy of it using a SAS tool (PROC COPY, DATA step, PROC SQL, etc). 

 

For example

data small;
   x=1;
run;

This code results in a file that is 128K. Deleting this single row, then copying the SAS data set via a SAS tool, will not change the size of the file because it is so small. 

 

Using DI Studio you could:

  1. Create a new file containing the data you want (smaller than the original)
  2. Delete the original SAS data set (if you want to play it safe this file can be renamed instead of deleted)
  3. Rename the new data set to the old name

Compressed SAS datasets behave a little differently. Here's a SAS Note with more details: http://support.sas.com/kb/32/042.html

 

Good luck,

Jeff

 

Contributor
Posts: 37

Re: Deleting rows in SAS DIS

Hi JBailey,

 

Thanks for your tips. In fact, it's a shame that the delete transformation doesn't save any space.

 

 

 

Thanks,

 

May15

SAS Employee
Posts: 272

Re: Deleting rows in SAS DIS

Hi @May15

 

My pleasure. Hopefully it answers the question. 

 

The transformation only issues the DELETE code and it must work regardless of data source. The clearing of "slack" space in a SAS data set could be set up in a different process and run periodically. 

 

Best wishes,
Jeff

Contributor
Posts: 37

Re: Deleting rows in SAS DIS

Hi @JBailey,

 

Let me ask, what yould you suggest to clear the "slack" space in a SAS data set?

 

 

Thanks,

 

May15

Trusted Advisor
Posts: 1,312

Re: Deleting rows in SAS DIS

Why is it "a shame that the delete transformation doesn't save any space"?

 

Consider a dataset with 100M observations stored in,say, 5M physical data set disk pages (of 64K bytes each).  You delete the first observation, requiring the reading and re-writing of all 5M pages.  This is no different than a PROC SQL, DATA step, PROC COPY etc.  Logical deletion is the superior way to go in such a case.  Yes, at some point if enough obs have been marked as deleted, it might make send to copy to a smaller disk usage, but making that behavior automatic is not necessarily cost effective.

 

I suppose some data bases offer an immediate reduction in disk storage with every row eliminated, but I imagine to do so, those databases are all accessed via indexes.  I suspect they do not offer the rapid sequential read capabilities of SAS.

 

 

Super User
Posts: 5,851

Re: Deleting rows in SAS DIS

That depends on the settings in the Table Loader.
But test the job and analyze results, as a normal development and unit test.
Data never sleeps
Contributor
Posts: 37

Re: Deleting rows in SAS DIS

Sure LinusH.

 

Thanks,

 

May15

Super User
Posts: 5,851

Re: Deleting rows in SAS DIS

You can have it defined with the REUSE data set option. Will save some space but eventually over time the internal storage will be scattered, so a total rewrite of the table now and then is recommended.
Data never sleeps
Contributor
Posts: 37

Re: Deleting rows in SAS DIS

[ Edited ]

Hi all again,

 

Unfortunately, the idea I've posted in the original post isn't working. I'm getting the following errors:

 

 

WARNING: Target table is referenced by constraints in another table
WARNING: Replacing entire table will fail. Consider an alternate load technique or revising constraints.
ERROR: A rename/delete/replace attempt is not allowed for a data set involved in a referential integrity constraint.
 
 
In fact, this table has a primary key which is related to a foreign key in another table and also has a foreign key that is related to a primary key in another table.
Just to remember, in order to save space, the process is to extract only the data I want and then load it to the same table. The load process was done with the table loader transformatin using the "Replace" style and with the default constraint conditions:
 
Before load: On table creation
After load: Leave on
 
 
Any help to solve this issue??
 
 
 
Thanks,
 
May15
SAS Employee
Posts: 272

Re: Deleting rows in SAS DIS

[ Edited ]

Extracting the rows and writing them back to the same table (really a SAS data set, I assume) won't help you because in order to make the file smaller it must be recreated using a SAS DATA step, PROC APPEND, etc. This removes the unused space in the file. If you recreate the file using COMPRESS=yes and REUSE=yes then space will be reused. But as Linus points out, it is a good idea to periodically recreate the file to clean the space up. 

 

The referential integrity can be dropped while maintenance is performed on the table. Once the work is done, the referential integrity can be reapplied. Keep in mind, if the data violates the constrains it will cause an error when reapplied.

 

I have been playing around with some code that shows how to make a file smaller. Hopefully, it will help somebody. I plan to take it and create a communities article on it at some point. In DI Studio you can use the APPEND transform to do a lot of this. I assume there is a way to rename a SAS data set; I don't know which transform will do that.

 

libname prod "C:\sasdata\prod";
libname stage "C:\sasdata\stage";

/* Create a 1GB SAS data set */
data prod.slackspace;
   do i = 1 to 10000000;
      x='abcdefghijklmnopqrstuvwxyz0123456789';
      y='abcdefghijklmnopqrstuvwxyz0123456789';
      z='abcdefghijklmnopqrstuvwxyz0123456789';
      output;
   end;
run;

/* DELETE all the observations from the 1GB SAS data set to see if it gets smaller */
proc sql;
   delete from prod.slackspace;
   select count(*) from prod.slackspace;
quit;

/* At this point the select count(*) returns 0 */
/* The file is still 1GB */ /* CREATE TABLE AS to see if the new SAS data set is smaller - it will be 128K */ proc sql; create table stage.noslack_sql as select * from prod.slackspace; run; /* Create new table using PROC APPEND to see if the new SAS data set is smaller */ proc append base=stage.noslack_append data=prod.slackspace; run; /* Create new table using a DATA step to see if the new SAS data set is smaller */ data stage.noslack_datastep; set prod.slackspace; run; /* Verify the sizes */ proc datasets lib=prod; run; proc datasets lib=stage; run; /* For production... */ /* Remove wasted space */ /* Use two libraries pointing to different devices for performance.*/
/* This prevents simultaneous reads and writes to the same device. */
/* Ignore the fact that I am on a laptop;)
proc append base=stage.slackspace_small data=prod.slackspace; run; /* move smaller file back into production - give it the original name */ proc datasets lib=prod; change slackspace=slackspace_old; copy in=stage out=prod move; change slackspace_small=slackspace; run;

Best wishes,

Jeff

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 18 replies
  • 556 views
  • 3 likes
  • 4 in conversation