BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
May15
Obsidian | Level 7

Hi all,

 

What do you think is the best practice to delete rows from a SAS table in SAS DIS? I need to delete historical data but with delete statement we don't save any space.

 

Does it work to create a job where we extract (extract transformation) the data we want to keep and then load it to the same table (table loader transformation)?

 

Any suggestions are welcome.

 

 

Thanks,

 

May15.

1 ACCEPTED SOLUTION

Accepted Solutions
JBailey
Barite | Level 11

Hi @May15

 

If you are talking about deleting observations from a SAS data set this might help. 

 

SAS uses lazy deletes. This means that the observation in the file is marked as deleted but it remains in the file. This is why the delete statement has no affect on the size of the SAS data set. If your file is really small you may not see a reduction in size, at all. The only way that I know of to reduce the size of the file is to make a copy of it using a SAS tool (PROC COPY, DATA step, PROC SQL, etc). 

 

For example

data small;
   x=1;
run;

This code results in a file that is 128K. Deleting this single row, then copying the SAS data set via a SAS tool, will not change the size of the file because it is so small. 

 

Using DI Studio you could:

  1. Create a new file containing the data you want (smaller than the original)
  2. Delete the original SAS data set (if you want to play it safe this file can be renamed instead of deleted)
  3. Rename the new data set to the old name

Compressed SAS datasets behave a little differently. Here's a SAS Note with more details: http://support.sas.com/kb/32/042.html

 

Good luck,

Jeff

 

View solution in original post

18 REPLIES 18
LinusH
Tourmaline | Level 20
From DIS perspective your suggestion would probably work, just your exact need to result in a table rather than a view (otherwise you are trying to write to a tabel atbthe same time reading from it).
If it's a good idea depends on your constraints (disk, up, processing time).
Data never sleeps
May15
Obsidian | Level 7

Hi LinusH, 

 

Thanks for your reply.

 

I've tried but it doesn't work. I've got the following error:

 

ERROR: Attempt to append a view to itself.

 

However, it results in a table, not in a view. I don't know if I understood the restriction you've said.

 

 

Thanks,

 

May 15

LinusH
Tourmaline | Level 20
I don't know either. Show a view of your job an the code generated for the extract transformation.
Data never sleeps
May15
Obsidian | Level 7

Hi LinusH,

 

I've found the issue... The extract transformation had the option "Create as view" checked. I unchecked this and now it works.

 

By the way, using this approach, we don't lose any constraints and indexes of the data, right?

 

 

Thanks,

 

May15

JBailey
Barite | Level 11

Hi @May15

 

If you are talking about deleting observations from a SAS data set this might help. 

 

SAS uses lazy deletes. This means that the observation in the file is marked as deleted but it remains in the file. This is why the delete statement has no affect on the size of the SAS data set. If your file is really small you may not see a reduction in size, at all. The only way that I know of to reduce the size of the file is to make a copy of it using a SAS tool (PROC COPY, DATA step, PROC SQL, etc). 

 

For example

data small;
   x=1;
run;

This code results in a file that is 128K. Deleting this single row, then copying the SAS data set via a SAS tool, will not change the size of the file because it is so small. 

 

Using DI Studio you could:

  1. Create a new file containing the data you want (smaller than the original)
  2. Delete the original SAS data set (if you want to play it safe this file can be renamed instead of deleted)
  3. Rename the new data set to the old name

Compressed SAS datasets behave a little differently. Here's a SAS Note with more details: http://support.sas.com/kb/32/042.html

 

Good luck,

Jeff

 

May15
Obsidian | Level 7

Hi JBailey,

 

Thanks for your tips. In fact, it's a shame that the delete transformation doesn't save any space.

 

 

 

Thanks,

 

May15

JBailey
Barite | Level 11

Hi @May15

 

My pleasure. Hopefully it answers the question. 

 

The transformation only issues the DELETE code and it must work regardless of data source. The clearing of "slack" space in a SAS data set could be set up in a different process and run periodically. 

 

Best wishes,
Jeff

May15
Obsidian | Level 7

Hi @JBailey,

 

Let me ask, what yould you suggest to clear the "slack" space in a SAS data set?

 

 

Thanks,

 

May15

mkeintz
PROC Star

Why is it "a shame that the delete transformation doesn't save any space"?

 

Consider a dataset with 100M observations stored in,say, 5M physical data set disk pages (of 64K bytes each).  You delete the first observation, requiring the reading and re-writing of all 5M pages.  This is no different than a PROC SQL, DATA step, PROC COPY etc.  Logical deletion is the superior way to go in such a case.  Yes, at some point if enough obs have been marked as deleted, it might make send to copy to a smaller disk usage, but making that behavior automatic is not necessarily cost effective.

 

I suppose some data bases offer an immediate reduction in disk storage with every row eliminated, but I imagine to do so, those databases are all accessed via indexes.  I suspect they do not offer the rapid sequential read capabilities of SAS.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
LinusH
Tourmaline | Level 20
That depends on the settings in the Table Loader.
But test the job and analyze results, as a normal development and unit test.
Data never sleeps
May15
Obsidian | Level 7

Sure LinusH.

 

Thanks,

 

May15

LinusH
Tourmaline | Level 20
You can have it defined with the REUSE data set option. Will save some space but eventually over time the internal storage will be scattered, so a total rewrite of the table now and then is recommended.
Data never sleeps
May15
Obsidian | Level 7

Hi all again,

 

Unfortunately, the idea I've posted in the original post isn't working. I'm getting the following errors:

 

 

WARNING: Target table is referenced by constraints in another table
WARNING: Replacing entire table will fail. Consider an alternate load technique or revising constraints.
ERROR: A rename/delete/replace attempt is not allowed for a data set involved in a referential integrity constraint.
 
 
In fact, this table has a primary key which is related to a foreign key in another table and also has a foreign key that is related to a primary key in another table.
Just to remember, in order to save space, the process is to extract only the data I want and then load it to the same table. The load process was done with the table loader transformatin using the "Replace" style and with the default constraint conditions:
 
Before load: On table creation
After load: Leave on
 
 
Any help to solve this issue??
 
 
 
Thanks,
 
May15
JBailey
Barite | Level 11

Extracting the rows and writing them back to the same table (really a SAS data set, I assume) won't help you because in order to make the file smaller it must be recreated using a SAS DATA step, PROC APPEND, etc. This removes the unused space in the file. If you recreate the file using COMPRESS=yes and REUSE=yes then space will be reused. But as Linus points out, it is a good idea to periodically recreate the file to clean the space up. 

 

The referential integrity can be dropped while maintenance is performed on the table. Once the work is done, the referential integrity can be reapplied. Keep in mind, if the data violates the constrains it will cause an error when reapplied.

 

I have been playing around with some code that shows how to make a file smaller. Hopefully, it will help somebody. I plan to take it and create a communities article on it at some point. In DI Studio you can use the APPEND transform to do a lot of this. I assume there is a way to rename a SAS data set; I don't know which transform will do that.

 

libname prod "C:\sasdata\prod";
libname stage "C:\sasdata\stage";

/* Create a 1GB SAS data set */
data prod.slackspace;
   do i = 1 to 10000000;
      x='abcdefghijklmnopqrstuvwxyz0123456789';
      y='abcdefghijklmnopqrstuvwxyz0123456789';
      z='abcdefghijklmnopqrstuvwxyz0123456789';
      output;
   end;
run;

/* DELETE all the observations from the 1GB SAS data set to see if it gets smaller */
proc sql;
   delete from prod.slackspace;
   select count(*) from prod.slackspace;
quit;

/* At this point the select count(*) returns 0 */
/* The file is still 1GB */ /* CREATE TABLE AS to see if the new SAS data set is smaller - it will be 128K */ proc sql; create table stage.noslack_sql as select * from prod.slackspace; run; /* Create new table using PROC APPEND to see if the new SAS data set is smaller */ proc append base=stage.noslack_append data=prod.slackspace; run; /* Create new table using a DATA step to see if the new SAS data set is smaller */ data stage.noslack_datastep; set prod.slackspace; run; /* Verify the sizes */ proc datasets lib=prod; run; proc datasets lib=stage; run; /* For production... */ /* Remove wasted space */ /* Use two libraries pointing to different devices for performance.*/
/* This prevents simultaneous reads and writes to the same device. */
/* Ignore the fact that I am on a laptop;)
proc append base=stage.slackspace_small data=prod.slackspace; run; /* move smaller file back into production - give it the original name */ proc datasets lib=prod; change slackspace=slackspace_old; copy in=stage out=prod move; change slackspace_small=slackspace; run;

Best wishes,

Jeff

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 18 replies
  • 4069 views
  • 3 likes
  • 4 in conversation