11-02-2016 02:41 AM
I had a 121 GB Dataset in which i had to change the length of a variable from char(20) to char(50). I used the below code to do this and found that the resulting dataset was only 23 GB. However when i did the proc contents i saw both records had the same number of observation and the only difference in the output of proc contents was that in the dataset before updating "Deleted Observation" value was 69956558 and after updating "Deleted Observation" value was 0. Could you please let me know what difference does this make and is this the reason for size change.
Code for updating variable size:
alter table mart.s2_pseudo_qrt_table
modify PRODUCT_LINE_ID char(50) ;
11-02-2016 02:50 AM
It looks like your dataset originally contained nearly 70M observations that were marked for deletion but still taking up space. When you did your ALTER TABLE the entire dataset was processed, removing the observations marked for deletion and the space they were taking up. This explains the drop in dataset size.
11-02-2016 03:42 AM
11-02-2016 04:03 AM
This can be achieved by (eg) a data step with modify and remove.
11-03-2016 04:35 AM
Can you please give an example of the same.
See this very simple example:
data class; set sashelp.class; run; data class; modify class; if age = 15 then remove; run; proc contents data=class; run;
The CONTENTS Procedure Data Set Name WORK.CLASS Observations 15 Member Type DATA Variables 5 Engine V9 Indexes 0 Created Do, 03. Nov 2016 09.24 Uhr Observation Length 40 Last Modified Do, 03. Nov 2016 09.24 Uhr Deleted Observations 4 Protection Compressed NO Data Set Type Sorted NO Label
You can see that the original 19 obs of sashelp.class have been reduced to 15, but you also see 4 deleted observations that are still physically in there and consume space.
Now let's run a "simple" data step over that:
data class; set class; run; proc contents data=class; run;
The CONTENTS Procedure Data Set Name WORK.CLASS Observations 15 Member Type DATA Variables 5 Engine V9 Indexes 0 Created Do, 03. Nov 2016 09.32 Uhr Observation Length 40 Last Modified Do, 03. Nov 2016 09.32 Uhr Deleted Observations 0 Protection Compressed NO Data Set Type Sorted NO
Now those 4 deleted observations have vanished, and SAS can reduce the physical size of the dataset.
Of course, given the small size of SASHELP.CLASS, this will not be visible on the OS level, as only 1 dataset page is needed anyway. But you get the picture.