DATA Step, Macro, Functions and more

Difference in dataset size after updating the metadata

Reply
Occasional Contributor
Posts: 9

Difference in dataset size after updating the metadata

Hi ,

 

I had a 121 GB Dataset in which i had to change the length of a variable from char(20) to char(50). I used the below code to do this and found that the resulting dataset was only 23 GB. However when i did the proc contents i saw both records had the same number of observation and the only difference in the output of proc contents was that in the dataset before updating "Deleted Observation" value was 69956558 and after updating "Deleted Observation" value was 0. Could you please let me know what difference does this make and is this the reason for size change.

 

Code for updating variable size:

proc sql;
alter table mart.s2_pseudo_qrt_table
modify PRODUCT_LINE_ID char(50) ;
quit;

Super User
Posts: 3,260

Re: Difference in dataset size after updating the metadata

Posted in reply to Jagadeesh2907

It looks like your dataset originally contained nearly 70M observations that were marked for deletion but still taking up space. When you did your ALTER TABLE the entire dataset was processed, removing the observations marked for deletion and the space they were taking up. This explains the drop in dataset size.

Occasional Contributor
Posts: 9

Re: Difference in dataset size after updating the metadata

Hi , Thank you for the explanation. This gives a fair idea. However, when you say records were marked for deletion but were not actually deleted ?
Super User
Posts: 7,854

Re: Difference in dataset size after updating the metadata

Posted in reply to Jagadeesh2907

This can be achieved by (eg) a data step with modify and remove.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Occasional Contributor
Posts: 9

Re: Difference in dataset size after updating the metadata

Posted in reply to KurtBremser

Can you please give an example of the same.

Super User
Posts: 7,854

Re: Difference in dataset size after updating the metadata

Posted in reply to Jagadeesh2907

Jagadeesh2907 wrote:

Can you please give an example of the same.


??

 

See this very simple example:

data class;
set sashelp.class;
run;

data class;
modify class;
if age = 15 then remove;
run;

proc contents data=class;
run;

Partial result:

                                    The CONTENTS Procedure

Data Set Name        WORK.CLASS                                       Observations          15
Member Type          DATA                                             Variables             5 
Engine               V9                                               Indexes               0 
Created              Do, 03. Nov 2016 09.24 Uhr                       Observation Length    40
Last Modified        Do, 03. Nov 2016 09.24 Uhr                       Deleted Observations  4 
Protection                                                            Compressed            NO
Data Set Type                                                         Sorted                NO
Label                                                                                         

You can see that the original 19 obs of sashelp.class have been reduced to 15, but you also see 4 deleted observations that are still physically in there and consume space.

Now let's run a "simple" data step over that:

data class;
set class;
run;

proc contents data=class;
run;

New result:

                                     The CONTENTS Procedure

 Data Set Name        WORK.CLASS                                       Observations          15
 Member Type          DATA                                             Variables             5 
 Engine               V9                                               Indexes               0 
 Created              Do, 03. Nov 2016 09.32 Uhr                       Observation Length    40
 Last Modified        Do, 03. Nov 2016 09.32 Uhr                       Deleted Observations  0 
 Protection                                                            Compressed            NO
 Data Set Type                                                         Sorted                NO

Now those 4 deleted observations have vanished, and SAS can reduce the physical size of the dataset.

 

Of course, given the small size of SASHELP.CLASS, this will not be visible on the OS level, as only 1 dataset page is needed anyway. But you get the picture.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Ask a Question
Discussion stats
  • 5 replies
  • 243 views
  • 2 likes
  • 3 in conversation