BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Jagadeesh2907
Obsidian | Level 7

Hi ,

 

I had a 121 GB Dataset in which i had to change the length of a variable from char(20) to char(50). I used the below code to do this and found that the resulting dataset was only 23 GB. However when i did the proc contents i saw both records had the same number of observation and the only difference in the output of proc contents was that in the dataset before updating "Deleted Observation" value was 69956558 and after updating "Deleted Observation" value was 0. Could you please let me know what difference does this make and is this the reason for size change.

 

Code for updating variable size:

proc sql;
alter table mart.s2_pseudo_qrt_table
modify PRODUCT_LINE_ID char(50) ;
quit;

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

@Jagadeesh2907 wrote:

Can you please give an example of the same.


??

 

See this very simple example:

data class;
set sashelp.class;
run;

data class;
modify class;
if age = 15 then remove;
run;

proc contents data=class;
run;

Partial result:

                                    The CONTENTS Procedure

Data Set Name        WORK.CLASS                                       Observations          15
Member Type          DATA                                             Variables             5 
Engine               V9                                               Indexes               0 
Created              Do, 03. Nov 2016 09.24 Uhr                       Observation Length    40
Last Modified        Do, 03. Nov 2016 09.24 Uhr                       Deleted Observations  4 
Protection                                                            Compressed            NO
Data Set Type                                                         Sorted                NO
Label                                                                                         

You can see that the original 19 obs of sashelp.class have been reduced to 15, but you also see 4 deleted observations that are still physically in there and consume space.

Now let's run a "simple" data step over that:

data class;
set class;
run;

proc contents data=class;
run;

New result:

                                     The CONTENTS Procedure

 Data Set Name        WORK.CLASS                                       Observations          15
 Member Type          DATA                                             Variables             5 
 Engine               V9                                               Indexes               0 
 Created              Do, 03. Nov 2016 09.32 Uhr                       Observation Length    40
 Last Modified        Do, 03. Nov 2016 09.32 Uhr                       Deleted Observations  0 
 Protection                                                            Compressed            NO
 Data Set Type                                                         Sorted                NO

Now those 4 deleted observations have vanished, and SAS can reduce the physical size of the dataset.

 

Of course, given the small size of SASHELP.CLASS, this will not be visible on the OS level, as only 1 dataset page is needed anyway. But you get the picture.

View solution in original post

6 REPLIES 6
SASKiwi
PROC Star

It looks like your dataset originally contained nearly 70M observations that were marked for deletion but still taking up space. When you did your ALTER TABLE the entire dataset was processed, removing the observations marked for deletion and the space they were taking up. This explains the drop in dataset size.

Jagadeesh2907
Obsidian | Level 7
Hi , Thank you for the explanation. This gives a fair idea. However, when you say records were marked for deletion but were not actually deleted ?
Jagadeesh2907
Obsidian | Level 7

Can you please give an example of the same.

Kurt_Bremser
Super User

@Jagadeesh2907 wrote:

Can you please give an example of the same.


??

 

See this very simple example:

data class;
set sashelp.class;
run;

data class;
modify class;
if age = 15 then remove;
run;

proc contents data=class;
run;

Partial result:

                                    The CONTENTS Procedure

Data Set Name        WORK.CLASS                                       Observations          15
Member Type          DATA                                             Variables             5 
Engine               V9                                               Indexes               0 
Created              Do, 03. Nov 2016 09.24 Uhr                       Observation Length    40
Last Modified        Do, 03. Nov 2016 09.24 Uhr                       Deleted Observations  4 
Protection                                                            Compressed            NO
Data Set Type                                                         Sorted                NO
Label                                                                                         

You can see that the original 19 obs of sashelp.class have been reduced to 15, but you also see 4 deleted observations that are still physically in there and consume space.

Now let's run a "simple" data step over that:

data class;
set class;
run;

proc contents data=class;
run;

New result:

                                     The CONTENTS Procedure

 Data Set Name        WORK.CLASS                                       Observations          15
 Member Type          DATA                                             Variables             5 
 Engine               V9                                               Indexes               0 
 Created              Do, 03. Nov 2016 09.32 Uhr                       Observation Length    40
 Last Modified        Do, 03. Nov 2016 09.32 Uhr                       Deleted Observations  0 
 Protection                                                            Compressed            NO
 Data Set Type                                                         Sorted                NO

Now those 4 deleted observations have vanished, and SAS can reduce the physical size of the dataset.

 

Of course, given the small size of SASHELP.CLASS, this will not be visible on the OS level, as only 1 dataset page is needed anyway. But you get the picture.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1366 views
  • 2 likes
  • 3 in conversation