Kastchei
Pyrite | Level 9
Hello!

I have a 5 GB dataset with an index that's about 500 MB. I'm doing some fairly minor data step modifications and saving out as a new dataset with the same index. No rows are added or dropped, but a few variables are dropped.

The creation of this second dataset is eating through 85 GB of disk space to create the index. That's 170x the size of the index once created. The input dataset took nearly no disk space to create basically the same index.

What's more perplexing is that I have no clue where this is happening. I watch the Work library while the step runs, and nothing remotely that large is accumulating there. I even searched my entire hard drive for very large files (> 4 GB), and there is nothing that size anywhere on the disk.

What's going on and how can I stop SAS from eating up so much disk space?
SASKiwi
PROC Star

Is your 5 GB dataset compressed, by any chance? Whenever a compressed dataset is recreated, it typically has to be decompressed, and that includes any associated indexes. The SASUTIL folder is typically where this happens. Depending on the compression ratio, the utility files can grow to several times the size of the original dataset, but they are deleted once the recreation process is complete.
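
To check both points, a minimal sketch along these lines may help. It assumes a hypothetical library `MYLIB` and dataset `BIG`; substitute your own names. `PROC OPTIONS` reports where this session writes utility files (the `UTILLOC` system option) and where `WORK` lives, and `PROC CONTENTS` reports whether the dataset is compressed and which indexes it carries.

```sas
/* Show where this SAS session writes utility files and WORK files. */
proc options option=utilloc;
run;

proc options option=work;
run;

/* The same information written to the log as plain text. */
%put UTILLOC is: %sysfunc(getoption(utilloc));
%put WORK path is: %sysfunc(pathname(work));

/* MYLIB.BIG is a hypothetical name for the 5 GB dataset.          */
/* The output lists the Compressed attribute and any indexes.      */
proc contents data=mylib.big;
run;
```

Watching the folder reported for `UTILLOC` while the step runs should show whether that is where the space is going.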

Kastchei
Pyrite | Level 9
Howdy!

Yes, my dataset is compressed, which reduces its size by about 98%. So maybe that's the problem. However, my utility folder had nothing in it 🤔 but maybe I was looking at an old utility folder. I do see it now.

However, here's another interesting clue. I created my second dataset without any indexes. Creating the index afterwards with proc datasets or with SQL create index worked in about 1.5 minutes either way. When I run another data step just to add the index (e.g. data a (index ...); set a; run;), it also produces the index just fine. That takes a bit longer, but it does not eat up any excess disk space. It seems to be only when I create the index in the same step that creates the dataset that SAS starts using tons of disk space and never finishes.

Of course, my solution is just to make the index afterwards, but I am curious why that's happening.
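
For readers who want the workaround spelled out, here is a minimal sketch of it. The names `MYLIB.BIG`, `MYLIB.BIG2`, `ID`, `VAR1`, and `VAR2` are hypothetical, and `COMPRESS=YES` is an assumption, since the thread does not say which compression setting was used.

```sas
/* Step 1: create the new dataset WITHOUT an index.               */
/* MYLIB.BIG, MYLIB.BIG2, VAR1 and VAR2 are hypothetical names.   */
data mylib.big2 (compress=yes);   /* compress=yes is an assumption */
    set mylib.big (drop=var1 var2);
    /* minor modifications would go here */
run;

/* Step 2: build the index afterwards on the hypothetical key ID. */
proc datasets library=mylib nolist;
    modify big2;
    index create id;
quit;

/* Alternative to step 2 (run one or the other, not both):        */
/* proc sql;                                                       */
/*     create index id on mylib.big2 (id);                         */
/* quit;                                                           */
```

A simple index in SAS must have the same name as its key variable, which is why the PROC SQL form repeats `id`.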
LinusH
Tourmaline | Level 20

The general recommendation when doing updates/appends to a table with indexes is to drop the index and recreate it after the update/insert operation.

I think the reason is that SAS needs to update the index record by record during an update, whereas a create index operation can build the index in a single sort-and-load pass. I think this applies to your scenario as well.
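
As a rough sketch of that drop-and-recreate pattern for an append, assuming a hypothetical indexed table `MYLIB.BIG` with a simple index on `ID` and new rows in a hypothetical `WORK.NEW_ROWS`:

```sas
/* All dataset and variable names here are hypothetical.          */

/* 1. Drop the existing index before the bulk operation.          */
proc datasets library=mylib nolist;
    modify big;
    index delete id;
quit;

/* 2. Append the new rows with no index to maintain.              */
proc append base=mylib.big data=work.new_rows;
run;

/* 3. Rebuild the index once, after all rows are in place.        */
proc datasets library=mylib nolist;
    modify big;
    index create id;
quit;
```

Whether this beats leaving the index in place likely depends on how many rows are added relative to the table size, so it is worth timing both on your own data.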

 
