k4minou
Obsidian | Level 7

Hello,

Are there any requirements for using optimization options such as compress?

The server I am working on is saturated, and the WORK space is on the same disk partition as the data.

I am wondering whether there are any options I should use or avoid in this case, to prevent I/O errors caused by disk saturation.

What is the minimum free disk space needed to use the compress option? With proc sort, I know we can optimize by subsetting the data (filtering during the sort) and also with the tagsort option.

Thank you in advance for your replies and help.

Regards,

Michel

6 REPLIES
Kurt_Bremser
Super User

The compress= option has no requirements. Just use it.

In some cases (e.g. very small datasets, or datasets with mostly numeric variables and no empty character variables), compression might increase the file size; you can see this in the log and rerun such a step without compression.
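A minimal sketch of what this could look like, assuming a hypothetical input dataset work.have. The system option turns compression on for the session, and the dataset option overrides it for a single step. After each step, a NOTE in the log reports whether compressing the dataset decreased or increased its size.

```sas
/* Turn compression on for the whole session (system option). */
options compress=yes;

/* work.have is a hypothetical input dataset used for illustration. */
data work.want;
  set work.have;
run;

/* If the log reports that compression increased the size,
   rerun that one step with the dataset option switched off. */
data work.want(compress=no);
  set work.have;
run;
```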

 

When sorting a heavily compressed dataset, the utility file (which is not compressed) will be much larger than the dataset; use the tagsort option in proc sort in such cases.
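As a rough illustration, assuming a hypothetical dataset work.have with variables id and year: the where= dataset option filters rows before they reach the sort, and tagsort keeps only the BY variables and observation numbers in the temporary files. Note that tagsort can lengthen the run time, so it is a trade of time for disk space.

```sas
/* work.have, id and year are hypothetical names. */
proc sort data=work.have(where=(year >= 2020))  /* filter while reading  */
          out=work.sorted
          tagsort;                               /* smaller utility files */
  by id;
run;
```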

k4minou
Obsidian | Level 7

Thank you for your reply,

I don't know what to do, or even what the issue is. I just don't understand it... All I notice is that I get I/O errors because of disk usage...

The disk has 5TB of storage, 95% of it is used for data, and the WORK space is on this disk too...

Currently, we have around 600GB free, but it suddenly drops to around 30GB (or even less) when I start to use proc sort etc. (with compress=yes as a global option).

I will add the tagsort option to all my proc sort steps. But my current problem is that the I/O errors occur with a simple proc sql...

The data I work on is around 2GB, so how can 600GB disappear so suddenly? I don't know why...

It's just crazy!

Thank you in any case for your reply 😃

Cheers

Kurt_Bremser
Super User

The "simple" SQL often needs to sort and/or create a utility file, which is also uncompressed. Depending on what your data looks like (and, in the case of joins, what kind of relationships exist between the tables) and what you want to achieve, there might be more space-efficient ways.
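One way to check whether a given query sorts behind the scenes is the _method option of proc sql, which writes the chosen execution methods to the log. This is only a sketch; the table and column names are hypothetical. A sqxsort entry in the log output indicates a sort step, which is where utility space gets consumed.

```sas
/* Table and column names are hypothetical. */
proc sql _method;
  create table work.joined as
  select a.id, a.amount, b.region
  from work.orders as a
       inner join work.customers as b
       on a.id = b.id;
quit;
```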

 

WORK should have its own separate disks, and with larger data so should UTILLOC. By default, UTILLOC shares the WORK location, but it should be kept separate.
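To see where both locations currently point, something like the following should work in any session. Both options are set at SAS startup (configuration file or command line), so changing them is a job for whoever administers the installation rather than something an options statement can do.

```sas
/* Show the current WORK location. */
proc options option=work;
run;

/* Show the current UTILLOC location. */
proc options option=utilloc;
run;

/* The same values, written to the log via macro functions. */
%put WORK    = %sysfunc(getoption(work));
%put UTILLOC = %sysfunc(getoption(utilloc));
```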

 

A file system with 95% usage where work is done (and not just data kept in online storage) is just a crash waiting to happen.

 

Look at the compression rates of your datasets, and your dataset structure. If you have character variables that are defined with large lengths but are mostly empty, these will result in very high compression rates, and correspondingly large disk consumption as soon as the data is uncompressed.
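A possible way to check this, assuming a hypothetical dataset work.have with a wide character variable named comment: compare the defined length from proc contents with the longest value actually stored, then shrink the variable if the gap is large. The new length of 200 is an assumed value for illustration only; pick it from the query result, since a length that is too short truncates data. SAS also writes a warning about multiple lengths to the log when the length is changed this way.

```sas
/* Defined lengths and compression status of the dataset. */
proc contents data=work.have;
run;

/* Longest value actually stored in the hypothetical variable comment. */
proc sql;
  select max(lengthn(comment)) as max_used_length
  from work.have;
quit;

/* Redeclare the variable with a shorter length before SET.
   200 is an assumed value; values longer than that would be truncated. */
data work.have_slim;
  length comment $ 200;
  set work.have;
run;
```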

k4minou
Obsidian | Level 7

Thank you very much 😃

 

It's clear now.

LinusH
Tourmaline | Level 20

Also, be aware that compression costs CPU, so use it only when you gain, let's say, 50% or more in compression.

And also, for data sets with a high ratio of numeric variables, try the compress=binary option.
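A quick way to compare the two compression methods on the same data, assuming a hypothetical input dataset work.have: write it once with each setting and compare the size notes in the log.

```sas
/* work.have is a hypothetical, mostly numeric dataset. */
data work.test_binary(compress=binary);
  set work.have;
run;

/* compress=char is the same method as compress=yes. */
data work.test_char(compress=char);
  set work.have;
run;
```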

Data never sleeps
k4minou
Obsidian | Level 7

he he thx 😃

 

yeah will try all the options I can thx to both of u
