02-23-2012 12:54 PM
I have some fairly large data sets that are created on a monthly basis. I was wondering if you have any tips on data compression techniques to speed up the time SAS takes to create these data sets?
Thanks for the tips!
02-23-2012 01:21 PM
You're asking two different questions. Compression affects disk space usage. As a general rule, this does not speed up programs but rather slows them down slightly. What are you trying to accomplish?
02-23-2012 06:23 PM
Like others have said, your data, your data needs, and your environment would have to be known before anyone could suggest valid possibilities.
Tom mentioned summarizing as early as possible. I agree, but would even back up a step before that. Only import and/or keep the data that are really necessary.
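To illustrate that point, a minimal sketch of subsetting at read time with the KEEP= and WHERE= data set options, so unneeded columns and rows never hit the data step at all (library, dataset, and variable names here are hypothetical):

```sas
/* Read only the variables and rows actually needed for the analysis.
   KEEP= drops unwanted columns at input; WHERE= filters rows before
   they reach the program data vector, reducing I/O. */
data work.slim;
    set lib.big_monthly (keep=cust_id month amount
                         where=(month = '01FEB2012'd));
run;
```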
02-23-2012 01:31 PM
Since the slowest computing operation is typically I/O, compression often does speed up programs by allowing the disk to push the data faster, at the cost of increased CPU usage. It all depends on where your performance bottleneck is.
Please provide additional information about how these files are currently created each month, and any system information you know or feel comfortable sharing, especially OS, available storage types, CPU counts, RAM, etc.
What is the nature of the data? Does it have natural segmentation for the analysis you intend to do with it? Why do you feel this program is running slowly? What are its current performance metrics, and where would you want them to be?
Increasing performance is a very large question to ask so vaguely.
02-23-2012 07:28 PM
In my experience compression has actually sped up the process; it really depends on where the bottleneck is. E.g., if you have slow disks, writing/reading as little as possible helps.
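One way to check where the bottleneck is: a sketch comparing compressed and uncompressed writes under FULLSTIMER (the source dataset name is hypothetical):

```sas
/* FULLSTIMER prints real time and CPU time for each step in the log. */
options fullstimer;

data work.uncompressed (compress=no);
    set lib.big_monthly;   /* hypothetical large source dataset */
run;

data work.compressed (compress=binary);
    set lib.big_monthly;
run;

/* Compare the two steps in the log: if real time drops while CPU time
   rises for the compressed write, the job was I/O-bound and
   compression is paying off. */
```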
02-24-2012 03:42 AM
If we look at compression as such, if your data is mainly character, try the ordinary CHAR method.
If you have lots of numerical data, try BINARY.
If you have lots of integer numeric data, you can try specifying a length other than the default 8.
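A minimal sketch of those three options (library, dataset, and variable names are made up for illustration):

```sas
/* Mostly character data: run-length style CHAR compression */
data lib.monthly_char (compress=char);
    set work.monthly_raw;
run;

/* Mostly numeric data: BINARY compression */
data lib.monthly_num (compress=binary);
    set work.monthly_raw;
run;

/* Integer variables: store in fewer than the default 8 bytes.
   Shorter numeric lengths are safe only for integers within the
   range that length can represent exactly; check the documentation
   for the limits on your platform. */
data lib.monthly_small;
    length flag_var 3 count_var 4;  /* hypothetical variable names */
    set work.monthly_raw;
run;
```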
As Barry says, it depends on your system and data whether compression will help processing time. The penalty for compression is more CPU cycles.
My experience is that you need at least 50% compression to gain shorter processing time.
How long is your run time anyway? Do you have indexes defined?
Depending on the requirements, some re-modelling of the data can sometimes help.
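If indexes might help the downstream queries, a sketch of defining them with PROC DATASETS (dataset and variable names are hypothetical):

```sas
/* Add indexes to an existing dataset without rebuilding it.
   Indexes speed up WHERE-based subsetting on the indexed variables,
   at the cost of extra disk space and slower appends. */
proc datasets library=lib nolist;
    modify monthly_data;
    index create cust_id;               /* simple index */
    index create ym = (cust_id month);  /* composite index */
quit;
```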
02-24-2012 01:53 PM
If I don't want to babysit the program while running I'd be tempted to schedule it to run while I'm out of the office, say overnight if possible. Then everything is done when I come in the morning.