Hi,
We have a compressed table in LASR server with the size of 11GB. Table has 160 million records and 33 variables. Around 25 of these are character variables.
Since dashboard consuming this table was giving slow performance, we uncompressed this table. Size of the uncompressed table rose to 680 GB from 11GB.
Compression ratio of the said table in LASR is 65%. With this ratio, size of the compressed data has been increased 61 times.
To improve the dashboard performance data needs to be stored in uncompressed form. Are there any techniques/best practices that can be used to control the data size?
Thanks,
Nikhil
Hi Nikhil,
I would definitely look at reducing the length of your character columns. If you are getting 65% compression it could be due to having a lot of padding at the end of your columns. If you have a column that takes up a max of 8 characters and is defined as a length of 50 then you have 150 million records of ~40 characters wasted space (and memory in uncompressed format).
I'd suggest an analysis comparing the defined column length and the actual length used (see Length Function in SAS(R) 9.4 Functions and CALL Routines: Reference, Third Edition) using a data step. Then determine what you actually need rather than 'nice to have' as the defined length of the column
A SAS Global Forum 2015 paper that shares some tips and tricks with data prep you may want to look at is http://support.sas.com/resources/papers/proceedings15/SAS1905-2015.pdf
Please let us know how you go.
Kind Regards,
Michelle
My math understanding tells me a compression rate of 65% should lead to an increase of about 200% when compression is omitted. When data size increases by a factor of more than 60, the compression rate should be around 98.4%. This puzzles me somewhat.
Agree.
If you are certain about your figures, I would open track to SAS tech support.
This is not a Star Schema?
Hi Linus,
Please find below compression info log of the LASR table for your reference.
The IMSTAT Procedure
Compression Information
Data Source HPS.INVENTORY_MON_DS
Table HPS.INVENTORY_MON_DS
Size 6.8e+02GB
Compressed Size 11GB
Compression Ratio 65
Thanks,
Nikhil
OK, so it is not percent, but the factor.
Look at your character variables first. I guess you have some that are defined quite big and mostly empty, or have repeat values that are easily compressed.
You can also gain with resizing numeric vars. Dates don't need 8 bytes, for instance.
How do you know that it's the compression per se that slows down your dashboard? My guess would be that the "uncompression ratio" during execution slows down performance. Idon't know much abour LSAR compression, but a 98% in Base SAS would be considered sensational, and definitely signals that your data isn't modeled to it's purpose.
Hi Linus,
In the Reference guide on LASR server it is mentioned that "Compression exchanges less memory use for more CPU use. It slows down
any request that processes the data". Having said that you are right in saying "uncompression ratio" during execution slows down performance, since it uncompressed every record while execution and with the compression ratio at hand its worth looking closely at the data modelling to reduce the size and hence improve the performance. Thanks a lot for your input.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
See how to use one filter for multiple data sources by mapping your data from SAS’ Alexandria McCall.
Find more tutorials on the SAS Users YouTube channel.