We have a compressed table in a LASR server with a size of 11 GB. The table has 160 million records and 33 variables, around 25 of which are character variables.
Since the dashboard consuming this table was performing slowly, we uncompressed it. The size of the uncompressed table rose from 11 GB to 680 GB.
The compression ratio of the said table in LASR is 65%. At this ratio, the data size increased about 61-fold when uncompressed.
To improve dashboard performance, the data needs to be stored in uncompressed form. Are there any techniques or best practices that can be used to control the data size?
I would definitely look at reducing the length of your character columns. If you are getting 65% compression, it could be due to a lot of padding at the end of your columns. If a column holds at most 8 characters but is defined with a length of 50, then across 160 million records you have ~40 characters of wasted space per record (and wasted memory in uncompressed form).
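To put a rough number on that (assuming ~42 wasted bytes per record in that single column):

```latex
160{,}000{,}000 \ \text{records} \times 42 \ \text{bytes} \approx 6.7 \ \text{GB}
```

So one over-sized character column alone can account for several gigabytes of the uncompressed footprint, and with ~25 character columns the waste multiplies quickly.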
I'd suggest an analysis comparing each defined column length with the actual length used (see the LENGTH function in SAS(R) 9.4 Functions and CALL Routines: Reference, Third Edition) using a data step. Then set the defined length to what you actually need, rather than 'nice to have'.
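A minimal sketch of that analysis, assuming the library and table names from the log below (adjust to your own data):

```sas
/* Sketch: find the longest value actually stored in each character
   variable, to compare against the defined lengths. */
data _null_;
   set hps.inventory_mon_ds end=last;
   array cvars {*} _character_;      /* all character variables */
   array maxlen {50} _temporary_;    /* oversized: must be >= # of char vars */
   do i = 1 to dim(cvars);
      /* LENGTHN returns 0 for blank values; MAX ignores missing */
      maxlen{i} = max(maxlen{i}, lengthn(cvars{i}));
   end;
   if last then
      do i = 1 to dim(cvars);
         put vname(cvars{i}) @30 maxlen{i};  /* name, max used length */
      end;
run;
```

The log then lists, for each character variable, the longest value actually present, which you can compare against the defined lengths (e.g. from PROC CONTENTS) to decide where to trim.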
A SAS Global Forum 2015 paper that shares some tips and tricks with data prep you may want to look at is http://support.sas.com/resources/papers/proceedings15/SAS1905-2015.pdf
Please let us know how you go.
My math understanding tells me a compression rate of 65% should lead to an increase of roughly 200% when compression is removed. For the data size to increase by a factor of more than 60, the compression rate would have to be around 98.4%. This puzzles me somewhat.
Please find below the compression info from the LASR table log for your reference.
The IMSTAT Procedure
Data Source HPS.INVENTORY_MON_DS
Compressed Size 11GB
Compression Ratio 65
OK, so it is not percent, but the factor.
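The observed numbers are consistent with the ratio being a factor rather than a percentage:

```latex
11 \ \text{GB} \times 62 \approx 680 \ \text{GB},
\qquad
1 - \frac{11}{680} \approx 98.4\,\%
```

A factor of ~62 is equivalent to a compression rate of ~98.4%, which matches the estimate above.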
Look at your character variables first. I suspect you have some that are defined quite large and are mostly empty, or have repeated values that compress easily.
You can also gain by resizing numeric variables. Dates don't need 8 bytes, for instance.
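A sketch of shrinking a date variable, assuming a hypothetical column name `txn_date` and an output table name of my choosing:

```sas
/* Sketch: store a date in 4 bytes instead of 8. SAS dates are integer
   day counts, and length 4 represents integers up to 2,097,152 exactly,
   so no precision is lost. */
data hps.inventory_mon_ds_slim;
   length txn_date 4;      /* must precede the SET statement to take effect */
   set hps.inventory_mon_ds;
   format txn_date date9.;
run;
```

The same LENGTH statement works for any numeric variable that only ever holds small integers, such as flags or counts.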
How do you know that it's the compression per se that slows down your dashboard? My guess would be that the decompression during execution slows down performance. I don't know much about LASR compression, but a 98% ratio in Base SAS would be considered sensational, and definitely signals that your data isn't modeled to its purpose.
The LASR server reference guide mentions that "Compression exchanges less memory use for more CPU use. It slows down any request that processes the data". Having said that, you are right that decompression during execution slows down performance, since every record is decompressed at execution time. With the compression ratio at hand, it's worth looking closely at the data modelling to reduce the size and hence improve performance. Thanks a lot for your input.