Exploring, modeling, predicting and reporting with SAS Visual Analytics and SAS Visual Statistics

Best practices to control data Size in LASR

Reply
Contributor
Posts: 31

Best practices to control data Size in LASR

Hi,

We have a  compressed table in LASR server with the size of 11GB. Table has 160 million records and 33 variables. Around 25 of these are character variables.
Since dashboard consuming this table was giving slow performance, we uncompressed this table. Size of the uncompressed table rose to 680 GB from 11GB.

Compression ratio of the said table in LASR is 65%. With this ratio, size of the compressed data has been increased 61 times.

To improve the dashboard performance data needs to be stored in uncompressed form.  Are there any techniques/best practices that can be used to control the data size?

Thanks,

Nikhil

Super User
Posts: 1,209

Re: Best practices to control data Size in LASR

Hi Nikhil,

I would definitely look at reducing the length of your character columns. If you are getting 65% compression it could be due to having a lot of padding at the end of your columns. If you have a column that takes up a max of 8 characters and is defined as a length of 50 then you have 150 million records of ~40 characters wasted space (and memory in uncompressed format).

I'd suggest an analysis comparing the defined column length and the actual length used (see Length Function in SAS(R) 9.4 Functions and CALL Routines: Reference, Third Edition) using a data step. Then determine what you actually need rather than 'nice to have' as the defined length of the column

A SAS Global Forum 2015 paper that shares some tips and tricks with data prep you may want to look at is http://support.sas.com/resources/papers/proceedings15/SAS1905-2015.pdf

Please let us know how you go.

Kind Regards,

Michelle

Esteemed Advisor
Posts: 5,995

Re: Best practices to control data Size in LASR

My math understanding tells me a compression rate of 65% should lead to an increase of about 200% when compression is omitted. When data size increases by a factor of more than 60, the compression rate should be around 98.4%. This puzzles me somewhat.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Esteemed Advisor
Posts: 5,100

Re: Best practices to control data Size in LASR

Agree.

If you are certain about your figures, I would open track to SAS tech support.

This is not a Star Schema?

Data never sleeps
Contributor
Posts: 31

Re: Best practices to control data Size in LASR

Hi Linus,

Please find below compression info log of the LASR table for your reference.

The IMSTAT Procedure

                                                      Compression Information

                                             Data Source          HPS.INVENTORY_MON_DS
                                             Table                HPS.INVENTORY_MON_DS
                                             Size                 6.8e+02GB          
                                             Compressed Size      11GB               
                                             Compression Ratio    65 

Thanks,

Nikhil

Esteemed Advisor
Posts: 5,995

Re: Best practices to control data Size in LASR

OK, so it is not percent, but the factor.

Look at your character variables first. I guess you have some that are defined quite big and mostly empty, or have repeat values that are easily compressed.

You can also gain with resizing numeric vars. Dates don't need 8 bytes, for instance.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Esteemed Advisor
Posts: 5,100

Re: Best practices to control data Size in LASR

How do you know that it's the compression per se that slows down your dashboard? My guess would be that the "uncompression ratio" during execution slows down performance. Idon't know much abour LSAR compression, but a 98% in Base SAS would be considered sensational, and definitely signals that your data isn't modeled to it's purpose.

Data never sleeps
Contributor
Posts: 31

Re: Best practices to control data Size in LASR

Hi Linus,

In the  Reference guide on LASR server it is mentioned that "Compression exchanges less memory use for more CPU use. It slows down
any request that processes the data
". Having said that you are right in saying "uncompression ratio" during execution slows down performance, since it uncompressed every record while execution and with the compression ratio at hand its worth looking closely at the data modelling to reduce the size and hence improve the performance. Thanks a lot for your input.

SAS(R) LASR(TM) Analytic Server 2.4: Reference Guide

Super User
Posts: 1,209

Re: Best practices to control data Size in LASR

Hi Nikhil,

You might also find this discussion thread useful

Kind Regards,

Michelle

Post a Question
Discussion Stats
  • 8 replies
  • 976 views
  • 1 like
  • 4 in conversation