☑ This topic is solved.

Hello there!

I am using SAS Viya 3.5, specifically a Model Studio forecasting project, to generate approximately two thousand forecasts from a dataset with three years of history, a single dependent variable, and four BY variables. The project's pipeline contains one Auto-forecasting node and one Hierarchical Forecasting node, and everything worked fine until the number of unique values for two of the four BY variables doubled. Since then, the Hierarchical Forecasting node stops with a series of warnings and errors in the log, while the Auto-forecasting node still runs fine (screenshot below).

 

[Screenshot: alisio_meneses_0-1676908575029.png]

The following errors are present in the log file for the hierarchical forecasting node:

WARNING: Communication failure among server nodes. Journaling communicator repaired.
WARNING: Communication with machine yadayadayada.yada1 has been lost.
WARNING: Communication with machine yadayadayada.yada2 has been lost.
WARNING: Communication with machine yadayadayada.yada3 has been lost.
ERROR: The action cannot be retried because the session has no available workers.
ERROR: The operation was not performed because contact with at least one node was lost before the operation could complete.
ERROR: The action stopped due to errors.

 

I am not sure, but the warnings and errors seem to be related to resource exhaustion. Regardless, it made me wonder: how do you handle a very large time-series dataset in SAS Viya when it does not fit in memory?

 

Additional environment info: SAS Viya 3.5 using MPP architecture with one CAS controller and three CAS workers.


3 REPLIES
JosvanderVelden
SAS Super FREQ
SAS Cloud Analytic Services organizes data from tables in blocks. With the exception of SASHDAT files and a few specialized cases, copies of the in-memory blocks are temporarily stored in file system directories. When the server is installed, one or more of these directories are specified in the CAS_DISK_CACHE environment variable.

To read more: https://communities.sas.com/t5/SAS-Communities-Library/Provisioning-CAS-DISK-CACHE-for-SAS-Viya/ta-p... & https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/casfun/n16qbskv0hwfq1n1lnrqq605p6sv.htm
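For reference, here is a minimal sketch of how CAS_DISK_CACHE is typically defined in a Viya 3.5 deployment's casconfig_usermods.lua. The file location and cache directories below are placeholders for illustration only, not values from this thread; adjust them to your environment.

-- Typical location on a Viya 3.5 host (placeholder; your deployment may differ):
--   /opt/sas/viya/config/etc/cas/default/casconfig_usermods.lua

-- Colon-separated list of directories used for the CAS disk cache.
-- Each directory should exist on every CAS node (controller and workers)
-- and sit on a local file system with enough free space to hold copies
-- of the in-memory table blocks.
env.CAS_DISK_CACHE = '/cascache/cache01:/cascache/cache02'

-- If this variable is not set, CAS falls back to a default temporary
-- directory, which may be far too small for a workload of this size.

The setting typically has to be present on the controller and on every worker, and the CAS server must be restarted for the change to take effect.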
alisio_meneses
Quartz | Level 8
I see. Thanks for the info and links. I'll have to check whether CAS_DISK_CACHE is set.
alisio_meneses
Quartz | Level 8

After correctly setting CAS_DISK_CACHE, the errors went away. The side effect is slow pipeline execution. I mean, really slow. But then again, as one of the links you provided states: '...if CAS is relying on persistent storage for its cache, then expect a commensurate slowdown in performance since data loading from physical disk is much slower than from RAM.'

 

Thank you for your help, @JosvanderVelden .

