Hi everyone,
I'm working on automating the data loading process from Hive to CAS for daily SAS Viya reports. Any advice on how to do this?
Thanks.
We would need quite a bit more information to provide more than some generic design principles in response.
When you say automation: do you already have all the code in place and it's just about scheduling, or is this about the design and implementation of a reporting system in Viya?
I currently use SAS Data Explorer to manually load data from Hive into CAS memory by selecting the table and clicking "load into memory".
This table is the data source for a report in SAS VA.
New records are added to this table every day and I want to automate the process of loading these new records into CAS memory daily. I also need the SAS VA report to show the new records automatically without any manual steps.
Is there any feature in SAS Viya to do this or do I need to write code for it?
There isn't a ton of documentation, but I think this relates to your requirement:
SAS Help Center: Working with Data in CAS
12. When ready, you can run the import operation. Do one of the following:
The Import item option is used to run a job that imports the selected data. You can review the status of import jobs on the Monitoring tab of the Jobs page in SAS Environment Manager. For more information about using this tab, see Monitor Jobs in SAS Environment Manager: User’s Guide.
Given that CAS tables are in-memory and get wiped out whenever the server needs to restart (for example, for maintenance), I'd go for a daily full load.
Depending on what your environment looks like, a load from Hadoop can potentially execute fully in parallel and be very fast.
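To give an idea of what that can look like (all server, schema, and path values below are placeholders for your environment), here is a Hive caslib defined in code; dataTransferMode="parallel" lets the CAS workers pull the data directly from the Hadoop nodes if the SAS Data Connect Accelerator for Hadoop is available on the cluster:

/* Sketch of a Hive caslib - server, schema and paths are placeholders */
cas mysess;
caslib hivelib datasource=(
   srctype="hadoop",
   server="hive-server.example.com",
   schema="sales",
   hadoopJarPath="/opt/hadoop/jars",
   hadoopConfigDir="/opt/hadoop/conf",
   dataTransferMode="parallel"   /* "auto" falls back to serial if parallel isn't available */
);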
I'd certainly go for a SAS coding option that's scheduled. If you must load new data during business hours, then I'd also first load into another CAS table and then "swap" the tables in CAS to keep the outage for report consumers to a minimum.
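A minimal sketch of that staged load and swap, assuming a Hive caslib named hivelib, a target caslib named public, and a table called transactions (all names are placeholders):

cas mysess;

proc casutil;
   /* full load from Hive into a staging CAS table */
   load casdata="transactions" incaslib="hivelib"
        casout="transactions_new" outcaslib="public" replace;

   /* "swap": drop the old promoted table, then promote the freshly loaded one
      under the name the VA report points to */
   droptable casdata="transactions" incaslib="public" quiet;
   promote casdata="transactions_new" incaslib="public"
           casout="transactions" outcaslib="public";
quit;

This is the kind of program you could schedule to run daily; the report only loses access to the table for the brief moment between the drop and the promote.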
One of the reasons I'd go for code: it allows you to explicitly define how to map source to target columns, because, for example, Hive STRING data types will by default result in a VARCHAR(1024) CAS column. As a rule of thumb: don't use a CAS VARCHAR for columns where the maximum string length is below 16 bytes (use CHAR instead), and also use CHAR where the string length is constant.
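One way to apply such a mapping in code (a sketch only - the column names and lengths are made up) is to load into a staging table first and then recreate it with FedSQL running in CAS, casting the oversized VARCHAR columns to fixed-width CHAR:

/* Sketch only: column names and lengths are made up */
proc fedsql sessref=mysess;
   create table "public"."transactions_typed" as
   select cast(account_id as char(10)) as account_id,  /* short, fixed-length key */
          cast(status     as char(1))  as status,      /* constant-length flag */
          amount,
          tx_date
   from "public"."transactions_new";
quit;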
Also: if the data is used for reports that mostly show detail data aggregated by some categorical variable (example: transactional data summed by account), then it's also beneficial for report performance to load all the data that belongs to the same value of this categorical variable (example: the same account) onto the same CAS node. This is something you can do with code - like the loadTable action.
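For example (again just a sketch with made-up names), one option after the load is the table.partition action, which groups the table by the categorical variable so that all rows with the same account end up on the same CAS worker node:

/* Sketch with made-up names: keep all rows of the same account on the same node */
proc cas;
   table.partition /
      table={caslib="public", name="transactions_new", groupBy={"account_id"}}
      casOut={caslib="public", name="transactions_grouped", replace=true};
quit;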