Mahis
Quartz | Level 8

Hi everyone,

 

I'm working on automating the data loading process from Hive to CAS for daily SAS Viya reports. Any advice on how to do this?

 

Thanks.

4 REPLIES
Patrick
Opal | Level 21

We would need quite a bit more information to provide more than some generic design principles in response.

If you say automation: do you already have all the code in place and it's just about scheduling, or is this about the design and implementation of a reporting system in Viya?

Mahis
Quartz | Level 8

I currently use SAS Data Explorer to manually load data from Hive into CAS memory by selecting the table and clicking "load into memory".

 


 

This table is the data source for a report in SAS VA.

New records are added to this table every day, and I want to automate the process of loading these new records into CAS memory daily. I also need the SAS VA report to show the new records automatically, without any manual steps.

 

Is there any feature in SAS Viya to do this, or do I need to write code for it?

LinusH
Tourmaline | Level 20

There isn't a ton of documentation, but I think this relates to your requirement:

SAS Help Center: Working with Data in CAS

 

12. When ready, you can run the import operation. Do one of the following:

  • You can right-click the table or file to be copied and select Import item. If the import succeeds, a copy of the table or file is loaded to memory on the CAS server that is specified in the caslib. The copy of the table or file can be selected from the In-memory data (available) list or the Sources tab. If the import fails, see General Usage Notes.

    The Import item option is used to run a job that imports the selected data. You can review the status of import jobs on the Monitoring tab of the Jobs page in SAS Environment Manager. For more information about using this tab, see Monitor Jobs in SAS Environment Manager: User’s Guide.

  • For more information about local file import, see https://developer.sas.com/apis/rest/Compute/#upload-a-data-file-to-the-collection-of-tables.
  • If you right-click the table or file in the import queue, and the Create a job option is available, you can use this option to create a job request on the Scheduling tab of the Jobs window in SAS Environment Manager. You can then use the Scheduling tab to run or schedule the job at an appropriate time. For more information, see Create Import Job Requests. After the import job request is run, a copy of the table or file is loaded to memory on the CAS server that is specified in the caslib. The copy can be selected from the In-memory data (available) list or the Sources tab.
Data never sleeps
Patrick
Opal | Level 21

Given that CAS tables are in-memory and get wiped out whenever the server needs to restart (e.g. for maintenance), I'd go for a daily full load.

Depending on what your environment looks like, a load from Hadoop can potentially execute fully in parallel and be very fast.

 

I'd certainly go for a SAS coding option that's scheduled. If you must load new data during business hours, I'd also first load into another CAS table and then "swap" the tables in CAS to keep the outage for report consumers to a minimum.
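A staged load plus swap of this kind could be sketched with PROC CASUTIL roughly as follows. Note this is only a sketch: the caslib names (hivelib, public), the table name daily_sales and the session name are placeholders you'd replace with your own, and the exact statements should be checked against your Viya version.

```sas
cas mySess;

/* 1. Full load from the Hive caslib into a staging CAS table */
proc casutil;
   load casdata="daily_sales" incaslib="hivelib"
        casout="daily_sales_stage" outcaslib="public" replace;
quit;

/* 2. "Swap": drop the live table, then promote the staged copy
      under the name the VA report points at */
proc casutil;
   droptable casdata="daily_sales" incaslib="public" quiet;
   promote casdata="daily_sales_stage" incaslib="public"
           casout="daily_sales" outcaslib="public";
quit;

cas mySess terminate;
```

Scheduled daily (e.g. as a job in SAS Environment Manager), this keeps the report table unavailable only for the brief drop/promote step rather than for the whole load.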

 

One of the reasons I'd go for code: it allows you to explicitly define how to map source to target columns, because, for example, Hive STRING data types will by default result in a VARCHAR(1024) CAS column. As a rule of thumb: don't use CAS VARCHAR for columns where the maximum string length is below 16 bytes (use CHAR instead), and also use CHAR where the string length is constant.
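One way to control that mapping in code is to route the load through a CAS DATA step, where a LENGTH statement before the SET fixes the target column types explicitly. Everything below (the librefs hivelib and public, the table and column names, the lengths) is a made-up illustration of the CHAR-vs-VARCHAR rule of thumb, not your actual schema:

```sas
cas mySess;
libname hivelib cas caslib="hivelib" sessref=mySess; /* Hive-backed caslib */
libname public  cas caslib="public"  sessref=mySess; /* report caslib      */

/* Runs in CAS because both input and output are CAS librefs */
data public.daily_sales;
   length account_id $ 12          /* constant width -> CHAR, not VARCHAR  */
          country    $ 2
          comment    varchar(200); /* long, variable width -> sized VARCHAR */
   set hivelib.daily_sales;
run;
```

The same effect can be achieved with CAS actions; the point is that code lets you state the target types instead of accepting the VARCHAR(1024) default.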

 

Also: if the data is used for reports that mostly show detail data aggregated by some categorical variable (example: transactional data summed by account), then it's also beneficial for report performance to load all the data that belongs to the same value of this categorical variable (example: the same account) onto the same CAS node. This is something you can do with code - like the loadTable action.
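As a sketch of that idea, the table.partition action can group rows by a categorical variable so that all rows sharing a value end up on the same node. The caslib, table and column names here are placeholders; check the action documentation for your release before relying on it:

```sas
proc cas;
   table.partition /
      table={caslib="public", name="daily_sales",
             groupBy={{name="account_id"}}}  /* co-locate rows per account */
      casOut={caslib="public", name="daily_sales_by_acct", replace=true};
run;
quit;
```

With the table partitioned this way, group-by aggregations on account_id avoid data shuffling between nodes, which is what helps report performance.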