Hi,
I have an executive report on SAS VA and have to load necessary tables into CAS from a SQL server source every hour. I basically do this:
proc sql; create table CASUSER.table1 as select ... /* new columns are generated */ from sqlsource1 left join sqlsource2 on ... ; quit; proc datasets library =PUBLIC memtype = (data view) nolist nowarn; delete table1; quit; proc casutil incaslib= 'mycasuser' outcaslib= 'public'; promote casdata= 'table1'; run;
This operation happens every hour at 30 past. When a user attempts to open the report at 30 past, since the reloading table is not finished yet, the newly created columns can not be found in the report at that time. This yields to the report being in an erroneous state. How to avoid this situation?
Thanks for reading.
@fda2 Good that you found a working approach.
You are loading into a CAS session scope table. Report consumers are only impacted during the time when you drop the global scope table and promote the session scope table to global scope. This process should be rather quick and you can't improve it further.
I believe though there are some opportunities to improve on end-to-end performance and robustness.
Current state
Improvement ideas
Further considerations for report performance and CAS memory consumption
since the reloading table is not finished yet
From how I understand your code you're loading the table first and the outage for your report consumers should be very short (drop/promote). Not sure if doing this within a single Proc Casutil as discussed here would make a significant difference.
Another possibility (can't test. Don't have Viya access anymore😥) could be just-in-time loading. Instead of refreshing the in-memory table you just refresh it's physical leg and then drop the table. The next user accessing the report will then trigger the load. No idea though what the impact is for report consumers that were already working with your report before the refresh.
And as another idea:
You could have a load_dttm column in your CAS table and your report filters rows for max(load_dttm). You then append data to the existing CAS table as step 1 and then as step 2 delete the old rows.
And depending on your data you could also consider if there is a way that you can do a insert/update/delete instead of a full refresh ...but this would likely cause "wrong" data for the time where the delete already executed but the insert didn't finish yet.
Hi @Patrick, so much thanks for the details. Using single proc casutil actually didn't affect too much in my case. I realized that the create table statement takes a lot of time. So I separated the extraction phase and loading into CAS phase. Since uploading to CAS is fast, downtime is reduced so much that the user does not notice it.
Updating rows instead of full refresh might be considered. But in my case, the table structure doesn't allow me to perform this.
I will keep in mind the just-in-time loading for further studies!
Thank you.
@fda2 Good that you found a working approach.
You are loading into a CAS session scope table. Report consumers are only impacted during the time when you drop the global scope table and promote the session scope table to global scope. This process should be rather quick and you can't improve it further.
I believe though there are some opportunities to improve on end-to-end performance and robustness.
Current state
Improvement ideas
Further considerations for report performance and CAS memory consumption
@fda2 To add to below here two more articles worth reading.
Seriously Serial or Perfectly Parallel Data Transfer with CAS
Seriously Serial or Perfectly Parallel Data Transfer with SAS® Viya®
Thanks again @Patrick. I want to clarify some steps:
You offer to use fedsql(which I don't know now) and loading into PUBLIC instead of CASUSER caslib, right?
proc sql; create table CASUSER.table1 as select ... /* new columns are generated */ from sqlsource1 left join sqlsource2 on ... ; quit;
And for robustness, you suggest to save the global scope CAS table to it's physical permanent location. I couldn't figure out this step exactly. Assuming I applied your suggestions and my table will be on global scope in PUBLIC caslib eventually. Does promote action loads to table to the memory and the save operation saves the table to the disk?
Thanks in advance.
Hi @Patrick , I am confused at another point. I tried to wrote the code according to your guideline, just please ignore the fedsql part.
proc sql; CREATE TABLE PUBLIC.table1 AS SELECT ... FROM from sqlsource1 ... quit; proc casutil INCASLIB = 'public'; DROPTABLE CASDATA = 'table1 '; quit; proc casutil INCASLIB= 'public' OUTCASLIB = 'public'; PROMOTE CASDATA = 'table1 '; run; proc casutil; SAVE INCASLIB="public" CASDATA = "table1 " REPLACE; quit;
In this way, I create and drop the table. Then in promote step CAS couldn't found a table named table1, no wonder. Which part is wrong?
@fda2 As already said I don't have access to Viya anymore and though everything below untested and based on my understanding how things work.
CAS tables are volatile in-memory tables. If the CAS servers restart they get wiped out.
CAS Libs always define two "legs": The in-memory space and a permanent space (path= or datasrc=).
If you run below statement then you should get info in the log how it's defined and where the permanent space is.
caslib public list;
I believe for the out-of-the-box caslib public the permanent data is path= and a sashdat file. sashdat can only get created when writing a CAS in-memory table to disk.
For other formats like a database table your process could also be to first update the physical table as defined in the caslib, load this table into CAS and promote it.
When you want to automate the load into memory after a server startup (like just-in-time loading) then the physical source data must be in the permanent location as defined in the caslib. That's what in below code step 4 does and what's required to support such functionality.
Below how I believe your process could look like. I've changed my mind and switched back to using casuser for the local scope table and public for the global scope table because I felt it's easier to understand (and though maintain).
cas mysession;
/* caslibs casuser and public must exist under mysession */
/* 1. create and load CAS in-memory table casuser.table1 */
proc fedsql sessref=mysession;
create table casuser.table1{options replace=true} as
select ...
;
quit;
/* 2. drop global scope table public.table1 */
proc casutil sessref=mysession;
droptable casdata="table1" incaslib="public" quiet;
quit;
/* 3. promote local scope table casuser.table1 to global scope table public.table1 */
proc casutil sessref=mysession;
promote casdata="table1" incaslib="casuser" outcaslib="public" casout="table1";
quit;
/* 4. save data of in-memory global scope table public.table1 to its permanent location (=keep permanent data in synch) */
proc casutil sessref=mysession;
save casdata="table1" incaslib="public" replace;
quit;
/* 5. drop local scope table casuser.table1 */
proc casutil sessref=mysession;
droptable casdata="table1" incaslib="casuser" quiet;
quit;
I felt for development and debugging purposes it's better to use separate Procs per statement but below syntax option should also work.
proc casutil sessref=mysession;
/* 2. drop global scope table public.table1 */
droptable casdata="table1" incaslib="public" quiet;
/* 3. promote local scope table casuser.table1 to global scope table public.table1 */
promote casdata="table1" incaslib="casuser" outcaslib="public" casout="table1";
/* 4. save data of in-memory global scope table public.table1 to its permanent location (=keep permanent data in synch) */
save casdata="table1" incaslib="public" replace;
/* 5. drop local scope table casuser.table1 */
droptable casdata="table1" incaslib="casuser" quiet;
quit;
And an additional thought:
You're currently using the ootb caslib public which is accessible to all SAS users. If you envisage that now or in the future not all but only a group of SAS users should have such access to the data then I'd recommend to already now define and use a different caslib (defined via Environment Manager). You then can apply permissions to this caslib to control who gets access to the data (according to your security model).
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.