Hello,
I have a dataset with 16,000,000 records and 136 columns. I load this dataset into CAS every day. My procedure is this:
data hseq.HSEQ_INSP_FIND_TEMP1;
set staging.HSEQ_INSP_FIND_TEMP;
run;
proc casutil incaslib="hseq" outcaslib="hseq";
droptable casdata="HSEQ_INSP_FIND_TEMP" quiet;
promote casdata="HSEQ_INSP_FIND_TEMP1" casout="HSEQ_INSP_FIND_TEMP";
save casdata="HSEQ_INSP_FIND_TEMP" casout="HSEQ_INSP_FIND_TEMP" replace;
quit;
But it is very slow: it takes over two hours.
Is there another way to load a dataset into CAS?
Thank you,
A.
I have seen this method load faster. You could try it instead of the DATA step.
%let path=%sysfunc(pathname(staging));
proc cas;
table.upload / path="&path/HSEQ_INSP_FIND_TEMP.sas7bdat" casout={caslib="hseq" name="HSEQ_INSP_FIND_TEMP1"};
quit;
Out of the 2 hours, how much of that time is the actual load step?
And are all of the 136 columns needed? Can you drop any?
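As a rough sketch of both questions above: FULLSTIMER writes detailed timing for each step to the log, which shows how much of the two hours is the load itself, and a KEEP= data set option on the input limits the load to the columns you need. The column names col_a, col_b and col_c are hypothetical placeholders, and the sketch assumes the hseq (CAS) and staging librefs from the original post.

```sas
/* Sketch only: col_a, col_b, col_c are hypothetical column names. */
options fullstimer;  /* log real time and CPU time for every step */

data hseq.HSEQ_INSP_FIND_TEMP1;
    /* KEEP= on the input data set reads only the listed columns */
    set staging.HSEQ_INSP_FIND_TEMP(keep=col_a col_b col_c);
run;
```

Comparing the real time reported for this step with the times for the PROC CASUTIL step should show where most of the time goes.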
Unfortunately my dataset is in SAS 9.4 and I must load it into CAS, because I read this data from SAP/BW. The SAS job starts at a different time each day, after the SAP/BW job finishes. The load time for this dataset is very long and I have not been able to improve the performance.
This post has some suggestions on how to trigger a parallel load from SAS datasets:
If your process is critical, and you have an MPP CAS, you can always explore these options:
Just a comment on the first link @LinusH shared, an article from 2019: in my experience, compressing a CAS table (using compress=true) has a negative impact on query performance.
If you're on a very recent Viya version then use memoryFormat="DVR" instead. https://communities.sas.com/t5/SAS-Communities-Library/Viya-2020-1-CAS-Duplicate-Value-Reduction/ta-...
I strongly suspect that the poor performance is caused by large default character column lengths being defined on the SAS dataset read from SAP/BW. This makes the dataset much larger than it should be. In SAS 9.4 data libraries on disk this is not a problem, because the SAS option COMPRESS removes the extra space efficiently. Compressing in-memory tables is not advisable, as it adversely affects performance. I know it is tedious, but you can resize all long character columns using the LENGTH statement to reduce the size of the on-disk dataset, which should improve load times into memory a lot.
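A minimal sketch of the resizing idea, assuming two long character columns named long_text1 and long_text2 (hypothetical names) and an output dataset name chosen here only for illustration. The first step reports the longest value actually stored in each column. The second step rewrites the dataset with shorter lengths. The values $40 and $60 are placeholders to be replaced with whatever the first step reports.

```sas
/* Sketch only: long_text1, long_text2 and HSEQ_INSP_FIND_SHORT are hypothetical names. */

/* Step 1: find the longest value actually stored in each column */
proc sql;
    select max(length(long_text1)) as max_len1,
           max(length(long_text2)) as max_len2
    from staging.HSEQ_INSP_FIND_TEMP;
quit;

/* Step 2: rewrite the dataset with shorter character lengths */
data staging.HSEQ_INSP_FIND_SHORT;
    /* LENGTH must come before SET so the new lengths take effect */
    length long_text1 $40 long_text2 $60;
    set staging.HSEQ_INSP_FIND_TEMP;
run;
```

SAS writes a warning to the log about multiple lengths for these variables. This is expected here, but any value longer than the new length would be truncated, so the lengths should not be smaller than the maxima from the first step.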