BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
acordes
Rhodochrosite | Level 12

I need to cut down execution time for large tables. 

Both technique I usually choose for adding a calculated column (as simple as new_column=price*1.21;), data step and datastep.runcode, take a long time for tables in cas with over 1.000.000 rows. 

The altertable action in my understanding can only rename, format and drop columns.

The attached code executes quickly but do not persist the calculated columns in the cas table. 

 

Any ideas?

 

%let outcas2=%qscan(&_input1, 1);
%let casy2=%qscan(&_input1, 2);

proc cas;
    source createColumns;
			if cmiss(movave&mov_minutes1., movstd&mov_minutes1.)=0 and movstd&mov_minutes1. ^=0 then 
			z_mov&mov_minutes1.=(measurement-movave&mov_minutes1.)/movstd&mov_minutes1.;
			
			if cmiss(movave&mov_minutes2., movstd&mov_minutes2.)=0 and movstd&mov_minutes2. ^=0 then 
			z_mov&mov_minutes2.=(measurement-movave&mov_minutes2.)/movstd&mov_minutes2.;
			
			if cmiss(movave3_batch, movstd3_batch)=0 and movstd3_batch ^=0 then 
			z_mov3_batch=(measurement-movave3_batch)/movstd3_batch;
    endsource;
 
    productsTbl = {name = "X_ATGU_ICS_PX10_PROC_TR", 
                   caslib = "public", where="datepart(datetime)='26feb2025'd", 
                   computedVarsProgram = createColumns
    }; 
 
    table.fetch / table = productsTbl to=100;
quit;
1 ACCEPTED SOLUTION

Accepted Solutions
acordes
Rhodochrosite | Level 12

I've found the culprit for the slow performance. 

One of the columns had a length of 32167 ! despite only holding values at length 21 as maximum. 

Now I've changed to varchar in a previous data step where I create the cloumn.

Thanks for your help. 

View solution in original post

5 REPLIES 5
Patrick
Opal | Level 21

>The attached code executes quickly but do not persist the calculated columns in the cas table.

Which persistence do you need? Session or longer?

 

I can't test it but yes, based on documentation the alterTable action doesn't allow to add columns to CAS tables.

You could use the table.view action to create a view with the additional calculated columns you need. 

 

But then about performance:

Is the priority on the time that it takes to change the "table" and add more columns or is it on query performance for the existing table? If it's a view over a single table just adding some calculated columns then I'd assume the query performance impact as compared to a cas table isn't too severe - unless one then uses calculated columns for sub-setting data.

As I see it: CAS is great as data endpoint creating a table for analytics. It's not really meant for "ETL/Data management" processes which should be done upstream before loading into CAS (and in doing so you also will have your physical data in sync with the in-memory table).

acordes
Rhodochrosite | Level 12

Hi @Patrick .

I didn't care so much about performance before, because my ETL and feature engineering tasks were scheduled once a day and it was not so relevant if it took 20 minutes or 5.

But now we're facing a project with nearly live data thus requiring an update schedule each 15 minutes or so. Reports will consume the final tables. 

That's the reason why I'm interested in choosing coding practices that are efficient and release time savings. 

I think I should choose the save action because it allows for calculated columns. I'll give it a try. In one way or another, I have to save the cas tables to sashdat in order to ensure their availability and recovery for the reports. 

 

The attached code from my initial post only displays the calculated columns with a fetch action. And I wonder if a can add them to the cas table...

acordes
Rhodochrosite | Level 12

I've found the culprit for the slow performance. 

One of the columns had a length of 32167 ! despite only holding values at length 21 as maximum. 

Now I've changed to varchar in a previous data step where I create the cloumn.

Thanks for your help. 

Patrick
Opal | Level 21

Yep, 1 million rows is not that much unless you've got such massive columns. 
You might also want to investigate if server side parallel loading is possible. Parallel Data I/O

If you need to add or update rows every 15 minutes then look into actions table.append and table.update 

table.save allows you to save the CAS table to it's physical leg which can be .sashdat. sashdat is memory mapped so I believe you need the table first in CAS.

 

You will certainly have to think about how to "update" your CAS table if you need to run this in 15 minutes batches so there are no (or only very minimal) outages for using the data in CAS.

acordes
Rhodochrosite | Level 12

I already use the append action, it works fine. 

Only for sorting related calculations like first, last, lag, dif I use single mode, in all other cases it runs multi-threaded I think.

Now with the delta load + ETL + append approach it runs fast. I have no worries about a 15 min frequency update.  

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1197 views
  • 1 like
  • 2 in conversation