BookmarkSubscribeRSS Feed
kds
Calcite | Level 5 kds
Calcite | Level 5

Hi SAS Experts,

I have a data step:

data x_final(sortedby=c1 c2 c3 c4 index=(d1 c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12 c13 c14 c15 c16 c17));
     set x_updated;
     by c1 c2 c3 c4;
run;

This data step takes around 3 hrs if x_updated has around 20 million records. If x_updated has around 0.25 million records, then data step takes only 5 minutes to complete.
My question to experts is that ->
Is there any way by which I can reduce execution hours even when source dataset has greater than or equal to 20 million records? 

Thanks

2 REPLIES 2
LinusH
Tourmaline | Level 20

It often comes down to how you are setting up your data model. No clues are gven here about user requirements, search patterns, update frequency etc.

Physically, do you really need so  many indexes? Do they all have at least more that 10 distinct values with normal distribution?

Second, if x_final (specified as work table here, but it can't be, right?) is in a SAS Base engine, it will perform much better if you apply the indexes after the table creation.

Consider to use a SPDE libname instead, it will update the indexes in parallel, using multi-threading.

Do you need to recreate the whole table each time? Consider a append/insert approach.

Data never sleeps
Astounding
PROC Star

Well, it's always possible that your actual DATA step is more complex than the one you have shown here.  But looking at just this code, you don't really need a DATA step at all.  The observations and data values in X_FINAL are identical to X_UPDATED.  You could just use X_UPDATED and build your indexes (hate that word!) using a different tool that doesn't need to read in each observation individually the way a DATA step does.

Looking at the number of variables for which you are creating an index, it concerns me that the programs that use the indexes might be less efficient as well.  But you haven't posted any of that code so it's difficult to judge.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 530 views
  • 0 likes
  • 3 in conversation