BookmarkSubscribeRSS Feed
VVDR
Obsidian | Level 7

HI All,

I am working on Performance tuning in one of the SAS DI job. And, I need your help / suggestions on the below.

My Target_table is an Oracle Table.

Target_table has Indexes for those columns that are being used in the Surrogate Key gen transformation.

Target_table has 6 billion records. 
And the execution time is more than 25 minutes, sometimes it is even crossing 35 minutes. 

Can you please help me in finding the best to tune the performance and to bring the execution time to a minimum.

 

Regards

Rams

 

 

 

9 REPLIES 9
Patrick
Opal | Level 21

@VVDR 

"Target_table has Indexes for those columns that are being used in the Surrogate Key gen transformation."

What sort of surrogate key are you creating? Is that a hash key or just a sequence number? If it's a sequence number then do you need a retained key or just a new key for every record you insert?

Once you've got the key: How do you load? And how many records do you need to load into this 6 billion target table?

Are changes to the Oracle table an option? Or are you at least allowed to create stuff in Oracle (like a sequence object and staging tables)?

 

May be share a picture of the relevant part of your flow as well as the details how you've configured the Surrogate Key generator (or post the code and log created by this transformation so we can understand where processing happens).

 

VVDR
Obsidian | Level 7

HI,

@Patrick, to answer your questions, please allow me some more time. Meanwhile, I can explain my understandings. 

This Surrogate Key Gen Transformation- Generates a unique key value for a set of business key values and assigns maxKey to be last surrogate key value assigned, or one less than what next value should be. 

When I went through the code in Surrogate key transformation, finally it is doing the loading of Surrogate key into a maxKey value and then to a SAS Data set variable. And also I want to create an incremental value to this maxKey(SAS variable).

Please suggest if my understanding is correct ?

 

Regards

Rams 

Patrick
Opal | Level 21

@VVDR 

The details will make all the difference and that's why you need to inspect the code and especially the SAS log.

What can have a huge performance impact are data movements between Oracle and SAS - so you want to make sure to minimize such movements.

 

For your investigation and tweaking: I would add the following options to the pre-code of the transformation as this will give you in the SAS log the information where execution happens.

OPTIONS SASTRACE=',,,d' SASTRACELOC=SASLOG NOSTSUFFIX;

 

To give you one example: IF you select to determine the max value from the target table then SAS will issue syntax which will highly likely pull a lot of data into SAS. So here you need to maintain this Max value in another table (a table with 1 row and 1 column).

Capture.JPG

 

I'm personally not a big fan of the surrogate key transformation and rarely use it. If you just need to load an Oracle table with an auto-increment key then I'd eventually would let Oracle do the job. Either by using a column of type Identity or by using an Oracle sequence object with an after insert trigger.

If you also need to add these newly generated keys to another table as foreign keys then of course the story becomes a bit more difficult. That's why I've asked for a bit more information how the process you need to tweak looks like. If it's tuning then I guess you need to go for a change which doesn't impact other flows.

 

LinusH
Tourmaline | Level 20

The options given by @Patrick is a good starting point.

One can question the need for a surrogate key in tables with 6 billion records. As Patrick is asking, what is the use for this surrogate key throughout the model/system?

 

Since I'm from time to time populate data vault table structures, the Surrogate Key generater comes in handy when updating Hub tables. One benefit for the SKG over DBMS key generation is that you get the business key look-up OOTB (of course, you can solve it using a standard Look-up transformation instead if you like).

 

Since the SGK is a Generated transform, it's quite easy to change the underlying code. And I did just that, and adding an option not to forward any records that already have a SK (can't understand why this options wasn't there from the start).

Data never sleeps
Patrick
Opal | Level 21

Since I'm from time to time populate data vault table structures, the Surrogate Key generater comes in handy when updating Hub tables.


@LinusH 

One of the things I really like about Vault 2.0: It uses hash keys constructed from business keys and though removes the need to look up surrogate keys over business keys.

VVDR
Obsidian | Level 7

Hi,

Can I request you for more information on Vault 2.0.

Thanks

Rams129

Patrick
Opal | Level 21

@VVDR wrote:

Hi,

Can I request you for more information on Vault 2.0.

Thanks

Rams129


 

If you want to get into Data Vault 2.0 then best read:

Building a Scalable Data Warehouse with Data Vault 2.0
by Daniel Linstedt, Michael Olschimke

LinusH
Tourmaline | Level 20

You have a point there, I'm referring to DV 1.0 style with surrogate keys.

Agree that hash keys are convenient, but that is the case with nautraul buisness keys as well (in many cases at least).

I guess we could discuss and elaborate around what type of keys to use for, if not weeks, at least days 🙂

Data never sleeps
Patrick
Opal | Level 21

@LinusH 

LOL -  Weeks with ease. Until we're worn out.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 2205 views
  • 5 likes
  • 3 in conversation