BookmarkSubscribeRSS Feed
ScottBass
Rhodochrosite | Level 12

Hi,

There's some fairly well-known code "out there" for processing Slowly Changing Dimension Type 2 (SCD2) updates.  Anyone who has used SAS DI Studio would be familiar with this transformation.

In skimming this code (note: I haven't done a full in-depth analysis yet), it appears to me that the same processing could be accomplished using hash objects resulting in faster performance, especially with respect to not requiring a sort of the target table. 

Assume that the transaction data will fit in memory.

If my supposition is correct, then before I reinvent a perfectly round wheel (i.e. the wheel you're willing to share 😉 ), has anyone done this already and, if so, can you share your code?

Regards,

Scott


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
2 REPLIES 2
Patrick
Opal | Level 21

I had in the past twice to write my own SCD 2 loader. Once when the DIS 3.4 loader wasn't fit for purpose and once when the client didn't have DIS licenced.

In the second case I've based my code on DIS 4.2 generate code. The code I've implemented was for a full load from source expiring all data in target not in source. By stripping down the DIS 4.2 generated code to my special needs I could get rid of a full pass through the data.

I'm not sure where a sort of the target table is required. You will basically need an XREF table containing the keys and the digest values to determine the records for insert and the ones for update.

I haven't understood yet why a permanent XREF table gets stored in the target environment and not the source environment. From what I could understand if stored in a data base target environment then it's fast for a limited number of change records but sub-optimal if a lot of records change (because of moving data between the DB and SAS).

So yes: If this is about performance gains then coding your own SCD 2 loader for exactly your case will likely allow you to better performance. BUT: It comes with the price of increased implementation effort and decreased flexibility and maintainability.

I'm currently facing a challenge where the DIS SCD 2 loader can't be used. You'll probably see my "request for ideas" early next year.

ScottBass
Rhodochrosite | Level 12

Hi Patrick,

I thought about this a bit more:  Using a hash object would work when closing out an old record (i.e. h.find() gets a hit), But it wouldn't work for brand new records; AFAIK I can't retrieve the data from the hash object when there isn't a hit on the key.

Another way of saying this is a hash object join is a "left join", when what I need is a "full outer join" (or a data step merge).

I'll have to look at other ways of optimizing the performance; perhaps SPDE and its automatic support of BY variables, or using an index instead of a full sort of the entire target table.

For my purposes the source and target tables are SAS datasets, so I don't have to consider of SAS vs. DBMS performance.

From the paucity of replies I assume I'll have to code this myself ;-).  I'll look for your post early next year - perhaps we can share ideas Smiley Happy

Regards,

Scott


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 1534 views
  • 0 likes
  • 2 in conversation