Re: Using SCD Type 2 and Surrogate keys for single table in job

maddysbi · Posted 07-19-2014 08:22 AM

Hi Team,

I have an input dataset in the format as below :

ItemID	Itemname	itemtype	typename	itemgroup	groupname
1	ABC	XXX	XXXXXXXX	A123	XYZ
2	DEF	XXX	XXXXXXXX	A123	XYZ
3	GHI	XXX	XXXXXXXX	B456	ZZZ

I need to know the process where I need to use the SCD type 2 retained key for my primary key which is Item id and surrogate keys for type and group accordingly.How should I process with in a single job.I'm new to DI and not able to get the exact flow here.

Also,SCD type 2 sometimes gives an error saying "The variable in the DROP, KEEP, or RENAME list has never been referenced".Please help me.

Expected output is

itemrk	itemid	itemname	typerk	type	typename	grouprk	group	groupname
1	1	abc	1	xxx	xxxxxxxx	1	a123	xyz
2	2	def	1	xxx	xxxxxxxx	1	a123	xyz
3	3	ghi	1	xxx	xxxxxxxx	2	b456	zzz

LinusH · Posted 07-19-2014 01:16 PM

"The variable in the DROP, KEEP, or RENAME list has never been referenced": you give us nothing to work on here - what mapping and options have you set in the transform, are your target table and meta data registration equal?

About your expected (or rather desired?). SCD Type 2 Loader is able to generate one surrogate key per target table. It looks like you are trying do design a snow flake here. As a general advice, don't use snow flakes. If you still wish to have RK for your grouping levels - why do you have the name columns in the same table?

So still, to create RK for each grouping, have each group stored in separate dimensions (snow flakes), and then do a look up to populate the Item dimension.

Data never sleeps

maddysbi · Posted 07-19-2014 01:39 PM

Hi LinusH,

My bad!I forgot to mention that I have two more columns "valid from date" and "valid to date" in my target structure.I was using scd type loader and selected these 2 dates for change data capture along with the one retained key as you said,trying to map the other columns.So you mean to say that I need to have different tables for each of them and the get the keys from them.Will try it out.Could you please highlight on that look up process as I don't have much idea on how it works.

Regards,

Maddy

LinusH · Posted 07-19-2014 07:03 PM

Take a look at the Look-up transform.

Still, I can't see what practical use you (or your users) will have of the grouping RK columns.

Data never sleeps

Patrick · Posted 07-20-2014 12:20 AM

Before you start designing & implementing DI jobs you should get your target data model right. So how is it in your case?

- 1 item belongs to exactly 1 type, a type has many items

- 1 type belongs to exactly 1 group, a group has many types

In such a case you could create a dimension table an use ItemID as business key (and using SCD2 there would then be a single ItemRK). No need to try and create additional RK's.

You could also decide to split up things and create 3 tables: A Item table, a Type_Ref table and a Group_Ref table. You still could load these 3 tables using SCD2 (and then you get 3 RK's) In such a case you would first load the reference tables (Type_Ref, Group_Ref) and then look-up the keys before loading Type.

...or if "1 type belongs to exactly 1 group, a group has many types" then you could also create a single reference table with typeID as business key when loading using SCD2

maddysbi · Posted 07-20-2014 01:30 AM

Patrick,

In my case the hierarchy is something as below :

Itemtype (highest)

followed by group

followed by item (lowest)

So,there can be many groups under one type and

many items under one group.Here item id is my business key which is unique but in char format.So i want to use the keys for increasing the processing speed.What I understood from above comments is that,

i have 3 tables first

1. type (rk for type)

2.group (rk for group and lokup to type)

3. item (rk for item and look up to type & group)

Is it correct?

Patrick · Posted 07-20-2014 03:12 AM

If it's a true hierarchy then it's your design decision how you want to organize the data.

In one table it could be: {Item_RK, ItemID, groupID, itemtypeID,....}

When loading using SCD2 then Item_Id would be the business key, all the other columns would be under change tracking.

With 3 tables:

- ItemType_Ref {ItemType_RK, ItemTypeID,....}

- Group_Ref {Group_RK, GroupID,...}

- Item {Item_RK, ItemID, Group_RK, ItemType_RK....}

If you go for a 3 table approach then you would first load ItemType_Ref and Group_Ref and then look up the RK's via the ID's before loading table Item.

If you're having full loads then I would go for a 1 table approach, if it's delta then I would eventually go for a 3 table approach.

The reason for 3 tables in case of delta loads: If something like the Group_Description changes then in a 3 table approach this would simply cause a new current record and all is set, in a one table approach you would have to update all records in the table with the new description (with a full load this would happen "automatically").

As a database purist one would say that you always need to generate keys. From a more pragmatic perspective it might well be o.k. to use the ID's as provided from source and not generate RK's at all. In such a case you wouldn't need to look up the RK's but you would simply use the ID's in table "Item".

Using SCD Type 2 and Surrogate keys for single table in job