SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Using SCD Type 2 and Surrogate keys for single table in job

Reply
Occasional Contributor
Posts: 19

Using SCD Type 2 and Surrogate keys for single table in job

Hi Team,

               I have an input dataset in the format as below :

ItemIDItemnameitemtypetypenameitemgroupgroupname
1ABCXXXXXXXXXXXA123XYZ
2DEFXXXXXXXXXXXA123XYZ
3GHIXXXXXXXXXXXB456ZZZ

I need to know the process where I need to use the SCD type 2 retained key for my primary key which is Item id and surrogate keys for type and group accordingly.How should I process with in a single job.I'm new to DI and not able to get the exact flow here.

Also,SCD type 2 sometimes gives an error saying "The variable in the DROP, KEEP, or RENAME list has never been referenced".Please help me.

Expected output is

itemrkitemiditemnametyperktypetypenamegrouprkgroupgroupname
11abc1xxxxxxxxxxx1a123xyz
22def1xxxxxxxxxxx1a123xyz
33ghi1xxxxxxxxxxx2b456zzz
Super User
Posts: 5,254

Re: Using SCD Type 2 and Surrogate keys for single table in job

"The variable in the DROP, KEEP, or RENAME list has never been referenced": you give us nothing to work on here - what mapping and options have you set in the transform, are your target table and meta data registration equal?


About your expected (or rather desired?). SCD Type 2 Loader is able to generate one surrogate key per target table. It looks like you are trying do design a snow flake here. As a general advice, don't use snow flakes. If you still wish to have RK for your grouping levels - why do you have the name columns in the same table?

So still, to create RK for each grouping, have each group stored in separate dimensions (snow flakes), and then do a look up to populate the Item dimension.

Data never sleeps
Occasional Contributor
Posts: 19

Re: Using SCD Type 2 and Surrogate keys for single table in job

Hi LinusH,

              My bad!I forgot to mention that I have two more columns "valid from date" and "valid to date" in my target structure.I was using scd type loader and selected these 2 dates for change data capture along with the one retained key  as you said,trying to map the other columns.So you mean to say that I need to have different tables for each of them and the get the keys from them.Will try it out.Could you please highlight on that look up process as I don't have much idea on how it works.

Regards,

Maddy

Super User
Posts: 5,254

Re: Using SCD Type 2 and Surrogate keys for single table in job

Take a look at the Look-up transform.

Still, I can't see what practical use you (or your users) will have of the grouping RK columns.

Data never sleeps
Respected Advisor
Posts: 3,887

Re: Using SCD Type 2 and Surrogate keys for single table in job

Before you start designing & implementing DI jobs you should get your target data model right. So how is it in your case?

- 1 item belongs to exactly 1 type, a type has many items

- 1 type belongs to exactly 1 group, a group has many types

In such a case you could create a dimension table an use ItemID as business key (and using SCD2 there would then be a single ItemRK). No need to try and create additional RK's.

You could also decide to split up things and create 3 tables: A Item table, a Type_Ref table and a Group_Ref table. You still could load these 3 tables using SCD2 (and then you get 3 RK's) In such a case you would first load the reference tables (Type_Ref, Group_Ref) and then look-up the keys before loading Type.

...or if "1 type belongs to exactly 1 group, a group has many types" then you could also create a single reference table with typeID as business key when loading using SCD2

Occasional Contributor
Posts: 19

Re: Using SCD Type 2 and Surrogate keys for single table in job

Patrick,

            In my case the hierarchy is something as below :

Itemtype (highest)

followed by group

followed by item (lowest)

So,there can be many groups under one type and

many items under one group.Here item id is my business key which is unique but in char format.So i want to use the keys for increasing the processing speed.What I understood from above comments is that,

i have 3 tables first

1. type (rk for type)

2.group (rk for group and lokup to type)

3. item (rk for item and look up to type & group)

Is it correct?

Respected Advisor
Posts: 3,887

Re: Using SCD Type 2 and Surrogate keys for single table in job

If it's a true hierarchy then it's your design decision how you want to organize the data.

In one table it could be:  {Item_RK, ItemID, groupID, itemtypeID,....}

When loading using SCD2 then Item_Id would be the business key, all the other columns would be under change tracking.

With 3 tables:

- ItemType_Ref {ItemType_RK, ItemTypeID,....}

- Group_Ref {Group_RK, GroupID,...}

- Item {Item_RK, ItemID, Group_RK, ItemType_RK....}

If you go for a 3 table approach then you would first load ItemType_Ref and Group_Ref and then look up the RK's via the ID's before loading table Item.

If you're having full loads then I would go for a 1 table approach, if it's delta then I would eventually go for a 3 table approach.

The reason for 3 tables in case of delta loads: If something like the Group_Description changes then in a 3 table approach this would simply cause a new current record and all is set, in a one table approach you would have to update all records in the table with the new description (with a full load this would happen "automatically").

As a database purist one would say that you always need to generate keys. From a more pragmatic perspective it might well be o.k. to use the ID's as provided from source and not generate RK's at all. In such a case you wouldn't need to look up the RK's but you would simply use the ID's in table "Item".

Ask a Question
Discussion stats
  • 6 replies
  • 1625 views
  • 6 likes
  • 3 in conversation