Solved: Re: The use of surrogate keys

asdf_sas · Posted 01-31-2014 08:05 AM

Hi all,

i would like to know which advantages a surrogate key provides in comparison to a normal functional key which depends

on attributes in the data. Therefore i like to know how to use a surrogate key in a professional way.

Do a surrogate key provides advantages in all database schemas or in some special ones like the star or snowflake schema?

Anyone knows what SAS thinks where and when to use a surrogate key?

All information i found on the interweb approaches a surrogate key with a negative attitude of their missing reference to

attributes.

Hope someone can explain the use of such keys to me and the way SAS would use them.

Personally i'm torn between technical keys and functional keys....

Greetings,

S

LinusH · Posted 01-31-2014 09:08 AM

This is a topic that can be described and discussed like forever. There are numerous books on the subject, and threads on i.e. Linkedin have hundreds of posts. So... 🙂

Id in transactions usually doen't need surrogate keys (SK).

The reason to us surrogate keys are many. Just a few that comes to mind:

An identity changes the identity no in the source. Example is an employee quits, and gets a new emloyee id when he/she is returned coule of month later. You can use help tables to match those into a single surrogate key.
Change of source system: customer id in the old system is replaced by a new in the new system. A help table can helt to match the customer to the same SK.
A source id can sometimes be reused. A help table with date intervals can handle that as well, and direct the id to the correct SK
A customer can exist in multiple source systems, with different id:s. This can also be handled by SK management.
SK can be used to avoid the usaga of composite keys.

And to add the physical level, many RDBMS shows better performance on joining on INT keys than with VARCHAR keys.

Data never sleeps

View solution in original post

LinusH · Posted 01-31-2014 09:08 AM

This is a topic that can be described and discussed like forever. There are numerous books on the subject, and threads on i.e. Linkedin have hundreds of posts. So... 🙂

Id in transactions usually doen't need surrogate keys (SK).

The reason to us surrogate keys are many. Just a few that comes to mind:

An identity changes the identity no in the source. Example is an employee quits, and gets a new emloyee id when he/she is returned coule of month later. You can use help tables to match those into a single surrogate key.
Change of source system: customer id in the old system is replaced by a new in the new system. A help table can helt to match the customer to the same SK.
A source id can sometimes be reused. A help table with date intervals can handle that as well, and direct the id to the correct SK
A customer can exist in multiple source systems, with different id:s. This can also be handled by SK management.
SK can be used to avoid the usaga of composite keys.

And to add the physical level, many RDBMS shows better performance on joining on INT keys than with VARCHAR keys.

Data never sleeps

Patrick · Posted 01-31-2014 06:43 PM

From what I've learned a primary key shouldn't have semantic (meaning) and therefore needs to be technical.

The reason for this is that a semantic key could change and that it is very nasty business to change primary key values in a database.

I've seen this once happen in a bank when the internal account numbers got converted to the IBAN standard. The DW at the time used the old account numbers as primary key (instead of a technical key) and it became a very expensive exercise to change over to IBAN.

TomKari · Posted 02-03-2014 06:36 AM

Although I have less experience in this area than Linus and Patrick, I agree with their comments completely.

My workmethod is that when I'm setting up something new, I'll use surrogate keys if it's my decision to make, or I'll try to convince the decision-maker to use them. However, if the decision is to use the actual variable as the key, I don't put up a fuss over it. The posts above list very good reasons for using surrogate keys, but they apply in exceptional cases, not the norm. Therefore, things will probably be fine.

The one exception to that is that you should ensure that whatever data management product you're using performs well with the datatype of your key. This particular lookup operation will be done extensively, so you don't want it to put a load on the machine.

Tom

asdf_sas · Posted 02-03-2014 07:55 AM

Hi all,

thanks for your help.

But i would also like to know what you think of using surrogate keys only in the base table as an identifyer and the relations

are still connected with business keys.

f.e. you have table customer and insurance.

the customer gets an unique identifyer with business meaning like customerid additionally you assign a sk.

the customerid is used as a primary key and a foreign key and the sk is there for "some reasons".

I dont think this design is any usefull but i can really argue against it.In this case i think the sk is completely useless at all.

We have a database which shoud be historicized.

In my oppinion it should be better to use sk instead of the business key surrogate key combination with no use of the sk.

Am I just stupid, unexperienced or missing something?

Thanks for your answers

S

LinusH · Posted 02-03-2014 08:52 AM

Stupid - don't so hard on yourself.. 😉

I don't know if I understand you correctly, you create and maintain SK for customer, but use business key as PK (and FK in relations), that isn't wise - and useless. If you use SK, it shall also be the PK.

Then, how this i s modeled, there are many ways. I favor anchor modeling which I found useful in many scenarios. Espacialy when historize (creating versions of data records).

Data never sleeps

asdf_sas · Posted 02-03-2014 09:01 AM

You understood correctly.

Thats what "they" did for our PoC.

Youre right but i have no arguments to object their model... all of them think this structure will work perfectly but i think it

will fail if it comes to more complex data.... it will be.

I will google about the anchor model...maybe it helps me to prevent further problems in future...

If you have one telling arguement why this model is shit please tell me...Even if i think my superiors will not listen i would like to know

it for myself :smileysilly:

Greetings

LinusH · Posted 02-03-2014 09:08 AM

Without knowing in more detail, the above mentioned arguments for using SK should be considered good enough.

Please return if you have any other/more problems with "they".

Good luck!

Data never sleeps

Ready to join fellow brilliant minds for the SAS Hackathon?