SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Did You Know: Creating New Surrogate Key Values from Table Loader

Reply
SAS Employee
Posts: 51

Did You Know: Creating New Surrogate Key Values from Table Loader

All,



If you need to maintain a surrogate key in DI Studio, you have several options including the Type 2 SCD Loader or the Surrogate Key Generator transform. However, in some cases you do not need all the functionality attached to the Type 2 SCD Loader and you may not be able to tolerate another pass against your data to use the Surrogate Key Generator transformation. In this case, the method below may be of interest.



If you're already using the Table Loader transformation in 4.2.x, you may use the following steps to create surrogate key values without another pass against the data:


[pre]
1. Create code like the following in the"Precode" setting for the Table Loader:
PROC SQL noprint;
select max()
into :highValue
from .;
QUIT;

2. Map the expression below to the "Expression" setting for the surrogate key field in
the Mappings tab for the Table Loader:
&highValue + monotonic()
[/pre]



This works as follows:

  1. The PROC SQL query selects the current max value of the surrogate key field in the table into a macro variable (&highValue in this case).
  2. The expression above creates a unique surrogate key value during each iteration of the Table Loader read loop. This works because the monotonic() function returns an ascending value (starting with 1) each time it is called. So, adding this to the current maximum for the key in the table creates unique, ascending values.



    Note that the library for the target table above may need to be pre-assigned, since DI Studio will not have generated the libname statement for the Table Loader target table when the Precode is executed.



    Screen shots showing the two steps above are attached. Please let me know if you have any questions.



    Thanks,



    Tim Stearn


    SAS Product Management
Super User
Posts: 5,256

Re: Did You Know: Creating New Surrogate Key Values from Table Loader

I'm not sure if this is a great idea.
-The monotonic function is not supported by SAS
-The described pre-step does a full table scan (a pass of data) if the table resides in a SAS Base libname. Indexing the surrogate key column won't help. If your data is in SPDS, this would work excellent and extremely fast, since SPDS will query the meta data portion of any existing index. If your data is an external RDBMS, it's likely that you have functionality of generating surrogate keys built in.

An alternative (if you have your target table sorted by the surrogate key) is to use data set functions to retrieve the last record and its surrogate key value. This is also very efficient. And if you don't wish to use an unsupported function, you could just use a data step view and calculate the key values using _n_ and the current max value retrieved earlier.

/Linus
Data never sleeps
SAS Employee
Posts: 51

Re: Did You Know: Creating New Surrogate Key Values from Table Loader

All,

I had been waiting to reply to the question posted in this thread about whether it was safe to use the monotonic() function in the manner described in this article. As is sometimes the case with SAS Notes, it has proved challenging the nail down the exact circumstances in which monotonic() doesn't behave as expected when used in conjunction with PROC SQL.

Suffice it to say, however, that the approach described in the article was successfully tested prior to the post. Furthermore, I've been using a similar method for 3 years to generate serial numbers during the extract process to ensure that all extracted records will be uniquely identified. This I've done across hundreds of jobs with no unexpected results.

So, I feel confident in saying the method I've described is safe within the context of the article. You should not, however, use monotonic() in complex query situations (like subqueries) where you're depending on some kind of underlying record order in the SQL result set. Since this not the case for surrogate key assignment, this approach is valid.

Thanks,

Tim Stearn
Respected Advisor
Posts: 3,890

Re: Did You Know: Creating New Surrogate Key Values from Table Loader

Hi Tim

I strongly agree with Linus.

Even if monotonic() works perfectly for the case you're describing it's still an undocumented and unsupported function. I don't think its usage should be proposed for anything.

If I'm using monotonic() in an implementation and things go wrong what do you think SAS Tech Support would tell me?

I have of course no issues at all with monotonic() if SAS R&D decides to use it for automatically generated code.

But may be even better would be to offer an option to store and maintain the max key in a permanent table - as already done for xref tables in the SCD2 loader.

Cheers
Patrick

And just as a side note: It would of course be very nice to have monotonic() as a supported function because it's sometimes really useful. I admit to be guilty.
The problems with monotonic() I read about where in conjunction with Oracle queries – which was actually kind of hard to understand as Oracle has a similar function row() and I would naively expect that SAS just could translate monotonic() into row() and send it to Oracle for execution.



Message was edited by: Patrick
SAS Employee RMP
SAS Employee
Posts: 52

Re: Did You Know: Creating New Surrogate Key Values from Table Loader

An alternative without using the Precode -


create a column highValue with expression - MAX()
create the sequence column with expression - monotonic() + CALCULATED highValue.
New Contributor
Posts: 4

Re: Did You Know: Creating New Surrogate Key Values from Table Loader

Hi Tim,

Can you pass the code you have for generating this? I am not able to open the attachment and I think this would be very helpful for me.

Ask a Question
Discussion stats
  • 5 replies
  • 2443 views
  • 1 like
  • 5 in conversation