BookmarkSubscribeRSS Feed
Anindya
Calcite | Level 5

Friends,

 

I have created a job using SCD Type 2 loader and enable the option of closing out records if not available in source. Now the problem is as follows - 

 

Say a record got closed on 02JAN2019 and the valid_to_dttm is 01JAN2019:23:59:59. Now when I am running the job for 03JAN2019, again the same record is getting closed on 02JAN2019:23:59:59. I tried to find the reason in SCD Type 2 code and observed the following piece of code in change digest -  

 

by FINANCIAL_INSTRUMENT_ID VALID_TO_DTTM;
if last.FINANCIAL_INSTRUMENT_ID;

 

Now the last.financial_instrument_id column will always be having the latest record (even if the record got closed in the last run). Thus the closed account is getting considered in the current run and the same is again getting closed. Manual changes have been made in the code to get the expected result. But is there any way in SCD Type 2 configuration to get it corrected?

 

 

9 REPLIES 9
Patrick
Opal | Level 21

@Anindya 

I've used option "close out records not in source table" already many times without any issues. I would have bet that this always works - and lost!

I can confirm and replicate what you observe. You can resolve the issue by adding a current indicator column.

 

Are you going to contact SAS TechSupport? This looks to me like a bug which requires fixing so if you're not telling SAS "officially" then I will.

 

Details

I've done some testing and what I found: If you also define a current indicator column under change tracking (which I always do out of habit) then the DIS transformation generates code which behaves as you would expect.

Capture.JPG

If there is a current indicator column defined then the SCD2 generated code only selects "active" records from the target table.

 

   /* Sort target table records before creating the temporary cross reference  */ 
   /*  table:                                                                  */ 
   proc sort data = t4BASE.Target(where = (change_current_ind = 'Y'))
      out = work.etls_sortedxref(keep=FINANCIAL_INSTRUMENT_ID VALID_FROM_DTTM VALID_TO_DTTM change_current_ind  
       OTHER_VAR);
      by FINANCIAL_INSTRUMENT_ID;
   run;

And then in the step where records for expiration get determined there is nothing coming from source nor target so no output record for closing gets created (which is what you want as such records have been closed already in earlier runs).

 

      
      /* Close out any records that are in the xref table but not in the source.  */ 
      else if inXref and not inSort then
         do;
            ETLS_CLSDATE = DATETIME();
            output work.etls_close;
         end;

Now if you don't have a change current indicator column defined then the SCD2 generated code looks like:

   /* Sort target table records before creating the temporary cross reference  */ 
   /*  table:                                                                  */ 
   proc sort data = t13BASE.Target
      out = work.etls_sortedxref(keep=FINANCIAL_INSTRUMENT_ID VALID_FROM_DTTM VALID_TO_DTTM  OTHER_VAR);
      by FINANCIAL_INSTRUMENT_ID VALID_TO_DTTM;
   run;

So here ALL records get selected and though if not existing in source these records get expired over and over again. This is not only inefficient but it's also wrong as the expiration end date gets changed to a date later than when the record actually expired.

 

I believe that for the SCD2 loader to behave correctly even without a change current indicator, the generated code would need to include a selection on current records only - where = (VALID_TO_DTTM = &etls_enddate)

   /* Sort target table records before creating the temporary cross reference  */ 
   /*  table:                                                                  */ 
   proc sort data = t4BASE.Target(where = (VALID_TO_DTTM = &etls_enddate))
      out = work.etls_sortedxref(keep=FINANCIAL_INSTRUMENT_ID VALID_FROM_DTTM VALID_TO_DTTM change_current_ind  
       OTHER_VAR);
      by FINANCIAL_INSTRUMENT_ID;
   run;

 

And just for completion - and something to share with SAS TechSupport - here also attaching the DIS 4.903 job I've created for testing.

 

Objects in .spk

Capture3.JPG

SASWORK physical path (points to WORK so should work in any environment)

Capture4.JPG

 

To replicate my testing: After first execution out of DIS don't close job but add below definition, then execute again

Capture2.JPG

 

Result after two executions

Capture.JPG

The test job is set-up so that there is a gap of 1 second per iteration (using call sleep).

There are 4 iterations per job execution (4 SCD2 loads of data with the same business key).

1. Brand new record

2. Change record (same value for business key but changed value for type2 tracking variable "other_var")

3. No records in source table

4. No records in source table

 

Test result

When there is no change current indicator defined then the value for VALID_TO_DTTM gets changed every single time for  records in target but not in source. This is considered to be an undesired behavior.

When there is a change current indicator column defined then records get only expired once and later runs don't change the valid_to_dttm value anymore. This is considered to be the correct behavior.

ChrisLysholm
SAS Employee

Hello,

 

Please see this SAS Usage Note for details: 59083

 

The SCD Type 2 Loader does not enforce the use of a current indicator although inclusion of this administrative column into the dimension table is considered a recommended practice.

 

As we do not require a current indicator, I do believe this issue qualifies as a defect and I will enter this into our defect system.

 

 

 

ISC_Shuster
SAS Employee

I'm working on a table with no current indicator and it is not permited to alter the table estructure.
I need to use option "Close records not in source table" and I feel worried becouse this bug has been opened since 2016-10-24.

Please Help.

Patrick
Opal | Level 21

@ISC_Shuster 

Nothing changed as far as I know so what already has been discussed here are your options. Using the out of the box SCD2 loader without current indicator column is not one of them.

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @Anindya 

 

The following is probably not the solution you are looking for, but it worked for us, so take it as a "shared experience".

 

We have used the scd2 loader since 2007, and it has given us a lot of trouble. Over the years we had accumulated many jobs with scd2-loaders, and every now and then something failed, or we discovered corrupt data. Sometimes the options were set wrong for the task at hand, which is difficult to prevent in a setup with many ETL-developers working in several locations, but sometimes the loader seemed to start producing wrong results for no appearent reason. I always had a suspicion that some problems were caused by migration to new SAS versions, where our SAS consultants used SMU migration + bulk deployment, but my suspicion was never confirmed.

 

But no matter what, we found the scd2 loader too complicated to use. There are too many different loader method crammed into one transformation, and the option panes are not intuitive, so 2 years ago we decided to get rid of the problems by replacing the scd2-loader with User Transformations. An analysis showed that wee needed two different transformations: One to replace all records with the same set of keys, and one to stop existing records or add new records with all variables except start/stop-dates as key.

 

So we built those transformations, put mild pressure on the ETL-developers to replace their loaders, and haven't had troubles with changing dimensions since.

 

 

Patrick
Opal | Level 21

Hi @Anindya 

I have to second what @ErikLund_Jensen writes.

I can't recall that I've had issues with the SCD2 transformation loading data wrongly since quite a few DIS dot version now BUT I did face many challenges with performance. 

Basically: Unless it's about dealing with low to mid range data volumes with both source and target being SAS tables, I always use custom transformations. Especially when it comes to interfacing with databases the SCD2 loader generated SAS code creates a lot of unnecessary data movement between the database and SAS.

Anindya
Calcite | Level 5

Hi,

 

Even I have faced a lot of issues while configuring SCD type 2 in SAS DI. Currently I have followed the suggestions given by @Patrick.

I have written a piece of code which will update the deployed jobs at run time with the required changes and it works fine. 

This is seems to be a bug of SCD Type 2, I will try to raise it with SAS Tech Support and get it resolved, if possible. Thank you guys for your support. 

Srikanth713
Calcite | Level 5
I am facing some problm with scd type 2 on close out records tht is some records are closed Target table since records are not coming from the source and after few days when the same record come it's creating the new RK value for same record and opening.how can I do this to pick the same RK value of previous, those who knew it please let me know Thanks

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 3402 views
  • 5 likes
  • 6 in conversation