SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Which transformation should use to get only the updated records in my output?

Reply
Regular Contributor
Posts: 168

Which transformation should use to get only the updated records in my output?

I've a SAS table like below which will update periodically. Here i wish  to use some transformation to capture only the updated records in my output.

Id Name Salary Address

1 Ram  1000    aaaaa

2 Ravi   2000    bbbbb

3 Kiran  1500   ddddd

After some point of time  this table will be updated as below by clients if we have any changes over the records.

Id Name Salary Address

1 Ram  2000    aaaaa

2 Ravi   2000    bbbbb

3 Kiran  1500   ccccc

So I need the output as below via SAS transformation for the changed records.

Id Name Salary Address

1 Ram  2000    aaaaa

3 Kiran  1500   ccccc

I've tried to achieve this via Table loader transformation with update\insert technique, but it is providing the output by holding the old records (like append transformation).

Can someone suggest the suitable transformation with the detail steps to achieve this?

Valued Guide
Posts: 3,208

Re: Which transformation should use to get only the updated records in my output?

think on adding date (date/time source) to the records as some information point.

I do not know what modeling method has been chosen. With a Data Vault Modeling - Wikipedia, the free encyclopedia approach is a principle to store that kind of info along all other record fields.  

---->-- ja karman --<-----
Regular Contributor
Posts: 168

Re: Which transformation should use to get only the updated records in my output?

Hi Jaap Karman,

Thanks for your response. But I'm wondering for transformations like CDC, SCD type-1 to achieve this feat rather than your suggestion. Let me know if you required any further details from me.

Super User
Posts: 9,671

Re: Which transformation should use to get only the updated records in my output?

Do you have a copy of the original dataset before it be updated ?

Maybe you should check Audit Trail  .

Xia Keshan

Regular Contributor
Posts: 168

Re: Which transformation should use to get only the updated records in my output?

Hi Ksharp,

I don't have the copy of the original dataset. If I'm not wrong, audit trail table will hold only the details like number of rows processed and not the output which I expect.

Super User
Posts: 9,671

Re: Which transformation should use to get only the updated records in my output?

If I remember correctly , Audit Trail also will hold the updated information. Check the documentation . I don't do it for a long time .

Good Luck .

Respected Advisor
Posts: 3,887

Re: Which transformation should use to get only the updated records in my output?

Even though the DIS loaders work out internally which records need update and which insert this information is only kept in work tables which get cleaned out at the end of the loader transformation. I'm not aware of any loader option which would allow you to retain these tables.

The 2 ways I can currently think of to get you what you want (by using DIS and avoid too much custom code):

1) Use an SCD2 loader as this will maintain full change history in your table. The latest updated records will be the once with more than one record per Retained Key and the latest Changed Date. The SCD2 loader is "out-of-the-box" but you would still need to implement some kind of reporting for record selection and if having a type 2 table is not a requirement then you would probably also want to have some housekeeping process which in the end removes all history records (once you've reported on them).

To make it easy for users of your data: You probably would implement a view over the type 2 table which selects only the current records so your users don't get exposed to a type 2 table.

2) You turn on an audit trail and then report on this audit trail as suggests. To implement the audit trail is a one-off task (that sticks with the physical table then). You still would need to implement some reporting and some house keeping to truncate the audit trail from time to time so that it doesn't "blow up" too much.

Both using a type 2 loader and an audit trail will have a negative impact on performance.

Valued Guide
Posts: 3,208

Re: Which transformation should use to get only the updated records in my output?

No matter what you do. Do not start coding before understanding concepts.

CDC  Change data capture - Wikipedia, the free encyclopedia is just passing updates inserts and deletes. The disadvantage is there can failures in the process of synchronization as it is partial synchronization.

SCD-1 is one the Slowly changing dimension - Wikipedia, the free encyclopedia where data collection approach for building cubes. The type-1  one is a simple one but often not applicable as of the meaning of the changes.

Knowing your desing of the dwh building the processes you will see wich transformations are needed (Join merge set etc.)

The auditt trail of the original DBMS could be the source for a CDC approach.

---->-- ja karman --<-----
Ask a Question
Discussion stats
  • 7 replies
  • 513 views
  • 0 likes
  • 4 in conversation