- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have a large dataset that gets refreshed each day. Currently the data is completely reloaded each day, however going forward I would like to keep a record of changing records and new records. I'd also like to have a history table that I could query against using an "as of" date. I do not have data integration studio, but I suspect there is a clever way to track data updates/new inserts.
Example of what two days of data might look like:
data a;
input name $ dept $ date date9.;
format date date9.;
datalines;
John Sales 01JAN2018
Mary Acctng 01JAN2018
;
data b;
input name $ dept $ date date9.;
format date date9.;
datalines;
John Sales 02JAN2018
Mary Finance 02JAN2018
Jake Finance 02JAN2018
;
I'd like to be able to track these changes that occurred on dataset b. For example, I'd like to tell that on 1/2/2018 Mary switched to Finance, and Jake was a new record. I've tried a proc compare with outall option, but I'm not sure if it will allow me to track the changes and query the results.
I could simply stack dataset A and B on top of each other but in my case space is an issue and I would not like to save records that are not changing.
Any ideas?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Have you looked audit features enabled through PROC DATASETS?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have not, but I will google it right now. Thanks!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Do I understand correctly that your table that today is refreshed, you would change the update method to only modify/add changed/new records? I just want to be sure, since change data capture normally describes how to fetch updats/inserts at the source, not the target. Manage target tables in more "intelligent" way is called historization, or SCD Type 2.
If the later, here SAS Data Integration Studio is nice tool if you are building a data warehouse or similar.
Data set audit is nice feature, but I would say that is more for audit (of data management itself), and usually works best with small/moderate data sizes.
If you want to track changes, and want to "easily" query it, SCD Type 2 sounds more what you need.
Suggested approach:
1. Inner join/lookup daily transactions with target table.
2. Compare columns values to see if there is any change.
3. If yes trigger an insert with a date/timestamp
4. Insert records that didn't get a match in the initial join
Some SCD Type 2 models also use end-dating of the record that is being updated (have anew veriosn inserted). This causes some more cost in the data loading aprt, but eases the query (you could use WHERE date BETWEEN StartDate AND EndDate), and typically a bit more performant than without it (for SAS data stores at least).