SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
mdavidson
Quartz | Level 8

I have a large dataset that gets refreshed each day. Currently the data is completely reloaded each day, however going forward I would like to keep a record of changing records and new records. I'd also like to have a history table that I could query against using an "as of" date. I do not have data integration studio, but I suspect there is a clever way to track data updates/new inserts.

 

Example of what two days of data might look like:

data a;
   input name $ dept $ date date9.;
   format date date9.;
   datalines;
John Sales 01JAN2018
Mary Acctng 01JAN2018
;

data b;
   input name $ dept $ date date9.;
   format date date9.;
   datalines;
John Sales 02JAN2018
Mary Finance 02JAN2018
Jake Finance 02JAN2018
;

I'd like to be able to track these changes that occurred on dataset b. For example, I'd like to tell that on 1/2/2018 Mary switched to Finance, and Jake was a new record. I've tried a proc compare with outall option, but I'm not sure if it will allow me to track the changes and query the results.

 

I could simply stack dataset A and B on top of each other but in my case space is an issue and I would not like to save records that are not changing.

 

Any ideas?

3 REPLIES 3
data_null__
Jade | Level 19

Have you looked audit features enabled through PROC DATASETS?

mdavidson
Quartz | Level 8

I have not, but I will google it right now. Thanks!

LinusH
Tourmaline | Level 20

Do I understand correctly that your table that today is refreshed, you would change the update method to only modify/add  changed/new records? I just want to be sure, since change data capture normally describes how to fetch updats/inserts at the source, not the target. Manage target tables in more "intelligent" way is called historization, or SCD Type 2.

If the later, here SAS Data Integration Studio is nice tool if you are building a data warehouse or similar.

 

Data set audit is nice feature, but I would say that is more for audit (of data management itself), and usually works best with small/moderate data sizes.

 

If you want to track changes, and want to "easily" query it, SCD Type 2 sounds more what you need.

Suggested approach:

1. Inner join/lookup daily transactions with target table.

2. Compare columns values to see if there is any change.

3. If yes trigger an insert with a date/timestamp

4. Insert records that didn't get a match in the initial join

 

Some SCD Type 2 models also use end-dating of the record that is being updated (have anew veriosn inserted). This causes some more cost in the data loading aprt, but eases the query (you could use WHERE date BETWEEN StartDate AND EndDate), and typically a bit more performant than without it (for SAS data stores at least).

Data never sleeps

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 2330 views
  • 0 likes
  • 3 in conversation