BookmarkSubscribeRSS Feed
mdavidson
Quartz | Level 8

I have a large dataset that gets refreshed each day. Currently the data is completely reloaded each day, however going forward I would like to keep a record of changing records and new records. I'd also like to have a history table that I could query against using an "as of" date. I do not have data integration studio, but I suspect there is a clever way to track data updates/new inserts.

 

Example of what two days of data might look like:

data a;
   input name $ dept $ date date9.;
   format date date9.;
   datalines;
John Sales 01JAN2018
Mary Acctng 01JAN2018
;

data b;
   input name $ dept $ date date9.;
   format date date9.;
   datalines;
John Sales 02JAN2018
Mary Finance 02JAN2018
Jake Finance 02JAN2018
;

I'd like to be able to track these changes that occurred on dataset b. For example, I'd like to tell that on 1/2/2018 Mary switched to Finance, and Jake was a new record. I've tried a proc compare with outall option, but I'm not sure if it will allow me to track the changes and query the results.

 

I could simply stack dataset A and B on top of each other but in my case space is an issue and I would not like to save records that are not changing.

 

Any ideas?

3 REPLIES 3
data_null__
Jade | Level 19

Have you looked audit features enabled through PROC DATASETS?

mdavidson
Quartz | Level 8

I have not, but I will google it right now. Thanks!

LinusH
Tourmaline | Level 20

Do I understand correctly that your table that today is refreshed, you would change the update method to only modify/add  changed/new records? I just want to be sure, since change data capture normally describes how to fetch updats/inserts at the source, not the target. Manage target tables in more "intelligent" way is called historization, or SCD Type 2.

If the later, here SAS Data Integration Studio is nice tool if you are building a data warehouse or similar.

 

Data set audit is nice feature, but I would say that is more for audit (of data management itself), and usually works best with small/moderate data sizes.

 

If you want to track changes, and want to "easily" query it, SCD Type 2 sounds more what you need.

Suggested approach:

1. Inner join/lookup daily transactions with target table.

2. Compare columns values to see if there is any change.

3. If yes trigger an insert with a date/timestamp

4. Insert records that didn't get a match in the initial join

 

Some SCD Type 2 models also use end-dating of the record that is being updated (have anew veriosn inserted). This causes some more cost in the data loading aprt, but eases the query (you could use WHERE date BETWEEN StartDate AND EndDate), and typically a bit more performant than without it (for SAS data stores at least).

Data never sleeps

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 1981 views
  • 0 likes
  • 3 in conversation