Reeza
Super User

I have a transactional database with approximately 120 million records and 100 variables.

A record is entered and then goes through several changes before it's finalized, with each change entered as a distinct transaction. 

If I look at only the final records, I have only 30 million, so on average a record has about 4 transactions.

I'm relatively new to this database. The main field doesn't change much, but I'd like to find the best way to identify which fields change the most between transactions before finalization.

Any suggestions on how to efficiently analyze/solve this?

2 REPLIES
Doc_Duke
Rhodochrosite | Level 12

Efficiently, no.

However, if you are looking for estimates, start with a random sample of transaction sets and use PROC COMPARE on the pairwise evolution.  You can output the results and summarize over time.  5,000 sets is probably enough to get a handle on what is happening on average.  If you are searching for the unique or outlier changes, you are stuck with working with the entire dataset.
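To make the sample-and-compare idea concrete, here is a minimal, language-neutral sketch in plain Python rather than SAS. It is not PROC COMPARE itself, only the same logic: group rows into transaction sets, optionally sample the sets, compare each transaction with the one before it, and tally which fields differ. The column names `id` and `seq` and the helper `count_field_changes` are assumptions made up for illustration; the real key and ordering variables will differ.

```python
import random
from collections import Counter, defaultdict


def count_field_changes(rows, key, order, sample_size=None, seed=0):
    """Tally how often each field differs between consecutive transactions.

    rows        -- iterable of dicts, one per transaction
    key         -- field identifying the record (hypothetical name, e.g. "id")
    order       -- field giving transaction order (hypothetical, e.g. "seq")
    sample_size -- number of transaction sets to sample; None uses all sets
    Returns (Counter of changes per field, number of pairs compared).
    """
    # Group transactions into one set per record.
    sets = defaultdict(list)
    for row in rows:
        sets[row[key]].append(row)

    # Sample whole transaction sets, not individual rows.
    keys = sorted(sets)
    if sample_size is not None and sample_size < len(keys):
        keys = random.Random(seed).sample(keys, sample_size)

    changes = Counter()
    pairs = 0
    for k in keys:
        history = sorted(sets[k], key=lambda r: r[order])
        # Compare each transaction with the one immediately before it.
        for before, after in zip(history, history[1:]):
            pairs += 1
            for field, value in after.items():
                if field in (key, order):
                    continue
                if before.get(field) != value:
                    changes[field] += 1
    return changes, pairs


# Tiny made-up example: two records, five transactions in total.
demo_rows = [
    {"id": 1, "seq": 1, "status": "new", "amount": 10},
    {"id": 1, "seq": 2, "status": "open", "amount": 10},
    {"id": 1, "seq": 3, "status": "final", "amount": 12},
    {"id": 2, "seq": 1, "status": "new", "amount": 5},
    {"id": 2, "seq": 2, "status": "final", "amount": 5},
]
demo_changes, demo_pairs = count_field_changes(demo_rows, key="id", order="seq")
print(demo_changes.most_common())  # [('status', 3), ('amount', 1)]
```

The sketch holds every row in memory, which would not work for 120 million records. In practice you would draw the sample of record keys first and pull only those transaction sets before comparing.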

jakarman
Barite | Level 11

You could use an audit trail if that transactional database is a SAS data set: http://support.sas.com/documentation/cdl/en/lrcon/67885/HTML/default/viewer.htm#n0ndg2uekz7qkbn1caok...

The same concept is used in OLTP DBMS systems, where it is often called a journal or log file. These can be used for roll-back / roll-forward recovery processes.

When that is used, the updates commonly get out of sync with extracted versions.

---->-- ja karman --<-----

