10-19-2014 10:08 PM
I have a transactional database of approximately 120 million records across about 100 variables.
A record is entered and then goes through several changes before it's finalized, with each change entered as a distinct transaction.
If I look at only the final records, I have about 30 million, so on average each record accounts for four transactions (the initial entry plus roughly three changes).
I'm relatively new to this database. The key field doesn't change much, but I'd like to find the best way to identify which fields change the most across the transactions leading up to finalization.
Any suggestions on how to analyze this efficiently?
10-21-2014 02:22 PM
If you are looking for estimates, start with a random sample of transaction sets and use PROC COMPARE on the pairwise evolution, i.e., each version of a record against the next. You can output the results and summarize them over time. About 5,000 sets is probably enough to get a handle on what is happening on average. If you are searching for unique or outlier changes, though, you are stuck working with the entire dataset.
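A minimal, untested sketch of that sampling approach. The names are assumptions: WORK.TRANS holds the transactions, RECORD_ID identifies a record, and SEQ orders the transactions within a record — substitute your own.

/* 1: draw a simple random sample of about 5,000 record histories */
proc sort data=trans(keep=record_id) out=ids nodupkey;
   by record_id;
run;

proc surveyselect data=ids out=sample_ids method=srs sampsize=5000 seed=20141021;
run;

/* 2: pull the sampled records' transactions in sequence order */
proc sql;
   create table sample_trans as
   select t.*
   from trans t inner join sample_ids s on t.record_id = s.record_id
   order by t.record_id, t.seq;
quit;

/* 3: split into BASE (versions 1..n-1) and COMPARE (versions 2..n),
      aligned on RECORD_ID and a PAIR counter so that pair k matches
      version k against version k+1 */
data base(drop=seq) compare(drop=seq);
   set sample_trans;
   by record_id;
   if first.record_id then pair = 0;
   pair + 1;
   if not first.record_id then do;
      pair = pair - 1;      /* this version closes the previous pair */
      output compare;
      pair = pair + 1;      /* restore for the pair it opens */
   end;
   if not last.record_id then output base;
run;

/* 4: pairwise comparison; OUT= gets one DIF observation per pair,
      and OUTNOEQUAL suppresses pairs where nothing changed */
proc compare base=base compare=compare out=diffs outnoequal noprint;
   by record_id pair;
run;

/* 5: in a DIF observation, a numeric variable holds the difference
      (0 = equal) and a character variable holds an X in each position
      that differs; turn that into one row per changed variable */
data changes(keep=varname);
   set diffs(drop=_type_ _obs_ record_id pair);
   array nums  {*} _numeric_;
   array chars {*} _character_;
   length varname $32;
   do _i = 1 to dim(nums);
      /* crude: a missing difference (one side missing) counts as a change */
      if nums{_i} ne 0 then do;
         varname = vname(nums{_i});
         output;
      end;
   end;
   do _i = 1 to dim(chars);
      if indexc(chars{_i}, 'X') then do;
         varname = vname(chars{_i});
         output;
      end;
   end;
run;

proc freq data=changes order=freq;
   tables varname / nocum;
run;

The PROC FREQ output ranks the fields by how often they change between consecutive versions. Bookkeeping fields such as last-modified timestamps will unsurprisingly top the list; drop them from BASE and COMPARE before the comparison.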
10-21-2014 03:44 PM
You could use an audit trail if that transactional database is a SAS data set: http://support.sas.com/documentation/cdl/en/lrcon/67885/HTML/default/viewer.htm#n0ndg2uekz7qkbn1caok...
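A minimal sketch, assuming the data set is MYLIB.TRANS (hypothetical names): initiate the audit trail with PROC DATASETS, then read the logged changes back through the TYPE=AUDIT data set option.

/* switch the audit trail on; from here every add/update/delete
   against MYLIB.TRANS is logged alongside the data set */
proc datasets library=mylib nolist;
   audit trans;
   initiate;
run;
quit;

/* read the log: the _AT*_ variables describe each logged change,
   e.g. _ATOPCODE_ (operation), _ATDATETIME_, _ATUSERID_ */
data audit_log;
   set mylib.trans(type=audit);
   /* DA=added, DD=deleted, DR/DW=before/after images of an update */
   where _ATOPCODE_ in ('DA' 'DD' 'DR' 'DW');
run;

Note that the audit trail only records changes made after it is initiated, so it helps going forward rather than for history already in the table.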
The same concept is used in OLTP DBMS systems, often under names such as journals or log files; they can be used for roll-back / roll-forward recovery.
Be aware that extracted copies of the data commonly get out of sync with the updates recorded in the source.