Compare 2 datasets and output observations with variables that have changed data

Reply
Frequent Contributor
Posts: 140

Compare 2 datasets and output observations with variables that have changed data

Hi,

 

I am trying to build a process that takes a daily file and compares it to the data already stored in a dataset.  I want to create a dataset out of the file and compare it to the main dataset to see if there were any observations that had data changed in any variable.  For example, I have a dataset 3 observations in it.  The variables are ID, Name, Date1, Date2, Amount. 

 

1 Red 5/1/2017, 5/30/2017, $400.00

2 Blue 6/1/2017, 6/30/2017, $500.00

3 Blue 6/1/2017, 7/31/2017, $350.00

 

I get a file that has 1 observation with 2 data fields that are different from my original dataset.

 

2 Blue 6/1/2017, 7/31/2017, $600.00

 

I need to be able to do the following:

 

1. Update the old (primary) dataset with the new data in the updated file.

2. Add new records if they are in the new file but not the old (primary) dataset.

3. Generate a dataset or report that gives the observation, which variables were updated, and what the updates were.

 

I have found this article and it makes pretty good sense.  However, I want to make sure there isn't something built in for this that would make it a little easier.

 

Using PROC COMPARE to identify new and updated ... - PharmaSUG

 

If there is a better way, please let me know.

Super User
Posts: 17,840

Re: Compare 2 datasets and output observations with variables that have changed data

I would suggest looking more at the UPDATE statement either in a data step or SQL proc.

 

 

Frequent Contributor
Posts: 140

Re: Compare 2 datasets and output observations with variables that have changed data

Thanks, Reeza.  I am using a combination of updates and merge statements to actually update the primary data set but what I am trying to do is create a dataset or report of the variables in observations that have changed each day.  That is the primary focus of this article but it just seemed a little cumbersome and I was hoping there was an easier way.

Super User
Posts: 17,840

Re: Compare 2 datasets and output observations with variables that have changed data

If you're doing this manually I totally understand, and would also use update/merge. Lazy method:

1. Back up old data

2. Update data

3. Compare new to old via merge

Is there any possibility of including a modified/last updated field in your master dataset that you could then generate your report off?. 

 

I know when I worked with SSIS we were able to automate this, because those fields were updated. I'm assuming SAS DI would have something similar but I really don't know.  

Frequent Contributor
Posts: 140

Re: Compare 2 datasets and output observations with variables that have changed data

Yes, I also did something similar with SSIS when I had it available to me.  It has been hard to learn how to do some things in SAS that I was able to do fairly easily with SSIS and SSMS. 

 

I do have a "last updated" field for each observation so I can key off of that.  I am not sure what is different from your method versus the article if I am still going to compare using a merge.  Could you explain that step in a little more detail?  I already keep a copy of the prior day's dataset so I have the ability to compare the one I would update today to the data as it was yesterday.

Super User
Posts: 17,840

Re: Compare 2 datasets and output observations with variables that have changed data

My point is sadly that proc compare doesn't add any value to the process IMO. It's a proc that could have a ton of value but needs to be updated significantly. 

Frequent Contributor
Posts: 140

Re: Compare 2 datasets and output observations with variables that have changed data

I agree with you completely.

 

So, if you were doing it your way... in step three (update/merge) what would you do to compare those two datasets and either output to a new report or dataset the observations that have changes and which variables were changed?

Valued Guide
Posts: 2,175

Re: Compare 2 datasets and output observations with variables that have changed data

Hi @elwayfan466
Have you looked into using an AUDIT table.
This makes most sense if your masterfile is updated like a database - by sql update or data step MODIFY - rather than by data step SET and MERGE
Frequent Contributor
Posts: 140

Re: Compare 2 datasets and output observations with variables that have changed data

@Peter_C, I had not heard of an audit trail before but I will definitely look into that.  Right now I am updating my datasets each day with a combination of sql and MERGE but, if using an audit table is more efficient, then I will definitely look into it.  Thank you for the suggestion!

 

I will report back.

Ask a Question
Discussion stats
  • 8 replies
  • 267 views
  • 1 like
  • 3 in conversation