Hi,
I am trying to build a process that takes a daily file and compares it to the data already stored in a dataset. I want to create a dataset out of the file and compare it to the main dataset to see if there were any observations that had data changed in any variable. For example, I have a dataset 3 observations in it. The variables are ID, Name, Date1, Date2, Amount.
1 Red 5/1/2017, 5/30/2017, $400.00
2 Blue 6/1/2017, 6/30/2017, $500.00
3 Blue 6/1/2017, 7/31/2017, $350.00
I get a file that has 1 observation with 2 data fields that are different from my original dataset.
2 Blue 6/1/2017, 7/31/2017, $600.00
I need to be able to do the following:
1. Update the old (primary) dataset with the new data in the updated file.
2. Add new records if they are in the new file but not the old (primary) dataset.
3. Generate a dataset or report that gives the observation, which variables were updated, and what the updates were.
I have found this article and it makes pretty good sense. However, I want to make sure there isn't something built in for this that would make it a little easier.
Using PROC COMPARE to identify new and updated ... - PharmaSUG
If there is a better way, please let me know.
I would suggest looking more at the UPDATE statement either in a data step or SQL proc.
Thanks, Reeza. I am using a combination of updates and merge statements to actually update the primary data set but what I am trying to do is create a dataset or report of the variables in observations that have changed each day. That is the primary focus of this article but it just seemed a little cumbersome and I was hoping there was an easier way.
If you're doing this manually I totally understand, and would also use update/merge. Lazy method:
1. Back up old data
2. Update data
3. Compare new to old via merge
Is there any possibility of including a modified/last updated field in your master dataset that you could then generate your report off?.
I know when I worked with SSIS we were able to automate this, because those fields were updated. I'm assuming SAS DI would have something similar but I really don't know.
Yes, I also did something similar with SSIS when I had it available to me. It has been hard to learn how to do some things in SAS that I was able to do fairly easily with SSIS and SSMS.
I do have a "last updated" field for each observation so I can key off of that. I am not sure what is different from your method versus the article if I am still going to compare using a merge. Could you explain that step in a little more detail? I already keep a copy of the prior day's dataset so I have the ability to compare the one I would update today to the data as it was yesterday.
My point is sadly that proc compare doesn't add any value to the process IMO. It's a proc that could have a ton of value but needs to be updated significantly.
I agree with you completely.
So, if you were doing it your way... in step three (update/merge) what would you do to compare those two datasets and either output to a new report or dataset the observations that have changes and which variables were changed?
@Peter_C, I had not heard of an audit trail before but I will definitely look into that. Right now I am updating my datasets each day with a combination of sql and MERGE but, if using an audit table is more efficient, then I will definitely look into it. Thank you for the suggestion!
I will report back.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
