Good Afternoon,
I have generated a report, the data comes from a vendor that will send a file whenever there is an update with only records that have been updated. I have a solution that works as far as I can tell, I want to update the existing file with only things that have changed. Attached is my solution. I know this is a common situation and I wanted to put this out here to get opinions, and hopefully find a better solution.
Thank You,
Mark
Your have a classic base data and transaction data set update;
data updata;
update data delta;
by empid;
run;
Hi,
Just to understand in a better way. Why you need this code.
proc sort data=compare;by empid descending period;
data final;
set compare;
by empid;
if first.empid then output;
run;
I added this to make sure I was only pulling the most recent record. For the real datasets I'm using there will be a datestamp. I just added period to sort by the most recent record.
In delta file there is only one observation (if updated) for each empid in addition to new empids right?
If the transaction file has multiple records for the same empid then ALL of them are applied.
This is actually a good behavior because not all of the variables may necessarily update at the same time.
It may be easier to see what happens with single records but not required. The master data set should have only one value or combination of the key values though.
Thanks for the input ballardw. Just trying to understand how update works. Please see below syntax where empid=3124 appears twice and only first observation in the sequence is updated with delta observation in transaction data while other remains the same.
data transaction;
input empid $ one two three four five period;
cards;
1234 1 2 3 4 5 1
2134 2 1 3 4 5 1
3124 3 2 1 4 5 1
3124 4 2 1 4 5 1
4123 4 3 2 1 5 1
5123 5 4 3 2 1 1
;
run;
data delta;
input empid $ one two three four five period;
cards;
3124 5 5 5 5 5 2
6666 1 1 1 1 1 1
;
run;
data updata;
update transaction delta;
by empid;
run;
proc print data=updata;
run;
The master data set should have only one value or combination of the key values though.
The transaction data can have multiples, but not the master table.
Thanks Reza - I've only one value in master data set (delta) for empid=3124 and two observations for empid=3124 in transaction table but it updates the first observation in transaction data only.
data updata;
update transaction delta;
by empid;
run;
proc print data=updata;
run;
The MASTER Table as SAS knows it is called transaction. It has multiple empid, 3124.
If you want delta to be the master table flip them around:
data updata;
update delta transaction;
by empid;
run;
Mark,
The most difficult part of this is understanding the problem. More specifically, answers to these questions would make the programming straightforward.
1. In your original file before any updates are applied, do you have just one observation per ID?
2. If not, would you like all of the observations changed for that ID or only the most recent one?
3. Is there a variable on your original data that determines which observation is the most recent for a given ID?
4. If the delta data set contains multiple observations for the same ID, how does that affect the desired result? (Should all the changes be applied to the most recent observation only for that ID?)
It is conceivable that ballardw's original solution is all that you need ... as long as the answer to question 1 is "Yes".
Are the "delta" records replacements for the original data? Or actual transactions (as the term is used within SAS by the UPDATE statement)? There is a major difference with respect to missing values. Step-by-Step Programming with Base SAS(R) Software
Do you have insertions in your "delta" files? That is can the delta add a brand new ID to the master file?
Do you have deletions in your "delta" files? That is can the delta specify that a particular ID be removed from the mast file?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.