DATA Step, Macro, Functions and more

Delta File

Reply
Valued Guide
Posts: 858

Delta File

Good Afternoon,

I have generated a report, the data comes from a vendor that will send a file whenever there is an update with only records that have been updated.  I have a solution that works as far as I can tell, I want to update the existing file with only things that have changed.  Attached is my solution.  I know this is a common situation and I wanted to put this out here to get opinions, and hopefully find a better solution.

Thank You,

Mark

Attachment
Super User
Posts: 10,500

Re: Delta File

Your have a classic base data and transaction data set update;

data updata;

update data delta;

by empid;

run;

Trusted Advisor
Posts: 1,204

Re: Delta File

Hi,

Just to understand in a better way. Why you need this code.

proc sort data=compare;by empid descending period;

data final;

set compare;

by empid;

if first.empid then output;

run;

Valued Guide
Posts: 858

Re: Delta File

I added this to make sure I was only pulling the most recent record.  For the real datasets I'm using there will be a datestamp.  I just added period to sort by the most recent record.

Trusted Advisor
Posts: 1,204

Re: Delta File

In delta file there is only one observation (if updated) for each empid in addition to new empids right?

Super User
Posts: 10,500

Re: Delta File

If the transaction file has multiple records for the same empid then ALL of them are applied.

This is actually a good behavior because not all of the variables may necessarily update at the same time.

It may be easier to see what happens with single records but not required. The master data set should have only one value or combination of the key values though.

Trusted Advisor
Posts: 1,204

Re: Delta File

Thanks for the input . Just trying to understand how update works. Please see below syntax where empid=3124 appears twice and only first observation in the sequence is updated with delta observation in transaction data while other remains the same.

data transaction;

input empid $ one two three four five period;

cards;

1234 1 2 3 4 5 1

2134 2 1 3 4 5 1

3124 3 2 1 4 5 1

3124 4 2 1 4 5 1

4123 4 3 2 1 5 1

5123 5 4 3 2 1 1

;

run;

data delta;

input empid $ one two three four five period;

cards;

3124 5 5 5 5 5 2

6666 1 1 1 1 1 1

;

run;

data updata;

update transaction delta;

by empid;

run;

proc print data=updata;

run;

Super User
Posts: 17,824

Re: Delta File

The master data set should have only one value or combination of the key values though.

The transaction data can have multiples, but not the master table.

Trusted Advisor
Posts: 1,204

Re: Delta File

Thanks Reza - I've only one value in master data set (delta) for empid=3124 and two observations for empid=3124 in transaction table but it updates the first observation in transaction data only.

Super User
Posts: 17,824

Re: Delta File

data updata;

update transaction delta;

by empid;

run;

proc print data=updata;

run;

The MASTER Table as SAS knows it is called transaction. It has multiple empid, 3124.

If you want delta to be the master table flip them around:

data updata;

update delta transaction;

by empid;

run;

Super User
Posts: 5,082

Re: Delta File

Mark,

The most difficult part of this is understanding the problem.  More specifically, answers to these questions would make the programming straightforward.

1. In your original file before any updates are applied, do you have just one observation per ID?

2. If not, would you like all of the observations changed for that ID or only the most recent one?

3. Is there a variable on your original data that determines which observation is the most recent for a given ID?

4. If the delta data set contains multiple observations for the same ID, how does that affect the desired result?  (Should all the changes be applied to the most recent observation only for that ID?)

It is conceivable that ballardw's original solution is all that you need ... as long as the answer to question 1 is "Yes".

Super User
Super User
Posts: 6,500

Re: Delta File

Are the "delta" records replacements for the original data?  Or actual transactions (as the term is used within SAS by the UPDATE statement)?  There is a major difference with respect to missing values.  Step-by-Step Programming with Base SAS(R) Software

Do you have insertions in your "delta" files? That is can the delta add a brand new ID to the master file?

Do you have deletions in your "delta" files?  That is can the delta specify that a particular ID be removed from the mast file?

Ask a Question
Discussion stats
  • 11 replies
  • 564 views
  • 1 like
  • 6 in conversation