BookmarkSubscribeRSS Feed
elwayfan446
Barite | Level 11

Hello,

 

I am trying to clean up some code for processing updates from a transaction dataset to a master dataset.  I ran across this SAS paper and subsequent SAS documentation that explains how to use the MODIFY statement to achieve this.

 

http://www2.sas.com/proceedings/sugi31/264-31.pdf

 

All of this works as instructed, but I need it to work a little bit differently and I have not been able to figure out how to do it.  In this paper, it talks about replacing the record in the master dataset if it also exists in the transaction dataset.  The issue I have is that I only want it to replace a record if the data for one of the records variables has changed.  In this case above, the entire record is rewritten because they match, regardless if any of the data has changed.

 

The reason I am trying to do this is because I want to be able to track and report on records that have had data change each day.  Is there a way to do this using the methods in this paper or similar?

 

Thanks!

28 REPLIES 28
Reeza
Super User

 The issue I have is that I only want it to replace a record if the data for one of the records variables has changed.  In this case above, the entire record is rewritten because they match, regardless if any of the data has changed.

 

When you say one of the record fields are you referring to a specific field or just if any are updated?

elwayfan446
Barite | Level 11

If any are updated.  If nothing in the entire record has changed between the transaction dataset and master, I don't want it to rewrite.

 

I want to be able to track and report changes but if all records are overwritten that match, I have no way to tell which of those actually had data change.

Reeza
Super User
How many variables do you have, and what types (num/char) or are you looking for a generic process?
elwayfan446
Barite | Level 11

I have appx 50 variables with a mix of char, num, and dates.

 

I would like it to be as generic as possible but not sure it can be if I need to be able to tell which specific data was updated.

Reeza
Super User

@elwayfan446 wrote:

I have appx 50 variables with a mix of char, num, and dates.

 

I would like it to be as generic as possible but not sure it can be if I need to be able to tell which specific data was updated.


I'm assuming you just want the record that was changed, do you need to know which field was changed?

 

PROC COMPARE is horrid, but it does give you all this information so it may be a mix of PROC COMPARE to get the changes and then a MODIFY or something that may work the best.

elwayfan446
Barite | Level 11

Ideally I would want only the records with differences to be replaced with the modify but then be able to see what fields were changed. 

 

I think at least being able to only get the different records to be replaced that is a good start.  I already have a date in each dataset that indicates the date the record was created.  If it isn't overwritten in the master data set, I can easily tell which records were updated on any given day because older records dates will not have changed.

Reeza
Super User
Don't have time to code this up right now, but will later. I would recommend first doing a SQL query between the two tables, using the EXCEPT operator to get only records that have changed. Then use MODIFY to update the data set and get the new dates. That should be relatively easy. Then if you want to know exactly what changed you can run a proc compare between the old and new tables, assuming you keep the old one around for the comparison.
SASKiwi
PROC Star

This example is a generic way of identifying new or changed rows comparing a master and transaction table using SQL:

 

data class1;
  set sashelp.class;
run;

data class2;
  set sashelp.class;
  if name = 'Mary' then age = 18;
run;

proc sql;
  create table Changed as
  select distinct name 

  from (
        (select * from class1 
         except corresponding 
         select * from class2 
        )
        union corresponding
        (select * from class2 
         except corresponding 
         select * from class1
        )
       )
  ;
quit;
johnsville
Obsidian | Level 7

the specific case can be implemented by concatenating all fields together from each record, then comparing the two concatenated strings.  update only when the two strings are not equal. The generic case would involve using dataset metadata in such a way as to make the code agnostic, but there would be cases to check for, like when the two datasets are not exactly the same.  It would be a lot more code, but could be implemented as a macro - pass that macro the two dataset names, have it identify the equal named variables, then have it create the catx functions for each.  If the data sets are required to have exactly the same variables, then have it 'abort cancel' when there is a discrepancy.

elwayfan446
Barite | Level 11

Thanks to everyone for responding.  I will give some these suggestions a try and report back.

Tom
Super User Tom
Super User

Why not just eliminate them from the transactions?

proc sql ;
create table actual_transactions as
  select * from transactions
  except
  select * from master
;
quit;
elwayfan446
Barite | Level 11

Wouldn't this also eliminate any new records in the transactions that need to be added to the master?

Tom
Super User Tom
Super User

@elwayfan446 wrote:

Wouldn't this also eliminate any new records in the transactions that need to be added to the master?


How could it possibly do that?  

elwayfan446
Barite | Level 11

The more I think about it, the more I understand it.  I have never used "except" before.

 

So if I am thinking correctly, this query will create a dataset that does the following:

 

1.  If the records are in the transactions dataset but not in the master, those records is added to the actual_transaction dataset

2.  If the records are in the transactions dataset but any of the variables have changed between that record and the record in the master dataset, then those records are added to the actual_transaction dataset

 

Am I thinking correctly?  If so, then I simply merge or modify that new transaction dataset with the master?

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 28 replies
  • 4185 views
  • 4 likes
  • 5 in conversation