Solved: Re: Conditionally update master data set based on transaction data set

martyvd · Posted 01-03-2023 04:01 PM

I am using the basic code below to update a master data set using a transaction data set. I would like to update master data only when the update does not create a duplicate value of date and/or number within the by-group of ID or when the update adds a row (rather than replaces) in the table.

data master_updated;
update master transaction;
by ID original_number;
run;

Master

ID	Number	Original_number	Date
A	1	1	06/02/2021
A	2	2	04/23/2021
B	1	1	03/08/2021
B	2	2	04/05/2021
B	3	3	10/23/2021
B	4	4	04/02/2022
C	2	2	06/03/2021

Transaction

ID	Number	Original_number	Date
A	2	1	04/23/2021
B	4	4	04/07/2022
C	2	1	06/03/2021

Master_updated

ID	Number	Original_number	Date
A	2	1	04/23/2021
A	2	2	04/23/2021
B	1	1	03/08/2021
B	2	2	04/05/2021
B	3	3	10/23/2021
B	4	4	04/07/2022
C	2	1	06/03/2021
C	2	2	06/03/2021

The issues mentioned above are highlighted in red - the updated data should not have duplicate values of number or date. A and C have "invalid" updates while B is valid. Is there a way to add conditions so that some updates do not occur?

Patrick · Posted 01-03-2023 10:31 PM

Based on your sample data and desired result below code should do the job.

Based on your description there could be additional use cases not yet covered. If so can you please extend your sample data with these cases and show us the desired outcome?

data master;
 infile datalines truncover;
 input ID $ Number Original_number Date:mmddyy10.;
 format date mmddyy10.;
 datalines;
A 1 1 06/02/2021
A 2 2 04/23/2021
B 1 1 03/08/2021
B 2 2 04/05/2021
B 3 3 10/23/2021
B 4 4 04/02/2022
C 2 2 06/03/2021
;

data transaction;
 infile datalines truncover;
 input ID $ Number Original_number Date:mmddyy10.;
 format date date9.;
 datalines;
A 2 1 04/23/2021
B 4 4 04/07/2022
C 2 1 06/03/2021
;

data master;

  if _n_=1 then
    do;
      dcl hash h1(dataset:'master');
      h1.defineKey('id','number','date');
      h1.defineDone();
    end;
  modify master transaction;
  by id original_number;

  if _iorc_=0 then
    do;
      if h1.check() ne 0 then replace;
    end;
run;

proc print data=master;
run;

View solution in original post

ballardw · Posted 01-03-2023 04:13 PM

Better than showing incorrect actual output alone is to show expected output.

Note that if you change this record

A

2

04/23/2021

to

A

2

04/28/2021

you will see that the UPDATE is NOT playing with the dates in that record. The code has done exactly what you told it to.

So you need to tell us what you actually expect for a result. A complicated description likely means that complicated code is needed. Not always, but often. You are expecting the result of comparing one record to another to also consider values on other records. Data step Update, Modify or Merge are not going to do that with much additional coding.

Perhaps all you need is a Proc Sort with the Nodupkey option on the result of this update.

@martyvd wrote:

I am using the basic code below to update a master data set using a transaction data set. I would like to update master data only when the update does not create a duplicate value of date and/or number within the by-group of ID or when the update adds a row (rather than replaces) in the table.
data master_updated;
update master transaction;
by ID original_number;
run;
Master

ID Number Original_number Date

A 1 1 06/02/2021

A 2 2 04/23/2021

B 1 1 03/08/2021

B 2 2 04/05/2021

B 3 3 10/23/2021

B 4 4 04/02/2022

C 2 2 06/03/2021

Transaction

ID Number Original_number Date

A 2 1 04/23/2021

B 4 4 04/07/2022

C 2 1 06/03/2021

Master_updated

ID Number Original_number Date

A 2 1 04/23/2021

A 2 2 04/23/2021

B 1 1 03/08/2021

B 2 2 04/05/2021

B 3 3 10/23/2021

B 4 4 04/07/2022

C 2 1 06/03/2021

C 2 2 06/03/2021

The issues mentioned above are highlighted in red - the updated data should not have duplicate values of number or date. A and C have "invalid" updates while B is valid. Is there a way to add conditions so that some updates do not occur?

martyvd · Posted 01-03-2023 04:22 PM

Maybe my phrasing was confusing. The output is not "incorrect" - the code is indeed doing what I expect it to do. I am asking if it is possible to write a more complex data step with conditions that would produce the following:

ID	Number	Original_number	Date
A	1	1	06/02/2021
A	2	2	04/23/2021
B	1	1	03/08/2021
B	2	2	04/05/2021
B	3	3	10/23/2021
B	4	4	04/07/2022
C	2	2	06/03/2021

Ideally the only update that would occur is for ID B.

ballardw · Posted 01-03-2023 05:37 PM

@martyvd wrote:

Maybe my phrasing was confusing. The output is not "incorrect" - the code is indeed doing what I expect it to do. I am asking if it is possible to write a more complex data step with conditions that would produce the following:

ID Number Original_number Date

A 1 1 06/02/2021

A 2 2 04/23/2021

B 1 1 03/08/2021

B 2 2 04/05/2021

B 3 3 10/23/2021

B 4 4 04/07/2022

C 2 2 06/03/2021

Ideally the only update that would occur is for ID B.

Realizing that your example may be much simpler than your actual problem but if you only want to update B (or a selection ) then a WHERE dataset option to only have the values from the transaction data set where ID is B (or some other values) might be in order.

The transaction set will add records if there is a combination of BY variables not present in the master.

You could avoid that by using a MERGE, which has a different behavior for multiple values of the BY variables in the Transaction data set, and using dataset options to set IN variables to see if both data sets contribute something to the current record and if only the transaction set is supplying the current record then delete or not keep the record with a subsetting IF or conditional Delete statement.

Note, I might write some code to demonstrate but don't want create example data sets. Hint.

Tom · Posted 01-03-2023 10:46 PM

You do NOT appear to be doing an UPDATE operation. This for when you want to only use the non-missing values of the transaction dataset.

You appear to just be doing a MERGE operation.

If you want to only use the new data for ID='B' then only use those observations.

So if you start with this data.

data original ;
  input ID $ Number Original_number Date :mmddyy.;
  format date yymmdd10.;
cards;
A 1 1 06/02/2021
A 2 2 04/23/2021
B 1 1 03/08/2021
B 2 2 04/05/2021
B 3 3 10/23/2021
B 4 4 04/02/2022
C 2 2 06/03/2021
;

data trans ;
  input ID $ Number Original_number Date :mmddyy.;
  format date yymmdd10.;
cards;
A 2 1 04/23/2021
B 4 4 04/07/2022
C 2 1 06/03/2021
;

And want to get this data:

data expected ;
  input ID $ Number Original_number Date :mmddyy.;
  format date yymmdd10.;
cards;
A 1 1 06/02/2021
A 2 2 04/23/2021
B 1 1 03/08/2021
B 2 2 04/05/2021
B 3 3 10/23/2021
B 4 4 04/07/2022
C 2 2 06/03/2021
;

Then you can just use this data step to only take the one observation from the TRANS dataset.

data want;
  merge original trans(where=(id='B'));
  by id number ;
run;

Let's test if it gets what you wanted.

proc compare data=want compare=expected;
run;

The COMPARE Procedure                                                                                                               
Comparison of WORK.WANT with WORK.EXPECTED                                                                                          
(Method=EXACT)                                                                                                                      
                                                                                                                                    
Data Set Summary                                                                                                                    
                                                                                                                                    
Dataset                 Created          Modified  NVar    NObs                                                                     
                                                                                                                                    
WORK.WANT      03JAN23:22:42:07  03JAN23:22:42:07     4       7                                                                     
WORK.EXPECTED  03JAN23:22:42:07  03JAN23:22:42:07     4       7                                                                     
                                                                                                                                    
                                                                                                                                    
Variables Summary                                                                                                                   
                                                                                                                                    
Number of Variables in Common: 4.                                                                                                   

 



                                                                                                                                    
                                                                                                                                    
Observation Summary                                                                                                                 
                                                                                                                                    
Observation      Base  Compare                                                                                                      
                                                                                                                                    
First Obs           1        1                                                                                                      
Last  Obs           7        7                                                                                                      
                                                                                                                                    
Number of Observations in Common: 7.                                                                                                
Total Number of Observations Read from WORK.WANT: 7.                                                                                
Total Number of Observations Read from WORK.EXPECTED: 7.                                                                            
                                                                                                                                    
Number of Observations with Some Compared Variables Unequal: 0.                                                                     
Number of Observations with All Compared Variables Equal: 7.                                                                        
                                                                                                                                    
NOTE: No unequal values were found. All values compared are exactly equal.

Patrick · Posted 01-03-2023 10:31 PM

Based on your sample data and desired result below code should do the job.

Based on your description there could be additional use cases not yet covered. If so can you please extend your sample data with these cases and show us the desired outcome?

data master;
 infile datalines truncover;
 input ID $ Number Original_number Date:mmddyy10.;
 format date mmddyy10.;
 datalines;
A 1 1 06/02/2021
A 2 2 04/23/2021
B 1 1 03/08/2021
B 2 2 04/05/2021
B 3 3 10/23/2021
B 4 4 04/02/2022
C 2 2 06/03/2021
;

data transaction;
 infile datalines truncover;
 input ID $ Number Original_number Date:mmddyy10.;
 format date date9.;
 datalines;
A 2 1 04/23/2021
B 4 4 04/07/2022
C 2 1 06/03/2021
;

data master;

  if _n_=1 then
    do;
      dcl hash h1(dataset:'master');
      h1.defineKey('id','number','date');
      h1.defineDone();
    end;
  modify master transaction;
  by id original_number;

  if _iorc_=0 then
    do;
      if h1.check() ne 0 then replace;
    end;
run;

proc print data=master;
run;

martyvd · Posted 01-04-2023 04:05 PM

Thank you, this appears to be working. Now I just need to read some documentation to understand *how* it works.

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away