BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
martyvd
Fluorite | Level 6

I am using the basic code below to update a master data set using a transaction data set. I would like to update master data only when the update does not create a duplicate value of date and/or number within the by-group of ID or when the update adds a row (rather than replaces) in the table. 

 

data master_updated;
update master transaction;
by ID original_number;
run;

Master 

IDNumberOriginal_numberDate
A1106/02/2021
A2204/23/2021
B1103/08/2021
B2204/05/2021
B3310/23/2021
B4404/02/2022
C2206/03/2021

 

Transaction

IDNumberOriginal_numberDate
A2104/23/2021
B4404/07/2022
C2106/03/2021

 

Master_updated

IDNumberOriginal_numberDate
A2104/23/2021
A2204/23/2021
B1103/08/2021
B2204/05/2021
B3310/23/2021
B4404/07/2022
C2106/03/2021
C2206/03/2021

 

The issues mentioned above are highlighted in red - the updated data should not have duplicate values of number or date. A and C have "invalid" updates while B is valid. Is there a way to add conditions so that some updates do not occur?

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

Based on your sample data and desired result below code should do the job.

Based on your description there could be additional use cases not yet covered. If so can you please extend your sample data with these cases and show us the desired outcome?

data master;
 infile datalines truncover;
 input ID $ Number Original_number Date:mmddyy10.;
 format date mmddyy10.;
 datalines;
A 1 1 06/02/2021
A 2 2 04/23/2021
B 1 1 03/08/2021
B 2 2 04/05/2021
B 3 3 10/23/2021
B 4 4 04/02/2022
C 2 2 06/03/2021
;

data transaction;
 infile datalines truncover;
 input ID $ Number Original_number Date:mmddyy10.;
 format date date9.;
 datalines;
A 2 1 04/23/2021
B 4 4 04/07/2022
C 2 1 06/03/2021
;

data master;

  if _n_=1 then
    do;
      dcl hash h1(dataset:'master');
      h1.defineKey('id','number','date');
      h1.defineDone();
    end;
  modify master transaction;
  by id original_number;

  if _iorc_=0 then
    do;
      if h1.check() ne 0 then replace;
    end;
run;

proc print data=master;
run;

Patrick_0-1672803176545.png

View solution in original post

6 REPLIES 6
ballardw
Super User

Better than showing incorrect actual output alone is to show expected output.

 

Note that if you change this record

A 2 2

04/23/2021

to

A 2 2 04/28/2021

you will see that the UPDATE is NOT playing with the dates in that record. The code has done exactly what you told it to.

So you need to tell us what you actually expect for a result. A complicated description likely means that complicated code is needed. Not always, but often. You are expecting the result of comparing one record to another to also consider values on other records. Data step Update, Modify or Merge are not going to do that with much additional coding.

 

Perhaps all you need is a Proc Sort with the Nodupkey option on the result of this update.

 


@martyvd wrote:

I am using the basic code below to update a master data set using a transaction data set. I would like to update master data only when the update does not create a duplicate value of date and/or number within the by-group of ID or when the update adds a row (rather than replaces) in the table. 

 

data master_updated;
update master transaction;
by ID original_number;
run;

Master 

ID Number Original_number Date
A 1 1 06/02/2021
A 2 2 04/23/2021
B 1 1 03/08/2021
B 2 2 04/05/2021
B 3 3 10/23/2021
B 4 4 04/02/2022
C 2 2 06/03/2021

 

Transaction

ID Number Original_number Date
A 2 1 04/23/2021
B 4 4 04/07/2022
C 2 1 06/03/2021

 

Master_updated

ID Number Original_number Date
A 2 1 04/23/2021
A 2 2 04/23/2021
B 1 1 03/08/2021
B 2 2 04/05/2021
B 3 3 10/23/2021
B 4 4 04/07/2022
C 2 1 06/03/2021
C 2 2 06/03/2021

 

The issues mentioned above are highlighted in red - the updated data should not have duplicate values of number or date. A and C have "invalid" updates while B is valid. Is there a way to add conditions so that some updates do not occur?


 

 

martyvd
Fluorite | Level 6

Maybe my phrasing was confusing. The output is not "incorrect" - the code is indeed doing what I expect it to do. I am asking if it is possible to write a more complex data step with conditions that would produce the following:

 

IDNumberOriginal_numberDate
A1106/02/2021
A2204/23/2021
B1103/08/2021
B2204/05/2021
B3310/23/2021
B4404/07/2022
C2206/03/2021

 

Ideally the only update that would occur is for ID B. 

ballardw
Super User

@martyvd wrote:

Maybe my phrasing was confusing. The output is not "incorrect" - the code is indeed doing what I expect it to do. I am asking if it is possible to write a more complex data step with conditions that would produce the following:

 

ID Number Original_number Date
A 1 1 06/02/2021
A 2 2 04/23/2021
B 1 1 03/08/2021
B 2 2 04/05/2021
B 3 3 10/23/2021
B 4 4 04/07/2022
C 2 2 06/03/2021

 

Ideally the only update that would occur is for ID B. 


Realizing that your example may be much simpler than your actual problem but if you only want to update B (or a selection ) then a WHERE dataset option to only have the values from the transaction data set where ID is B (or some other values) might be in order.

 

The transaction set will add records if there is a combination of BY variables not present in the master.

You could avoid that by using a MERGE, which has a different behavior for multiple values of the BY variables in the Transaction data set, and using dataset options to set IN variables to see if both data sets contribute something to the current record and if only the transaction set is supplying the current record then delete or not keep the record with a subsetting IF or conditional Delete statement.

Note, I might write some code to demonstrate but don't want create example data sets. Hint.

Tom
Super User Tom
Super User

You do NOT appear to be doing an UPDATE operation.  This for when you want to only use the non-missing values of the transaction dataset.

You appear to just be doing a MERGE operation.

If you want to only use the new data for ID='B' then only use those observations.

 

So if you start with this data.

data original ;
  input ID $ Number Original_number Date :mmddyy.;
  format date yymmdd10.;
cards;
A 1 1 06/02/2021
A 2 2 04/23/2021
B 1 1 03/08/2021
B 2 2 04/05/2021
B 3 3 10/23/2021
B 4 4 04/02/2022
C 2 2 06/03/2021
;

data trans ;
  input ID $ Number Original_number Date :mmddyy.;
  format date yymmdd10.;
cards;
A 2 1 04/23/2021
B 4 4 04/07/2022
C 2 1 06/03/2021
;

And want to get this data:

data expected ;
  input ID $ Number Original_number Date :mmddyy.;
  format date yymmdd10.;
cards;
A 1 1 06/02/2021
A 2 2 04/23/2021
B 1 1 03/08/2021
B 2 2 04/05/2021
B 3 3 10/23/2021
B 4 4 04/07/2022
C 2 2 06/03/2021
;

Then you can just use this data step to only take the one observation from the TRANS dataset.

data want;
  merge original trans(where=(id='B'));
  by id number ;
run;

Let's test if it gets what you wanted.

proc compare data=want compare=expected;
run;
The COMPARE Procedure                                                                                                               
Comparison of WORK.WANT with WORK.EXPECTED                                                                                          
(Method=EXACT)                                                                                                                      
                                                                                                                                    
Data Set Summary                                                                                                                    
                                                                                                                                    
Dataset                 Created          Modified  NVar    NObs                                                                     
                                                                                                                                    
WORK.WANT      03JAN23:22:42:07  03JAN23:22:42:07     4       7                                                                     
WORK.EXPECTED  03JAN23:22:42:07  03JAN23:22:42:07     4       7                                                                     
                                                                                                                                    
                                                                                                                                    
Variables Summary                                                                                                                   
                                                                                                                                    
Number of Variables in Common: 4.                                                                                                   

 



                                                                                                                                    
                                                                                                                                    
Observation Summary                                                                                                                 
                                                                                                                                    
Observation      Base  Compare                                                                                                      
                                                                                                                                    
First Obs           1        1                                                                                                      
Last  Obs           7        7                                                                                                      
                                                                                                                                    
Number of Observations in Common: 7.                                                                                                
Total Number of Observations Read from WORK.WANT: 7.                                                                                
Total Number of Observations Read from WORK.EXPECTED: 7.                                                                            
                                                                                                                                    
Number of Observations with Some Compared Variables Unequal: 0.                                                                     
Number of Observations with All Compared Variables Equal: 7.                                                                        
                                                                                                                                    
NOTE: No unequal values were found. All values compared are exactly equal.         
 
Patrick
Opal | Level 21

Based on your sample data and desired result below code should do the job.

Based on your description there could be additional use cases not yet covered. If so can you please extend your sample data with these cases and show us the desired outcome?

data master;
 infile datalines truncover;
 input ID $ Number Original_number Date:mmddyy10.;
 format date mmddyy10.;
 datalines;
A 1 1 06/02/2021
A 2 2 04/23/2021
B 1 1 03/08/2021
B 2 2 04/05/2021
B 3 3 10/23/2021
B 4 4 04/02/2022
C 2 2 06/03/2021
;

data transaction;
 infile datalines truncover;
 input ID $ Number Original_number Date:mmddyy10.;
 format date date9.;
 datalines;
A 2 1 04/23/2021
B 4 4 04/07/2022
C 2 1 06/03/2021
;

data master;

  if _n_=1 then
    do;
      dcl hash h1(dataset:'master');
      h1.defineKey('id','number','date');
      h1.defineDone();
    end;
  modify master transaction;
  by id original_number;

  if _iorc_=0 then
    do;
      if h1.check() ne 0 then replace;
    end;
run;

proc print data=master;
run;

Patrick_0-1672803176545.png

martyvd
Fluorite | Level 6
Thank you, this appears to be working. Now I just need to read some documentation to understand *how* it works.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1020 views
  • 0 likes
  • 4 in conversation