I am using the basic code below to update a master data set using a transaction data set. I would like to update master data only when the update does not create a duplicate value of date and/or number within the by-group of ID or when the update adds a row (rather than replaces) in the table.
data master_updated;
update master transaction;
by ID original_number;
run;
Master
ID | Number | Original_number | Date |
A | 1 | 1 | 06/02/2021 |
A | 2 | 2 | 04/23/2021 |
B | 1 | 1 | 03/08/2021 |
B | 2 | 2 | 04/05/2021 |
B | 3 | 3 | 10/23/2021 |
B | 4 | 4 | 04/02/2022 |
C | 2 | 2 | 06/03/2021 |
Transaction
ID | Number | Original_number | Date |
A | 2 | 1 | 04/23/2021 |
B | 4 | 4 | 04/07/2022 |
C | 2 | 1 | 06/03/2021 |
Master_updated
ID | Number | Original_number | Date |
A | 2 | 1 | 04/23/2021 |
A | 2 | 2 | 04/23/2021 |
B | 1 | 1 | 03/08/2021 |
B | 2 | 2 | 04/05/2021 |
B | 3 | 3 | 10/23/2021 |
B | 4 | 4 | 04/07/2022 |
C | 2 | 1 | 06/03/2021 |
C | 2 | 2 | 06/03/2021 |
The issues mentioned above are highlighted in red - the updated data should not have duplicate values of number or date. A and C have "invalid" updates while B is valid. Is there a way to add conditions so that some updates do not occur?
Based on your sample data and desired result below code should do the job.
Based on your description there could be additional use cases not yet covered. If so can you please extend your sample data with these cases and show us the desired outcome?
data master;
infile datalines truncover;
input ID $ Number Original_number Date:mmddyy10.;
format date mmddyy10.;
datalines;
A 1 1 06/02/2021
A 2 2 04/23/2021
B 1 1 03/08/2021
B 2 2 04/05/2021
B 3 3 10/23/2021
B 4 4 04/02/2022
C 2 2 06/03/2021
;
data transaction;
infile datalines truncover;
input ID $ Number Original_number Date:mmddyy10.;
format date date9.;
datalines;
A 2 1 04/23/2021
B 4 4 04/07/2022
C 2 1 06/03/2021
;
data master;
if _n_=1 then
do;
dcl hash h1(dataset:'master');
h1.defineKey('id','number','date');
h1.defineDone();
end;
modify master transaction;
by id original_number;
if _iorc_=0 then
do;
if h1.check() ne 0 then replace;
end;
run;
proc print data=master;
run;
Better than showing incorrect actual output alone is to show expected output.
Note that if you change this record
A | 2 | 2 |
04/23/2021 |
to
A | 2 | 2 | 04/28/2021 |
you will see that the UPDATE is NOT playing with the dates in that record. The code has done exactly what you told it to.
So you need to tell us what you actually expect for a result. A complicated description likely means that complicated code is needed. Not always, but often. You are expecting the result of comparing one record to another to also consider values on other records. Data step Update, Modify or Merge are not going to do that with much additional coding.
Perhaps all you need is a Proc Sort with the Nodupkey option on the result of this update.
@martyvd wrote:
I am using the basic code below to update a master data set using a transaction data set. I would like to update master data only when the update does not create a duplicate value of date and/or number within the by-group of ID or when the update adds a row (rather than replaces) in the table.
data master_updated; update master transaction; by ID original_number; run;
Master
ID Number Original_number Date A 1 1 06/02/2021 A 2 2 04/23/2021 B 1 1 03/08/2021 B 2 2 04/05/2021 B 3 3 10/23/2021 B 4 4 04/02/2022 C 2 2 06/03/2021
Transaction
ID Number Original_number Date A 2 1 04/23/2021 B 4 4 04/07/2022 C 2 1 06/03/2021
Master_updated
ID Number Original_number Date A 2 1 04/23/2021 A 2 2 04/23/2021 B 1 1 03/08/2021 B 2 2 04/05/2021 B 3 3 10/23/2021 B 4 4 04/07/2022 C 2 1 06/03/2021 C 2 2 06/03/2021
The issues mentioned above are highlighted in red - the updated data should not have duplicate values of number or date. A and C have "invalid" updates while B is valid. Is there a way to add conditions so that some updates do not occur?
Maybe my phrasing was confusing. The output is not "incorrect" - the code is indeed doing what I expect it to do. I am asking if it is possible to write a more complex data step with conditions that would produce the following:
ID | Number | Original_number | Date |
A | 1 | 1 | 06/02/2021 |
A | 2 | 2 | 04/23/2021 |
B | 1 | 1 | 03/08/2021 |
B | 2 | 2 | 04/05/2021 |
B | 3 | 3 | 10/23/2021 |
B | 4 | 4 | 04/07/2022 |
C | 2 | 2 | 06/03/2021 |
Ideally the only update that would occur is for ID B.
@martyvd wrote:
Maybe my phrasing was confusing. The output is not "incorrect" - the code is indeed doing what I expect it to do. I am asking if it is possible to write a more complex data step with conditions that would produce the following:
ID Number Original_number Date A 1 1 06/02/2021 A 2 2 04/23/2021 B 1 1 03/08/2021 B 2 2 04/05/2021 B 3 3 10/23/2021 B 4 4 04/07/2022 C 2 2 06/03/2021
Ideally the only update that would occur is for ID B.
Realizing that your example may be much simpler than your actual problem but if you only want to update B (or a selection ) then a WHERE dataset option to only have the values from the transaction data set where ID is B (or some other values) might be in order.
The transaction set will add records if there is a combination of BY variables not present in the master.
You could avoid that by using a MERGE, which has a different behavior for multiple values of the BY variables in the Transaction data set, and using dataset options to set IN variables to see if both data sets contribute something to the current record and if only the transaction set is supplying the current record then delete or not keep the record with a subsetting IF or conditional Delete statement.
Note, I might write some code to demonstrate but don't want create example data sets. Hint.
You do NOT appear to be doing an UPDATE operation. This for when you want to only use the non-missing values of the transaction dataset.
You appear to just be doing a MERGE operation.
If you want to only use the new data for ID='B' then only use those observations.
So if you start with this data.
data original ;
input ID $ Number Original_number Date :mmddyy.;
format date yymmdd10.;
cards;
A 1 1 06/02/2021
A 2 2 04/23/2021
B 1 1 03/08/2021
B 2 2 04/05/2021
B 3 3 10/23/2021
B 4 4 04/02/2022
C 2 2 06/03/2021
;
data trans ;
input ID $ Number Original_number Date :mmddyy.;
format date yymmdd10.;
cards;
A 2 1 04/23/2021
B 4 4 04/07/2022
C 2 1 06/03/2021
;
And want to get this data:
data expected ;
input ID $ Number Original_number Date :mmddyy.;
format date yymmdd10.;
cards;
A 1 1 06/02/2021
A 2 2 04/23/2021
B 1 1 03/08/2021
B 2 2 04/05/2021
B 3 3 10/23/2021
B 4 4 04/07/2022
C 2 2 06/03/2021
;
Then you can just use this data step to only take the one observation from the TRANS dataset.
data want;
merge original trans(where=(id='B'));
by id number ;
run;
Let's test if it gets what you wanted.
proc compare data=want compare=expected;
run;
The COMPARE Procedure Comparison of WORK.WANT with WORK.EXPECTED (Method=EXACT) Data Set Summary Dataset Created Modified NVar NObs WORK.WANT 03JAN23:22:42:07 03JAN23:22:42:07 4 7 WORK.EXPECTED 03JAN23:22:42:07 03JAN23:22:42:07 4 7 Variables Summary Number of Variables in Common: 4. Observation Summary Observation Base Compare First Obs 1 1 Last Obs 7 7 Number of Observations in Common: 7. Total Number of Observations Read from WORK.WANT: 7. Total Number of Observations Read from WORK.EXPECTED: 7. Number of Observations with Some Compared Variables Unequal: 0. Number of Observations with All Compared Variables Equal: 7. NOTE: No unequal values were found. All values compared are exactly equal.
Based on your sample data and desired result below code should do the job.
Based on your description there could be additional use cases not yet covered. If so can you please extend your sample data with these cases and show us the desired outcome?
data master;
infile datalines truncover;
input ID $ Number Original_number Date:mmddyy10.;
format date mmddyy10.;
datalines;
A 1 1 06/02/2021
A 2 2 04/23/2021
B 1 1 03/08/2021
B 2 2 04/05/2021
B 3 3 10/23/2021
B 4 4 04/02/2022
C 2 2 06/03/2021
;
data transaction;
infile datalines truncover;
input ID $ Number Original_number Date:mmddyy10.;
format date date9.;
datalines;
A 2 1 04/23/2021
B 4 4 04/07/2022
C 2 1 06/03/2021
;
data master;
if _n_=1 then
do;
dcl hash h1(dataset:'master');
h1.defineKey('id','number','date');
h1.defineDone();
end;
modify master transaction;
by id original_number;
if _iorc_=0 then
do;
if h1.check() ne 0 then replace;
end;
run;
proc print data=master;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.