Using SAS hash to merge/update records. How can I use hash to only update the records that are missing on my set dataset. Which in the below example will be ID's - 1, 2, 4 and 6? And without setting ID# 3, 5, 7 and 8 to a blank as it's blank on my define dataset (doesn't need an update). Thanks!
ID# | Date |
1 | . |
2 | . |
3 | 1/1/2021 |
4 | . |
5 | 7/10/2020 |
6 | . |
7 | 3/1/2019 |
8 | 5/2/2017 |
I found the solution by adding a second hash. The first HASH retains the DATE values from Table 2 and the second hash updates only the DATE values on Table 2 where the DATE is not missing. Which is exactly what I wanted. Helps to be passionate or "obsessed" to solve problems and find solutions.
data Want;
drop rc;
length Flag $35 Comment $35;
if _n_=1 then
do;
declare hash lookup(dataset:"Have");
lookup.defineKey("ID");
lookup.defineData("Gender"
,"Code"
,"Class"
,"Flag"
,"Comment");
lookup.definedone();
call missing(Flag,Comment, DATE);
declare hash lookup2(dataset:"Have (where=(not missing(DATE))");
lookup2.defineKey("ID");
lookup2.defineData("Date")
lookup2.definedone();
end;
set Need_Update;
rc=lookup.find();
if rc eq 0 then do;
rc=lookup2.find();
end;
run;
I'm guessing an IF missing(DATE) then do; UPDATE; end;
Don't you reckon?
set ;
if (this obs needs update) then rc=lookup.find();
run;
You have not provided enough details for me to know what "this obs needs update" actually is. Perhaps
if missing(date) then rc=lookup.find();
I have multiple vars I am attempting to update. For the Date var I only want to update where it's missing. Below is the code I am using and the result I get. How can I update all the vars and only the instance where the Date is missing? Thanks
data Want;
drop rc;
length Flag $35 Comment $35;
if _n_=1 then
do;
declare hash lookup(dataset:"Have");
lookup.defineKey("ID");
lookup.defineData("Date"
,"Gender"
,"Code"
,"Class"
,"Flag"
,"Comment");
lookup.definedone();
call missing(Flag,Comment);
end;
set Need_Update;
rc=lookup.find();
run;
Have | ||||
ID# | Date | Gender | Code | Class |
1 | . | F | AB | |
2 | . | F | ||
3 | 1/1/2021 | M | MIDDLE | |
4 | . | MIDDLE | ||
5 | 7/10/2020 | M | AB | |
6 | . | M | HIGH | |
7 | 3/1/2019 | AB | ||
8 | 5/2/2017 | AB | HIGH |
Want | ||||
ID# | Date | Gender | Code | Class |
1 | 7/1/2020 | F | AB | LOW |
2 | 7/2/2020 | F | CD | LOW |
3 | 1/1/2021 | M | CD | MIDDLE |
4 | 1/2/2021 | F | CD | MIDDLE |
5 | 7/10/2020 | M | AB | LOW |
6 | 7/11/2020 | M | DE | HIGH |
7 | 3/1/2019 | F | AB | HIGH |
8 | 5/2/2017 | F | AB | HIGH |
What I am getting | ||||
ID# | Date | Gender | Code | Class |
1 | 7/1/2020 | F | AB | LOW |
2 | 7/2/2020 | F | CD | LOW |
3 | M | CD | MIDDLE | |
4 | 1/2/2021 | F | CD | MIDDLE |
5 | M | AB | LOW | |
6 | 7/11/2020 | M | DE | HIGH |
7 | F | AB | HIGH | |
8 | F | AB | HIGH |
Correction Below is what I Have and the output I am getting on my Need_Update dataset
Have | ||||
ID# | Date | Gender | Code | Class |
1 | . | F | AB | LOW |
2 | . | F | CD | LOW |
3 | 1/1/2021 | M | CD | MIDDLE |
4 | . | F | CD | MIDDLE |
5 | 7/10/2020 | M | AB | LOW |
6 | . | M | DE | HIGH |
7 | 3/1/2019 | F | AB | HIGH |
8 | 5/2/2017 | F | AB |
HIGH |
Need_Update | ||||
ID# | Date | Gender | Code | Class |
1 | 7/1/2020 | F | AB | LOW |
2 | 7/2/2020 | F | CD | LOW |
3 | M | CD | MIDDLE | |
4 | 1/2/2021 | F | CD | MIDDLE |
5 | M | AB | LOW | |
6 | 7/11/2020 | M | DE | HIGH |
7 | F | AB | HIGH | |
8 | F | AB | HIGH |
Sorry for the confusion. Let me try again:
Table 1 is the table I am using to update/merge the missing values in Table 2.
Table 3 is what I get as the result. The values from Table 1 replaces the values on Table 2. How can I only update values if missing on Table 2? In this instance ID# 1, 2, 4 and 6 will be retained after the merge.
Table 4 is what I want. Thank you
Table 1 | ||||
ID# | Date | Gender | Code | Class |
1 | . | F | AB | LOW |
2 | . | F | CD | LOW |
3 | 1/1/2021 | M | CD | MIDDLE |
4 | . | F | CD | MIDDLE |
5 | 7/10/2020 | M | AB | LOW |
6 | . | M | DE | HIGH |
7 | 3/1/2019 | F | AB | HIGH |
8 | 5/2/2017 | F | AB | HIGH |
Table 2 | ||||
ID# | Date | Gender | Code | Class |
1 | 7/1/2020 | F | AB | |
2 | 7/2/2020 | F | ||
3 | M | MIDDLE | ||
4 | 1/2/2021 | MIDDLE | ||
5 | M | AB | ||
6 | 7/11/2020 | M | HIGH | |
7 | AB | |||
8 | AB | HIGH |
Table 3 | ||||
ID# | Date | Gender | Code | Class |
1 | . | F | AB | LOW |
2 | . | F | CD | LOW |
3 | 1/1/2021 | M | CD | MIDDLE |
4 | . | F | CD | MIDDLE |
5 | 7/10/2020 | M | AB | LOW |
6 | . | M | DE | HIGH |
7 | 3/1/2019 | F | AB | HIGH |
8 | 5/2/2017 | F | AB | HIGH |
Table 4 | ||||
ID# | Date | Gender | Code | Class |
1 | 7/1/2020 | F | AB | LOW |
2 | 7/2/2020 | F | CD | LOW |
3 | 1/1/2021 | M | CD | MIDDLE |
4 | 1/2/2021 | F | CD | MIDDLE |
5 | 7/10/2020 | M | AB | LOW |
6 | 7/11/2020 | M | DE | HIGH |
7 | 3/1/2019 | F | AB | HIGH |
8 | 5/2/2017 | F | AB | HIGH |
I don't understand what HASH objects have to do with this problem.
A simple UPDATE statement should handle it, but you need to reverse how you are thinking about the data. You are not updating TWO with the looked up values from ONE. You are overriding the default values in ONE with the actual values from TWO. So TWO is your transaction file. Any non-missing value replace the value from ONE. And a missing value means leave the value from ONE in place.
ata one ;
input ID Date :mmddyy. Gender $ Code $ Class $;
format date yymmdd10.;
cards;
1 . F AB LOW
2 . F CD LOW
3 1/1/2021 M CD MIDDLE
4 . F CD MIDDLE
5 7/10/2020 M AB LOW
6 . M DE HIGH
7 3/1/2019 F AB HIGH
8 5/2/2017 F AB HIGH
;
data two;
input ID Date :mmddyy. Gender $ Code $ Class $;
format date yymmdd10.;
cards;
1 7/1/2020 F AB .
2 7/2/2020 F . .
3 . M . MIDDLE
4 1/2/2021 . . MIDDLE
5 . M AB .
6 7/11/2020 M . HIGH
7 . . AB .
8 . . AB HIGH
;
data want ;
update two one;
by id;
run;
Results:
Don't be obsessed with a certain tool. See if @Tom's suggestion works. If not, supply data that illustrates where the result deviates from what you expected.
And do supply example data in usable form (data step(s) with datalines), as demonstrated.
I found the solution by adding a second hash. The first HASH retains the DATE values from Table 2 and the second hash updates only the DATE values on Table 2 where the DATE is not missing. Which is exactly what I wanted. Helps to be passionate or "obsessed" to solve problems and find solutions.
data Want;
drop rc;
length Flag $35 Comment $35;
if _n_=1 then
do;
declare hash lookup(dataset:"Have");
lookup.defineKey("ID");
lookup.defineData("Gender"
,"Code"
,"Class"
,"Flag"
,"Comment");
lookup.definedone();
call missing(Flag,Comment, DATE);
declare hash lookup2(dataset:"Have (where=(not missing(DATE))");
lookup2.defineKey("ID");
lookup2.defineData("Date")
lookup2.definedone();
end;
set Need_Update;
rc=lookup.find();
if rc eq 0 then do;
rc=lookup2.find();
end;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.