I had master, transaction and required output tables as mentioned below.
1ST CONDITION: For any ID, If any data is present in the master and if blank value is present in the transaction data for the same id then the value in the master data should not be updated with the blank value from Transaction data.(eg:ID- 1 GENDER:female)
2ND CONDITION: For any ID, If any value is present in the master and if any value is present in the transaction data also for the same id then the value in the master data should be updated with the value from Transaction data.(eg:ID- 2 GENDER:male)
Please help me to implement the following requirement using HASH programming only for updating the data bcoz as the master dataset has millions of records and using of UPDATE statement in SAS programming is time consuming .So please help me ..
Thanks in Advance.
MASTER | ||||
ID# | Date | Gender | Code | Class |
1 | . | female | AB | LOW |
2 | . | female | CD | LOW |
3 | 1/1/2021 | M | CD |
TRANSACTION | ||||
ID# | Date | Gender | Code | Class |
1 | 7/1/2020 | EF | ||
2 | 7/2/2020 | male | ||
3 | male |
output | ||||
ID# | Date | Gender | Code | Class |
1 | 7/1/2020 | female | EF | LOW |
2 | 7/2/2020 | male | CD | LOW |
3 | 1/1/2021 | male | CD |
Per my comment to your other topic update master table with transaction table, if you are producing a new master dataset by applying transactions to the old master dataset, you won't get much better performance than with the UPDATE statement, assuming your datasets are sorted by ID.
Hash might help if the master data are not sorted by ID, because you could put the transactions in a hash object in memory, and then retrieve as appropriate from that hash object when reading the (unsorted) master data set. You would have avoided sorting the master data set.
But using a hash object would introduce a new problem. Retrieving data from a hash object of transactions will retrieve any missing values in the transactions and overwrite the corresponding non-missing value in the master dataset. This is something that the UPDATE statement doesn't do - a very valuable property when applying transaction updates to master data sets. It is also something the default mode of MODIFY will avoid doing if you choose to update the master dataset in place.
Now one could make a hash object of each variable in the TRANSACT dataset, keeping only the non-missing values. Then you could safely use hash objects. But if the data are already sorted, this will be highly unlikely to improve performance compared to UPDATE or MODIFY.
Unlike your other post, I see you are entering this post in the Enterprise Guide section.
Are you processing a Local SAS data set? If so, is EG working with a local sas server? In short, where are the datasets? And where is the sas server? If they are not on the same machine, then your slowness is likely due to necessary transfer of the entire dataset for processing over your EG client to SAS server connection - and probably transferring the resulting dataset back again. Using HASH vs some other technique will not mitigate the primary reason for slow performance in such a case.
Best to keep your data where the server is.
@rohithverma - You would be best to explore a range of potential techniques before deciding on the best one. Hash isn't always the best solution.
A good start would be to post what you have tried so far including how long it took to run.
Using a hash table as you asked for could work like below.
As others stated if such a hash approach really helps will depend on your data and especially where it's stored as compared to where SAS executes.
data MASTER;
infile datalines truncover dlm='|' dsd;
input ID Date:ddmmyy. Gender $ Code $ Class $;
format date date9.;
datalines;
1|.|female|AB|LOW
2|.|female|CD|LOW
3|1/1/2021|M|CD|
;
data TRANSACTION;
infile datalines truncover dlm='|' dsd;
input ID Date:ddmmyy. Gender $ Code $ Class $;
format date date9.;
datalines;
1|7/1/2020||EF|
2|7/2/2020|male||
3||male||
;
proc sql;
create view v_transaction as
select
id,
date as _date,
gender as _gender,
code as _code,
class as _class
from transaction
where cmiss(date,gender,code,class) ne 4
;
quit;
data master;
if _n_=1 then
do;
if 0 then set v_transaction;
dcl hash h1(dataset:'v_transaction');
h1.defineKey('id');
h1.defineData(all:'y');
h1.defineDone();
end;
modify master;
if h1.find()=0 then
do;
date=coalesce(_date,date);
gender=coalescec(_gender,gender);
code=coalescec(_code,code);
class=coalescec(_class,class);
replace;
call missing (of _:);
end;
run;
proc print data=master;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.