BookmarkSubscribeRSS Feed
rohithverma
Obsidian | Level 7

I had master, transaction and required output tables as mentioned below.

1ST CONDITION: For any ID, If any data is present in the master and if blank value is present in the transaction data for the same id then  the value in the master data should not be updated with the  blank value from Transaction data.(eg:ID- 1 GENDER:female)

2ND CONDITION: For any ID, If any value   is present in the master and if any  value  is present in the transaction data also for the same id then  the value in the master data should  be updated with the  value  from Transaction data.(eg:ID- 2 GENDER:male)

 

Please help me to implement the following requirement using  HASH programming only for updating the data bcoz as the master dataset has  millions of records and  using of UPDATE  statement in SAS programming is  time consuming .So please help me ..

 

Thanks in Advance.

MASTER        
ID# Date Gender Code Class
1 . female AB LOW
2 . female CD LOW
3 1/1/2021 M CD  

 

TRANSACTION        
ID# Date Gender Code Class
1 7/1/2020   EF  
2 7/2/2020 male    
3   male    

 

output        
ID# Date Gender Code Class
1 7/1/2020 female EF LOW
2 7/2/2020 male CD LOW
3 1/1/2021 male CD  

 

 

 

 

 

4 REPLIES 4
mkeintz
PROC Star

Per my comment to your other topic update master table with transaction table, if you are producing a new master dataset by applying transactions to the old master dataset, you won't get much better performance than with the UPDATE statement, assuming your datasets are sorted by ID.

 

Hash might help if the master data are not sorted by ID, because you could put the transactions in a hash object in memory, and then retrieve as appropriate from that hash object when reading the (unsorted) master data set.  You would have avoided sorting the master data set.

 

But using a hash object would introduce a new problem.  Retrieving data from a hash object of transactions will retrieve any missing values in the transactions and overwrite the corresponding non-missing value in the master dataset.   This is something that the UPDATE statement doesn't do -  a very valuable property when applying transaction updates to master data sets.  It is also something the default mode of MODIFY will avoid doing if you choose to update the master dataset in place.

 

Now one could make a hash object of each variable in the TRANSACT dataset, keeping only the non-missing values.  Then you could safely use hash objects.  But if the data are already sorted, this will be highly unlikely to improve performance compared to UPDATE or MODIFY.  

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

Unlike your other post, I see you are entering this post in the Enterprise Guide section.

 

Are you processing a Local SAS data set?  If so, is EG working with a local sas server?   In short, where are the datasets?  And where is the sas server?   If they are not on the same machine, then your slowness is likely due to necessary transfer of the entire dataset for processing over your EG client to SAS server connection - and probably transferring the resulting dataset back again.  Using HASH vs some other technique will not mitigate the primary reason for slow performance in such a case.

 

Best to keep your data where the server is.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
SASKiwi
PROC Star

@rohithverma  - You would be best to explore a range of potential techniques before deciding on the best one. Hash isn't always the best solution.

 

A good start would be to post what you have tried so far including how long it took to run. 

Patrick
Opal | Level 21

Using a hash table as you asked for could work like below. 

As others stated if such a hash approach really helps will depend on your data and especially where it's stored as compared to where SAS executes.

data MASTER;
  infile datalines truncover dlm='|' dsd;
  input ID Date:ddmmyy. Gender $ Code $ Class $;
  format date date9.;
  datalines;
1|.|female|AB|LOW
2|.|female|CD|LOW
3|1/1/2021|M|CD| 
;

data TRANSACTION;
  infile datalines truncover dlm='|' dsd;
  input ID Date:ddmmyy. Gender $ Code $ Class $;
  format date date9.;
  datalines;
1|7/1/2020||EF| 
2|7/2/2020|male|| 
3||male|| 
;

proc sql;
  create view v_transaction as
    select 
      id,
      date as _date,
      gender as _gender,
      code as _code,
      class as _class
    from transaction
    where cmiss(date,gender,code,class) ne 4
    ;
quit;

data master;
  if _n_=1 then 
    do;
      if 0 then set v_transaction;
      dcl hash h1(dataset:'v_transaction');
      h1.defineKey('id');
      h1.defineData(all:'y');
      h1.defineDone();
    end;

  modify master;

  if h1.find()=0 then 
    do;
      date=coalesce(_date,date);
      gender=coalescec(_gender,gender);
      code=coalescec(_code,code);
      class=coalescec(_class,class);
      replace;
      call missing (of _:);
    end;
run;

proc print data=master;
run;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1190 views
  • 2 likes
  • 4 in conversation