Desktop productivity for business analysts and programmers

Merging useing Hash tables

Reply
Contributor
Posts: 47

Merging useing Hash tables

 Hi,

 

I've just come across Hash tables based on a previous question I raised on this forum yesterday.

 

I'm currently merging a few datasets (these are monthly datasets and I will merger for next month's data, being September), using the code below. I've read the Hash tables could make this process a lot quicker.

 

Can anyone assist in using Hash Tables to merge the below datasets and also cater for new monthly datasets to be merged also

 

data reports.invoice;

merge

reports.invoicejul15

reports.invoiceaug15

reports.invoicesep15

reports.invoiceoct15

reports.invoicenov15

reports.invoicedec15

reports.invoicejan16

reports.invoicefeb16

reports.invoicemar16

reports.invoiceapr16

reports.invoicemay16

reports.invoicejun16

reports.invoicejul16

reports.invoiceaug16

reports.invoicesep16

reports.invoiceoct16

reports.invoicenov16

reports.invoicedec16

reports.invoicejan17

reports.invoicefeb17

reports.invoicemar17

reports.invoiceapr17

reports.invoicemay17

reports.invoicejun17

reports.invoicejul17

reports.invoiceaug17

 

;

by Extract_date;

 

where Invoice_Balance >0;

run;

Super User
Posts: 3,305

Re: Merging useing Hash tables

A better approach would be to create an invoice history table where you just add the latest month's data each month, perhaps something like this? It does assume that you are not updating prior months as well though.

 

data reports.invoice_history;
merge reports.invoice_history
reports.invoiceaug17 
;
by Extract_date;
 
where Invoice_Balance >0;
run;
Contributor
Posts: 47

Re: Merging useing Hash tables

Thanks for your quick response, unfortunately we sometimes (more often than I like) have to go and update prior months.
Super User
Posts: 5,610

Re: Merging useing Hash tables

Not seeing the data structure, but it's likely that MERGE is overkill and what is slowing down the program.  Wouldn't SET instead of MERGE give you what you need?  It would definitely be faster.

Super Contributor
Posts: 460

Re: Merging useing Hash tables

Posted in reply to Astounding

I agree with @Astounding - I've done a lot of work in the past evaluating performance of different types of merge and it's my experience that unless you're working with fairly large data sets the overhead of hash merging makes it slower than conventional merging. Of course if you have a REALLY large data set then hash merging wins hands down......

Contributor
Posts: 47

Re: Merging useing Hash tables

Posted in reply to ChrisBrooks
The merged datasets currently has 4.8mil rows in it. Not sure if that is defined as a really large dataset.
Contributor
Posts: 47

Re: Merging useing Hash tables

Posted in reply to Astounding
Have changed it to SET. Thanks for the advice.
Ask a Question
Discussion stats
  • 6 replies
  • 202 views
  • 3 likes
  • 4 in conversation