09-21-2017 12:15 AM
I've just come across hash tables following a previous question I raised on this forum yesterday.
I'm currently merging a few monthly datasets using the code below, and I will need to merge in next month's data (September) as well. I've read that hash tables could make this process a lot quicker.
Can anyone assist in using hash tables to merge the datasets below, and also cater for new monthly datasets as they arrive?
where Invoice_Balance >0;
09-21-2017 12:41 AM
A better approach would be to create an invoice history table to which you just add the latest month's data each month, perhaps something like this. It does assume that you are not also updating prior months, though.
data reports.invoice_history;
  merge reports.invoice_history reports.invoiceaug17;
  by Extract_date;
  where Invoice_Balance > 0;
run;
09-21-2017 01:05 AM
I can't see the data structure, but it's likely that MERGE is overkill and is what is slowing down the program. Wouldn't SET instead of MERGE give you what you need? It would definitely be faster.
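As a rough sketch of what that could look like, reusing the dataset names from the earlier reply (reports.invoice_history and reports.invoiceaug17) and assuming both have the same columns:

```sas
/* Stack the existing history and the new month.           */
/* No BY statement is used, so no sorting is required.     */
data reports.invoice_history;
    set reports.invoice_history
        reports.invoiceaug17;
    where Invoice_Balance > 0;   /* applied to both input datasets */
run;
```

If the history table is only ever added to, PROC APPEND is another option. It adds the new rows without rewriting the existing ones, so the filter below applies to the new month only:

```sas
proc append base=reports.invoice_history
            data=reports.invoiceaug17(where=(Invoice_Balance > 0));
run;
```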
09-21-2017 03:51 AM
I agree with @Astounding. I've done a lot of work in the past evaluating the performance of different types of merge, and in my experience, unless you're working with fairly large data sets, the overhead of hash merging makes it slower than conventional merging. Of course, if you have a REALLY large data set then hash merging wins hands down.
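For reference, a hash merge in a DATA step generally follows the pattern below. This is only a sketch: the thread doesn't show the real data structure, so work.invoices, work.customers, Customer_ID and Customer_Name are all hypothetical names. Note that a hash object is a keyed lookup, so it suits joining a lookup table onto a larger table rather than stacking monthly extracts.

```sas
data work.joined;
    /* Never executes, but adds the lookup variables to the PDV at compile time */
    if 0 then set work.customers(keep=Customer_ID Customer_Name);

    /* Load the (hypothetical) lookup table into memory once */
    if _n_ = 1 then do;
        declare hash h(dataset: 'work.customers');
        h.defineKey('Customer_ID');
        h.defineData('Customer_Name');
        h.defineDone();
    end;

    /* Read the large table sequentially; it does not need to be sorted */
    set work.invoices;

    /* find() returns 0 on a match and fills Customer_Name; keep matches only */
    if h.find() = 0;
run;
```

The whole lookup table has to fit in memory, and loading it is the overhead mentioned above, which is why this tends to pay off only when the table being read sequentially is large.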