BookmarkSubscribeRSS Feed
Haydn
Quartz | Level 8

 Hi,

 

I've just come across Hash tables based on a previous question I raised on this forum yesterday.

 

I'm currently merging a few datasets (these are monthly datasets and I will merger for next month's data, being September), using the code below. I've read the Hash tables could make this process a lot quicker.

 

Can anyone assist in using Hash Tables to merge the below datasets and also cater for new monthly datasets to be merged also

 

data reports.invoice;

merge

reports.invoicejul15

reports.invoiceaug15

reports.invoicesep15

reports.invoiceoct15

reports.invoicenov15

reports.invoicedec15

reports.invoicejan16

reports.invoicefeb16

reports.invoicemar16

reports.invoiceapr16

reports.invoicemay16

reports.invoicejun16

reports.invoicejul16

reports.invoiceaug16

reports.invoicesep16

reports.invoiceoct16

reports.invoicenov16

reports.invoicedec16

reports.invoicejan17

reports.invoicefeb17

reports.invoicemar17

reports.invoiceapr17

reports.invoicemay17

reports.invoicejun17

reports.invoicejul17

reports.invoiceaug17

 

;

by Extract_date;

 

where Invoice_Balance >0;

run;

6 REPLIES 6
SASKiwi
PROC Star

A better approach would be to create an invoice history table where you just add the latest month's data each month, perhaps something like this? It does assume that you are not updating prior months as well though.

 

data reports.invoice_history;
merge reports.invoice_history
reports.invoiceaug17 
;
by Extract_date;
 
where Invoice_Balance >0;
run;
Haydn
Quartz | Level 8
Thanks for your quick response, unfortunately we sometimes (more often than I like) have to go and update prior months.
Astounding
PROC Star

Not seeing the data structure, but it's likely that MERGE is overkill and what is slowing down the program.  Wouldn't SET instead of MERGE give you what you need?  It would definitely be faster.

ChrisBrooks
Ammonite | Level 13

I agree with @Astounding - I've done a lot of work in the past evaluating performance of different types of merge and it's my experience that unless you're working with fairly large data sets the overhead of hash merging makes it slower than conventional merging. Of course if you have a REALLY large data set then hash merging wins hands down......

Haydn
Quartz | Level 8
The merged datasets currently has 4.8mil rows in it. Not sure if that is defined as a really large dataset.
Haydn
Quartz | Level 8
Have changed it to SET. Thanks for the advice.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1211 views
  • 3 likes
  • 4 in conversation