DATA Step, Macro, Functions and more

Hash vs SQL Performance

Reply
Super Contributor
Posts: 326

Hash vs SQL Performance

Hi experts,

Is it true that hashtables performs faster compared with SQL lookups? What are other advantages or limitations? I have no environment to test this out yet.

I'm also curious whether joins performed in-database performs faster versus sas joins. Example given these example:

create table result as select a.key,a.key2,b.var from a left join b on(a.key2 = b.key);

I'm using a's key2 to link with b to lookup the value var.

Kindly advise if explanation is confusing.

Thanks!

Milton

Super User
Posts: 10,046

Re: Hash vs SQL Performance

Sorry, I can't offer you more information. Maybe you should contact SAS Technical Support .

As far as I know Hash Table is the fastest way to execute querying, so I prefer to Hash Table if I want speed,

I would prefer to SQL if I want to save some words.

Ksharp

Occasional Contributor
Posts: 7

Re: Hash vs SQL Performance

Hash is the fastest method for looking up data, since the data is read just once from disk and then loaded into memory. There is also no need to sort or index that table in advance. The most important limitation is that memory available to SAS is enough to fit the hash table. There are situations a PROC SQL might be as fast as a self coded hash lookup, because the SQL optimizer will do a hashjoin if it determines the lookup table fits into memory.

In-database also also usually faster than SAS joins and might even be faster than hash lookup. With in-database joins the SAS system does not need to bring over all data from the database to SAS first, before the join can be excuted. A join done in-database will only transfer the results, resulting in less movement of data. The SQL optimizer tries to pass as much processing as possible to the database, but there are situations it does this less optimal than you can do by hand.

Super User
Posts: 3,260

Re: Hash vs SQL Performance

There is another lookup technique available in SAS that is in many cases comparable in performance to hash and that is the lookup format. With lookup formats the lookup is built in PROC FORMAT before the DATA step you use it in, whereas with hash you have to do it all in the same DATA step. I find being able to separate out the lookup an advantage if you are doing lots of them. I often do more than 10 lookups, sometimes over 20 in the same DATA step. The coding using lookup formats is more compact.

Ask a Question
Discussion stats
  • 3 replies
  • 1192 views
  • 0 likes
  • 4 in conversation