05-22-2017 04:00 PM
Due to a need to have data from several tables be available for querying by our business users, I created a view which left joins 10 tables and unions the results to perform the full dataset. The problem here is that the view underperforms in a significant way. In light of this I would be grateful if more sensible alternatives to my crazy view can be suggested that I can try them
05-23-2017 11:02 AM
Thanks for this... I dont know why I never thought to just pull the data into a table. Its up and running and being evaluated as we speak.
05-22-2017 04:17 PM
What are the reasons for doing an SQL view in the first place? If you replace it with a table and refresh it daily does that satisfy your requirements?
05-23-2017 11:05 AM
05-22-2017 05:27 PM
Complex joins often have bad performance in SAS SQL. Depending on the structure of the joins (many-to-many vs. one-to-many), using a series of data and sort steps may be better suited for your task.
Only if you need a cartesian result in a many-to-many situation will SQL be unavoidable.
05-22-2017 08:22 PM - edited 05-22-2017 08:46 PM
Like already suggested nothing will perform as good as replicating the data by implementing a new physical table.
If you don't want to replicate the data then what you could try:
Option 1: SQL View
- Create indexes on the columns used for joining.
- Should this be a case of adding dimensional information to a fact table then also try the ROLE= data set option
I'd assume this option instructs the SQL compiler do use a hash lookup and you eventually don't even need indexes anymore.
- And last but not least: I'd assume the main time in your SQL is spent for sorting. IF your lookup tables are not too big then you could increase the value for ubufsize (buffersize prior SAS 9.4). The bigger this value the more memory will be consumed but the higher the chance that your lookups will happen in memory (hash) and not physically by sorting.
This quite dated but very interesting article explains how things work: https://support.sas.com/techsup/technote/ts553.html
Option 2: Datastep View
If you're joining a big table to multiple rather small lookup tables using different keys then you could also go for a data step view and use SAS data step hash lookups. This would avoid the need to sort the big table multiple times which could save quite a bit of time.
05-23-2017 11:10 AM
When I left join tables, I try to always join to a common table. e.g.
on a.id = b.id
on a.id = c.id
on a.id = d.id
rather than this equally correct, but sometimes dangerous approach.
on a.id = b.id
on b.id = c.id
on c.id = d.id
I wasn't sure what you were doing with the Unions. I have often used a Common Table Expression to get all of the values I need for the join. e.g. (not sure I have the syntax exactly right here)
;with All_IDs as
(select ID from
(select distinct ID from a
select distinct ID from b
select distinct ID from c))
on All_IDs.ID = a.ID
on All_IDs.ID = b.ID
on All_IDs.ID = c.ID
I also like to break down the query to its subcomponents to see which part is particularly running slow. This helps focus my optimization work.
Need further help from the community? Please ask a new question.