DATA Step, Macro, Functions and more

Less resource intensive joins

Reply
Frequent Contributor
Posts: 94

Less resource intensive joins

I'm trying to join 17 tables together in SAS using PROC SQL and am running out of system resources.  All tables have a simple index on the joined field.  I'm thinking I can use the DATA STEP but I need to mimic COALESCE and left joins. 

Super User
Posts: 23,305

Re: Less resource intensive joins

Posted in reply to tomcmacdonald

@tomcmacdonald wrote:

I'm trying to join 17 tables together in SAS using PROC SQL and am running out of system resources. 

 

Are you joining all in single query or multiple queries?

Frequent Contributor
Posts: 94

Re: Less resource intensive joins

In a single query.
Frequent Contributor
Posts: 94

Re: Less resource intensive joins

[ Edited ]
Posted in reply to tomcmacdonald

Now I'm trying to run the code in UNIX batch mode and specifying -MEMSIZE MAX and -WORK <some directory> system options.

Respected Advisor
Posts: 4,679

Re: Less resource intensive joins

Posted in reply to tomcmacdonald

@tomcmacdonald

In my experience SAS SQL is not great when trying to join many tables in a single SQL. Depending on what you want to do and especially what volumes and relationships between the tables you're dealing with, some tweaks to the SQL or a SAS data step can reduce resource consumption and/or improve performance a lot.

 

In case this is one big table with a lot of lookups (i.e. a fact table with dimension and reference tables) then using a SAS data step with hash lookup tables can be much more efficient. Logically left joins and coalesce() can also get implemented in a SAS data step.

Super User
Posts: 23,305

Re: Less resource intensive joins

If it's a fact-dim reconstruction, I'm a fan of formats as well. They also tend to come in handy in other areas so you may even have them built already.
Super User
Posts: 5,852

Re: Less resource intensive joins

Posted in reply to tomcmacdonald

The problem here is the left joins, since SAS (and other) can optimize the as god as inner joins.

Usually when creating star schemas, dimension values that are missing can be assigned a dummy value, like a dimsnion key -1. Then you can switch to inner joins.

If your data is in Base SAS libraries consider moving to SPDE, which has better index management and some additional use cases for index use in the where planner.

Data never sleeps
Ask a Question
Discussion stats
  • 6 replies
  • 145 views
  • 0 likes
  • 4 in conversation