Help using Base SAS procedures

SAS sql efficiency

Reply
Occasional Contributor
Posts: 8

SAS sql efficiency

Hi all,

 

I am running a complicated query with sas sql, and the real time is much longer than the cpu time combined. Is there anything I can do to reduce the real time and increase efficieny?

 

Thanks,

Super User
Posts: 5,424

Re: SAS sql efficiency

Every query is kinda unique.
Your data model, indexes, query logic, I/O bottlenecks etc - almost anything can have impact on your query performance.
There are lots of papers about query optimization, search support.Sas.com.
If you need specific guidance you need to describe your situation in much more detail.
Data never sleeps
Occasional Contributor
Posts: 8

Re: SAS sql efficiency

So the problem I have now is that I need to subset a table with around 5 millions of observations, (called this table A) based on another table B. The table A contains household purchase information: one row of data reprensents a household purchase of an item and a household generally buys many items (so household id varaible is repeated in many rows). Table B is just  a list of households (around 30,000) that I care about and there is only one variable hhold_id that uniquely identify a single household. I would like to select those purchases in table A from the households listed in table B. I actually used a natural left join to subset table A based on Table B, but it took my computer 20 mins to do this simple subsetting wheras the cpu time is relatively short. According to this article (http://support.sas.com/resources/papers/proceedings09/333-2009.pdf), my query is very inefficient and I tried to increase buffer size and number, enable multi threading, but none of these substantially shorten the processing time. 

 

Thanks,

Super User
Posts: 5,424

Re: SAS sql efficiency

Doesn't sound like a complicated query.

Inner joins is often easier to optimize than outer joins. If you need all households, even those not within A, you could do a second join with this, when table size will be substantially smaller.

Seeing your fullstimer msglevel=i log output would help.

The most efficient join strategy is hash join, and indexed join is second (use _method PROC SQL option to see strategy chosen). Both requiring an index on the join column in A.

How frequent will you (and others?) do this kind of queries? If the performance is really a burden you may need to look oer your whole data platform (HW, storage engine).

Data never sleeps
Occasional Contributor
Posts: 8

Re: SAS sql efficiency

[ Edited ]

Initially I only would like to have A which is from a complex query generated by joining 7 tables (raw data, 5GB combined), but later I realized I need to get rid of some households based on a critrion. So I tried to include the subset criterion in the very first query. However, it turned out that the processing time substantially increases even though the subsetting criterion is relatively simple. Later I decdied to do what I have described in the last post, get table A and generate the subsetted households separately and do a inner join to select the desicred housholds in table A. I was kind of surpised that such a simple inner join query could take that long time. 

 

I need to change the criterion of selecting the households for several times for the statistical analysis later. I just do not want to run very time-consuming queries every time I decide to change the criterion. 

 

One related question: is it more efficient to generate many intermediate tables separately and join them later to get the final table or write a complicated query including all of the subqueries that generate the intermediate tables?

 

Thanks,

Super User
Super User
Posts: 7,942

Re: SAS sql efficiency

Try:
data want;

  merge smalldata (in=a) bigdata (in=b);

  by household;

  if a then output;

run;

 

Assuming the data is sorted this should keep data on in the list of small data

Contributor
Posts: 39

Re: SAS sql efficiency

Various factors can impact your real-time query performance, using SAS/ACCESS libnames, table size, data types, network latency, location of the databases and/or your SAS servers.

 

I'd also take a look at the SQL optimizer to see what your query is doing. This can been looked at by using the _TREE and _METHOD on the proc sql statement.

 

There's a fantasic paper by Russ Lavery on this subject. See link below. It's a very long and detailed paper, but if you have the time and inclination it's well worth a read.

 

http://www2.sas.com/proceedings/sugi30/101-30.pdf

 

Cheers

 

David

Super User
Super User
Posts: 7,942

Re: SAS sql efficiency

Simplify the process then.  Don't do as much in the query.  Avoid subqeuries.  Use datastep processing rather than SQL.  

Occasional Contributor
Posts: 8

Re: SAS sql efficiency

subqeuries slows the entire query down? Or is it faster to create temp tables and use these tables in the main query?
Super User
Posts: 5,424

Re: SAS sql efficiency

Not generally.

5gb doesn't sound that much.
Again seeing the resource consumption will help.
And it's quite difficult to give specific advice, to do so we need to be involved in detail in your work.
And still, I think that you need to reconsider your environment.
Data never sleeps
Ask a Question
Discussion stats
  • 9 replies
  • 478 views
  • 1 like
  • 4 in conversation