BookmarkSubscribeRSS Feed
bochen
Fluorite | Level 6

Hi all,

 

I am running a complicated query with sas sql, and the real time is much longer than the cpu time combined. Is there anything I can do to reduce the real time and increase efficieny?

 

Thanks,

10 REPLIES 10
LinusH
Tourmaline | Level 20
Every query is kinda unique.
Your data model, indexes, query logic, I/O bottlenecks etc - almost anything can have impact on your query performance.
There are lots of papers about query optimization, search support.Sas.com.
If you need specific guidance you need to describe your situation in much more detail.
Data never sleeps
bochen
Fluorite | Level 6

So the problem I have now is that I need to subset a table with around 5 millions of observations, (called this table A) based on another table B. The table A contains household purchase information: one row of data reprensents a household purchase of an item and a household generally buys many items (so household id varaible is repeated in many rows). Table B is just  a list of households (around 30,000) that I care about and there is only one variable hhold_id that uniquely identify a single household. I would like to select those purchases in table A from the households listed in table B. I actually used a natural left join to subset table A based on Table B, but it took my computer 20 mins to do this simple subsetting wheras the cpu time is relatively short. According to this article (http://support.sas.com/resources/papers/proceedings09/333-2009.pdf), my query is very inefficient and I tried to increase buffer size and number, enable multi threading, but none of these substantially shorten the processing time. 

 

Thanks,

LinusH
Tourmaline | Level 20

Doesn't sound like a complicated query.

Inner joins is often easier to optimize than outer joins. If you need all households, even those not within A, you could do a second join with this, when table size will be substantially smaller.

Seeing your fullstimer msglevel=i log output would help.

The most efficient join strategy is hash join, and indexed join is second (use _method PROC SQL option to see strategy chosen). Both requiring an index on the join column in A.

How frequent will you (and others?) do this kind of queries? If the performance is really a burden you may need to look oer your whole data platform (HW, storage engine).

Data never sleeps
bochen
Fluorite | Level 6

Initially I only would like to have A which is from a complex query generated by joining 7 tables (raw data, 5GB combined), but later I realized I need to get rid of some households based on a critrion. So I tried to include the subset criterion in the very first query. However, it turned out that the processing time substantially increases even though the subsetting criterion is relatively simple. Later I decdied to do what I have described in the last post, get table A and generate the subsetted households separately and do a inner join to select the desicred housholds in table A. I was kind of surpised that such a simple inner join query could take that long time. 

 

I need to change the criterion of selecting the households for several times for the statistical analysis later. I just do not want to run very time-consuming queries every time I decide to change the criterion. 

 

One related question: is it more efficient to generate many intermediate tables separately and join them later to get the final table or write a complicated query including all of the subqueries that generate the intermediate tables?

 

Thanks,

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Try:
data want;

  merge smalldata (in=a) bigdata (in=b);

  by household;

  if a then output;

run;

 

Assuming the data is sorted this should keep data on in the list of small data

moorsd
Obsidian | Level 7

Various factors can impact your real-time query performance, using SAS/ACCESS libnames, table size, data types, network latency, location of the databases and/or your SAS servers.

 

I'd also take a look at the SQL optimizer to see what your query is doing. This can been looked at by using the _TREE and _METHOD on the proc sql statement.

 

There's a fantasic paper by Russ Lavery on this subject. See link below. It's a very long and detailed paper, but if you have the time and inclination it's well worth a read.

 

http://www2.sas.com/proceedings/sugi30/101-30.pdf

 

Cheers

 

David

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Simplify the process then.  Don't do as much in the query.  Avoid subqeuries.  Use datastep processing rather than SQL.  

bochen
Fluorite | Level 6
subqeuries slows the entire query down? Or is it faster to create temp tables and use these tables in the main query?
LinusH
Tourmaline | Level 20
Not generally.

5gb doesn't sound that much.
Again seeing the resource consumption will help.
And it's quite difficult to give specific advice, to do so we need to be involved in detail in your work.
And still, I think that you need to reconsider your environment.
Data never sleeps
rohitdante16
Fluorite | Level 6

In general i would say instead of using a single we could accomplish the same in following steps:

 

1. Take dump of Table A using proc sql into SAS dataset.

2. Do the same for Table B

 

Load the Table B onto a proc format and apply that to table A.

 

This should optimize the code.

 

Most often the slowness of a query would depends on the connection driver either ODBC/Oracle and so on. 

So the throught process should ideally be to use a method to take a dump and use the other to optimize results, so i would say use the SQL code to pull the data and not join and use SAS formatting to join the data. if it helps create use proc ds2 for parallel record processing.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1743 views
  • 1 like
  • 5 in conversation