Help using Base SAS procedures

General Query non PROC SQL Optimization Techniques

Reply
Contributor
Posts: 25

General Query non PROC SQL Optimization Techniques

Hi,

     I would like to gather the techniques which can be applied on most of PROC SQL queries which improve the performance when huge data is processed.

     In case you have some ideas please share it here which would help many of us.

     Some of the techniques may be to,
          * Read only the columns which are necessary
          * Use count(column name) function before actually executing the query to see if that query fetches any rows?
          * Use Describe/Validate statements etc..

Super User
Posts: 10,466

Re: General Query non PROC SQL Optimization Techniques

How huge is "huge"? Millions of records? Billions? Trillions?

One thing to consider when dealing with SAS is whether a PROC SQL or another Procedure or Data step may be better.

Example: Consider a data set with several categorical variables where I would like to generate summaries of some numeric variables (counts, means, quartiles, standard deviations) for the data overall, each level of each categorical variable, and each combination 2-at-a-time, 3-at-a-time, 4-at-a-time of the categorical variables and have the output in a single dataset.

I suspect that in the time it takes to write the SQL code, which I am not even going to try to write (or maintain), that a 5 line Proc Means/ summary call could have finished running. And likely executes faster as the code is optimized for such.

And one of those 5 lines of code is

Run;

I suspect that

* Use count(column name) function before actually executing the query to see if that query fetches any rows?

could add time as the dataset has to be read to get a count unless I am misunderstanding the context.

Occasional Contributor
Posts: 13

Re: General Query non PROC SQL Optimization Techniques

Here are a couple of ideas:

  • You can index the commonly used columns of the huge tables.  If you are using a filter or join on an indexed column, SAS will not process the entire set sequentially.  You can add indices with PROC DATASETS.
  • For the where clause, use the most exclusive filters first if there are multiple conditions.  That is, specify the filters first for those that remove the most rows from the source set.
  • Avoid joining large tables together until you have filtered down the inputs (if possible)
  • Avoid cross joins and aggregates on the data source until you have filtered it down to only what you need (if possible)

I agree with ballardw on the use of the count.  If you are performing filters against your large set to get a count, you are probably forcing the code to run through the input set twice.  Though if you think there may be no results at times, this may be something that helps performance overall.

Super User
Posts: 6,928

Re: General Query non PROC SQL Optimization Techniques

In many cases, the biggest optimization to PROC SQL is to not use it at all, but switch to appropriate data and proc steps instead.

Not long ago, one of the posters here had a SQL that ran for almost a day. After replacing it, he couldn't believe (at first) he'd done it right when it finished in < 10 mins.

If you need to use SQL (ie to get a cartesian join), it helps to prepare the data first and make the job as "easy" as possible for the SQL step. Reduce columns, create counts etc with proc summary and so on.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 5,254

Re: General Query non PROC SQL Optimization Techniques

Usually, "general" questions is not well suited for forum discussions. Easier to interact on specifics.

So, to learn optimization:

  • Take some training
  • Read manuals, books and posts/papers
  • Try out different techniques in your test environment
  • Interact with others on specific matters
Data never sleeps
Ask a Question
Discussion stats
  • 4 replies
  • 348 views
  • 1 like
  • 5 in conversation