BookmarkSubscribeRSS Feed
nrk1787db1
Obsidian | Level 7

Hi,

     I would like to gather the techniques which can be applied on most of PROC SQL queries which improve the performance when huge data is processed.

     In case you have some ideas please share it here which would help many of us.

     Some of the techniques may be to,
          * Read only the columns which are necessary
          * Use count(column name) function before actually executing the query to see if that query fetches any rows?
          * Use Describe/Validate statements etc..

4 REPLIES 4
ballardw
Super User

How huge is "huge"? Millions of records? Billions? Trillions?

One thing to consider when dealing with SAS is whether a PROC SQL or another Procedure or Data step may be better.

Example: Consider a data set with several categorical variables where I would like to generate summaries of some numeric variables (counts, means, quartiles, standard deviations) for the data overall, each level of each categorical variable, and each combination 2-at-a-time, 3-at-a-time, 4-at-a-time of the categorical variables and have the output in a single dataset.

I suspect that in the time it takes to write the SQL code, which I am not even going to try to write (or maintain), that a 5 line Proc Means/ summary call could have finished running. And likely executes faster as the code is optimized for such.

And one of those 5 lines of code is

Run;

I suspect that

* Use count(column name) function before actually executing the query to see if that query fetches any rows?

could add time as the dataset has to be read to get a count unless I am misunderstanding the context.

Sonywell
Fluorite | Level 6

Here are a couple of ideas:

  • You can index the commonly used columns of the huge tables.  If you are using a filter or join on an indexed column, SAS will not process the entire set sequentially.  You can add indices with PROC DATASETS.
  • For the where clause, use the most exclusive filters first if there are multiple conditions.  That is, specify the filters first for those that remove the most rows from the source set.
  • Avoid joining large tables together until you have filtered down the inputs (if possible)
  • Avoid cross joins and aggregates on the data source until you have filtered it down to only what you need (if possible)

I agree with ballardw on the use of the count.  If you are performing filters against your large set to get a count, you are probably forcing the code to run through the input set twice.  Though if you think there may be no results at times, this may be something that helps performance overall.

Kurt_Bremser
Super User

In many cases, the biggest optimization to PROC SQL is to not use it at all, but switch to appropriate data and proc steps instead.

Not long ago, one of the posters here had a SQL that ran for almost a day. After replacing it, he couldn't believe (at first) he'd done it right when it finished in < 10 mins.

If you need to use SQL (ie to get a cartesian join), it helps to prepare the data first and make the job as "easy" as possible for the SQL step. Reduce columns, create counts etc with proc summary and so on.

LinusH
Tourmaline | Level 20

Usually, "general" questions is not well suited for forum discussions. Easier to interact on specifics.

So, to learn optimization:

  • Take some training
  • Read manuals, books and posts/papers
  • Try out different techniques in your test environment
  • Interact with others on specific matters
Data never sleeps

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1022 views
  • 1 like
  • 5 in conversation