Hi,
I would like to gather the techniques which can be applied on most of PROC SQL queries which improve the performance when huge data is processed.
In case you have some ideas please share it here which would help many of us.
Some of the techniques may be to,
* Read only the columns which are necessary
* Use count(column name) function before actually executing the query to see if that query fetches any rows?
* Use Describe/Validate statements etc..
How huge is "huge"? Millions of records? Billions? Trillions?
One thing to consider when dealing with SAS is whether a PROC SQL or another Procedure or Data step may be better.
Example: Consider a data set with several categorical variables where I would like to generate summaries of some numeric variables (counts, means, quartiles, standard deviations) for the data overall, each level of each categorical variable, and each combination 2-at-a-time, 3-at-a-time, 4-at-a-time of the categorical variables and have the output in a single dataset.
I suspect that in the time it takes to write the SQL code, which I am not even going to try to write (or maintain), that a 5 line Proc Means/ summary call could have finished running. And likely executes faster as the code is optimized for such.
And one of those 5 lines of code is
Run;
I suspect that
* Use count(column name) function before actually executing the query to see if that query fetches any rows?
could add time as the dataset has to be read to get a count unless I am misunderstanding the context.
Here are a couple of ideas:
I agree with ballardw on the use of the count. If you are performing filters against your large set to get a count, you are probably forcing the code to run through the input set twice. Though if you think there may be no results at times, this may be something that helps performance overall.
In many cases, the biggest optimization to PROC SQL is to not use it at all, but switch to appropriate data and proc steps instead.
Not long ago, one of the posters here had a SQL that ran for almost a day. After replacing it, he couldn't believe (at first) he'd done it right when it finished in < 10 mins.
If you need to use SQL (ie to get a cartesian join), it helps to prepare the data first and make the job as "easy" as possible for the SQL step. Reduce columns, create counts etc with proc summary and so on.
Usually, "general" questions is not well suited for forum discussions. Easier to interact on specifics.
So, to learn optimization:
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Lock in the best rate now before the price increases on April 1.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.