BookmarkSubscribeRSS Feed
nickspencer
Obsidian | Level 7
Hello everyone,

I have a proc sql code that is taking more than an hour hour to complete. Is there a way I can optimize it to run quicker?

Code:

Proc sql;
Create table want as
Select distinct a.*, c.*, b.column1
From table1 a
Left join table2 b
On a.ID1=b.ID1
Left join table3 c
On a.ID2=c.ID2
;
Quit;

Any suggestions is highly appreciated.

Thanks,
Nick
13 REPLIES 13
PGStats
Opal | Level 21

Hi Nick,

 

The performance of such a simple* operation will usually depend on

 

1) The size of the tables
2) The location of the tables (local or remote)
3) In which table(s) ID1 and ID2 are primary keys, if any.

 

* or maybe your query is not as simple as that.

PG
SASKiwi
PROC Star

Do you really need to use DISTINCT as that slows down queries significantly. If you are ending up with duplicate rows in your WANT dataset there are better of dealing with that but that requires knowing more about your input data.

noetsi
Obsidian | Level 7
We have tables with lots of variables which can be slow to run with any join. If you have a small set of variables you need from a large table, large in terms of both data and the number of variables, making a temp table from that (with no join just a select) with the variables you need and then joining to that rather than the original table can speed things up a lot.
Kurt_Bremser
Super User

Why do you have the DISTINCT in there? If there is no solid reason for it, remove it.

Your code is also sloppy, you will get WARNINGs. NEVER use asterisks like this in joins, use exhaustive variable lists.

This looks like a double lookup, which is best done in a data step with hash objects, if the lookup tables can fit into memory.

noetsi
Obsidian | Level 7

We use asterisks all the time in the select statement 🙂 too lazy (and too busy) to type out 30+ variables. I have never gotten a warning about doing so although I understand its not best practice.

noetsi
Obsidian | Level 7

I am not sure why SAS gives warnings when you have table.field code even with multiple fields with the same name in different tables. It will cause errors of course if you don't tell it what table the fields are in since it won't know what table you want a specific field from. I never noticed this occurring before. In many cases sas warnings are not really a problem, that is the code is correct for what you want and runs anyway. 

Reeza
Super User

Yes, but this doesn't follow the clean code principle. 

 

If you get in the habit of ignoring Warnings, you'll also miss important ones.

noetsi
Obsidian | Level 7
Time to complete a project, given that no one will read my code anyway since I am the only analyst where I work, is more important to me. Although that can cause issues when you rerun your own code of course. What is and is not an important warning is beyond me, but I don't ignore them. I just know, sometimes, that they are not pertinent to what I am doing. Amusingly, although I am supposed to only do statistics, the sql coding is about 99 percent of my job in practice. 🙂 One thing I have learned, comments are worth doing.
Reeza
Super User

@noetsi wrote:
Time to complete a project, given that no one will read my code anyway since I am the only analyst where I work, is more important to me. Although that can cause issues when you rerun your own code of course. What is and is not an important warning is beyond me, but I don't ignore them. I just know, sometimes, that they are not pertinent to what I am doing. Amusingly, although I am supposed to only do statistics, the sql coding is about 99 percent of my job in practice. 🙂 One thing I have learned, comments are worth doing.

You'll always be asked to re-run something and will miss errors in the long run with this habit. It's definitely easy to get sloppy when working alone for sure, but that's a choice you get to make and I know I've done that sometimes for the speed. But I've also learned in the long run it's better to code well from the start but it took years to learn that as well. 

noetsi
Obsidian | Level 7
It is common practice where I work to use the table.* (I am the only one to do Proc SQL but we have several who do other forms of SQL. All are world class coders which I am not). I have run into issues for sure using table.* - and they likely would be worse if I reused my code much. For the most part they are ad hoc projects that are not repeated.

We have a process at my organization where I build the original code and the expert coders review it for errors - which helps a lot.
Kurt_Bremser
Super User

This reminds me of one of the first pieces of advice I got from the American mentors in my second career as an  American Football game official:

"If you don't want to do it right, then why are you doing it at all?"

Tom
Super User Tom
Super User

You can probably do what you want more efficiently by doing it yourself instead of asking PROC SQL to guess how to do what you want.  But in SAS there are limited tools for instructing PROC SQL how to process the query.  So work on converting the query into actual SAS code.

 

As to what is a more efficient method, that depends on knowing more about the data.  How large are the three datasets?  Do the ID1 and ID2 variables uniquely identify the observations in the datasets?

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1281 views
  • 2 likes
  • 7 in conversation