Hi Nick,
The performance of such a simple* operation will usually depend on
1) The size of the tables
2) The location of the tables (local or remote)
3) In which table(s) ID1 and ID2 are primary keys, if any.
* or maybe your query is not as simple as that.
Do you really need to use DISTINCT as that slows down queries significantly. If you are ending up with duplicate rows in your WANT dataset there are better of dealing with that but that requires knowing more about your input data.
Why do you have the DISTINCT in there? If there is no solid reason for it, remove it.
Your code is also sloppy, you will get WARNINGs. NEVER use asterisks like this in joins, use exhaustive variable lists.
This looks like a double lookup, which is best done in a data step with hash objects, if the lookup tables can fit into memory.
We use asterisks all the time in the select statement 🙂 too lazy (and too busy) to type out 30+ variables. I have never gotten a warning about doing so although I understand its not best practice.
In the code posted, at least ID1 and ID2 are present in more than one table, which will cause WARNINGs.
I am not sure why SAS gives warnings when you have table.field code even with multiple fields with the same name in different tables. It will cause errors of course if you don't tell it what table the fields are in since it won't know what table you want a specific field from. I never noticed this occurring before. In many cases sas warnings are not really a problem, that is the code is correct for what you want and runs anyway.
Yes, but this doesn't follow the clean code principle.
If you get in the habit of ignoring Warnings, you'll also miss important ones.
@noetsi wrote:
Time to complete a project, given that no one will read my code anyway since I am the only analyst where I work, is more important to me. Although that can cause issues when you rerun your own code of course. What is and is not an important warning is beyond me, but I don't ignore them. I just know, sometimes, that they are not pertinent to what I am doing. Amusingly, although I am supposed to only do statistics, the sql coding is about 99 percent of my job in practice. 🙂 One thing I have learned, comments are worth doing.
You'll always be asked to re-run something and will miss errors in the long run with this habit. It's definitely easy to get sloppy when working alone for sure, but that's a choice you get to make and I know I've done that sometimes for the speed. But I've also learned in the long run it's better to code well from the start but it took years to learn that as well.
This reminds me of one of the first pieces of advice I got from the American mentors in my second career as an American Football game official:
"If you don't want to do it right, then why are you doing it at all?"
You can probably do what you want more efficiently by doing it yourself instead of asking PROC SQL to guess how to do what you want. But in SAS there are limited tools for instructing PROC SQL how to process the query. So work on converting the query into actual SAS code.
As to what is a more efficient method, that depends on knowing more about the data. How large are the three datasets? Do the ID1 and ID2 variables uniquely identify the observations in the datasets?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.