Hi, I'm trying to use SQL to inner join 2 SAS files, here is what I have:
proc sql;
select Rank.NOC, Rank.Country, Gold_2018, Silver_2018, Bronze_2018
from ab.rank2018 inner join ab.rank2016
on rank2018.NOC=rank2016.NOC
on rank2018.Country=rank2016.Country;
quit;
My question is that because Country and NOC occur in both tables, I have the use the table prefix to indicate which column to use, which I choose olympics2016, however it doesn't work, it still gives me two table. Can someone please help me to fix it? Thank you.
Your join criteria doesn't look right. The second ON should be AND.
proc sql;
select Rank.NOC, Rank.Country, Gold_2018, Silver_2018, Bronze_2018
from ab.rank2018 inner join ab.rank2016
on rank2018.NOC=rank2016.NOC
and rank2018.Country=rank2016.Country;
quit;
Now it says: Unresolved reference to table/correlation name Rank.
I know it has to do something with the second ON but I'm not sure what is it.
When you use RANK.tablename, I suppose RANK is a defined library?
Also, where comes the 2018 and 2016 in the name of the library RANK?
They are sas7bdat files, here are my previous code if needed
proc print data=ab.rank2016;
proc print data=ab.rank2018;
ab is basically the path.
Maybe the "ab" part is missing ?
proc sql;
select Rank.NOC,
Rank.Country,
Gold_2018,
Silver_2018,
Bronze_2018
from ab.rank2018
inner join ab.rank2016 on
ab.rank2018.NOC=rank2016.NOC and
ab.rank2018.Country=rank2016.Country;
quit;
Also, does the rank2016.xxxx have also the "ab" part in their part?
Proper syntax would look like:
proc sql;
select
a.NOC,
a.Country,
b.Gold_2018,
b.Silver_2018,
b.Bronze_2018
from
ab.rank2016 as a inner join
ab.rank2018 as b
on a.NOC=b.NOC and a.Country=b.Country;
quit;
When SAS sees something like this in Proc SQL
on rank2018.NOC=rank2016.NOC
It expects Rank2018 and Rank2016 to be a single reference to a data set. But your datasets are in your library AB but the ON clause does not know that. So you provide an ALIAS n the From clause to have a single reference such as:
from ab.rank2018 as rank2018 inner join ab.rank2016 as rank2016
so references such as rank2018.NOC can get back to the actual data source for the NOC.
The AS is optional in assigning the alias but I think it helps readability.
Most people tend to use something shorter for the alias like T1 (for table 1) or A to avoid any extra typing when referencing the variables. You can see that typing A.Noc is quicker than Rank2018.Noc.
To reference two variables with the same names in the source the data sets you would use AS to rename one or both:
Select a.noc, b.noc as NOC2016
for example would have an output variable named NOC that came from the aliased set A and another named NOC2016 that was the value of the aliased set B Noc variable. (Yet another reason for short aliases)
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.