Hi Experts,
Can anybody please help me to understand why I am getting warning message in the following proc sql?
Thanks in advance for your help.
Naeem
475 proc sql noprint;
476 create table all as
477 select a.*, s.* from
478 appended a, survey s
479 where a.id=s.id;
WARNING: Variable id already exists on file ALL
When you say a.*, s.* , you want all the variables from appended and all the variables from survey. But the variable id exists in both tables, and it cannot exist twice with the same name in table all. If id is the only variable in both datasets, you can try this instead :
proc sql;
create table all as select * from appended natural join survey;
quit;
PG
It's telling you that you have two variables that are the same between the two datasets. Considering you're joining on that field, you'd think it would be smart enough to know better.
It's preferable practice to specify the join type and use on rather than where, and also faster by about 5-10% in my tests. You'll still get the warning though, unless you explicitly list one set of variables.
proc sql noprint;
create table all as
select a.*, s.* from
appended a,
inner join survey s
on a.id=s.id;
When you say a.*, s.* , you want all the variables from appended and all the variables from survey. But the variable id exists in both tables, and it cannot exist twice with the same name in table all. If id is the only variable in both datasets, you can try this instead :
proc sql;
create table all as select * from appended natural join survey;
quit;
PG
What happens in a natural join if you have more than 1 matching variable name?
The join is made on all matching (name, type) variables.
Thanks PG!
It is working without any error message. As you suggested if there are more than one matching variables in two datasets then SAS will join 2 datasets based on all matching variables. Just for my understanding how natural join works for more than one matching variables?
Naeem
Say you have dataset T1 with variables A, B, C, X and dataset T2 with variables A, B, C, Y then the query
select * from T1 natural join T2;
is equivalent to
select T1.A, T1.B, T1,C, X, Y
from T1 inner join T2
on T1.A=T2.A and T1.B=T2.B and T1.C=T2.C;
PG
Thanks PG!
I got it. SAS is taking name and type of the variables to declare them matching. If matching variables contain duplicate or missing values how these will be handled by SAS.
Naeem
This is a vast subject. I recommend reading about all the available types of joins in SQL at :
SAS(R) 9.3 SQL Procedure User's Guide
and take some time to experiment on your own. The core concept (where you should start) is the cartesian product join.
PG
This is also a great read.
A database professional’s best friend - The SAS Training Post
Thanks all for the helpful suggestions on this topic.
Regards,
Naeem
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.