BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
stat_sas
Ammonite | Level 13

Hi Experts,

Can anybody please help me to understand why I am getting warning message in the following proc sql?

Thanks in advance for your help.

Naeem

     475  proc sql noprint;

     476  create table all as

     477  select a.*, s.* from

     478  appended a, survey s

     479  where a.id=s.id;

     WARNING: Variable id already exists on file ALL

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

When you say a.*, s.* , you want all the variables from appended and all the variables from survey. But the variable id exists in both tables, and it cannot exist twice with the same name in table all. If id is the only variable in both datasets, you can try this instead :

proc sql;

create table all as select * from appended natural join survey;

quit;

PG

PG

View solution in original post

10 REPLIES 10
Reeza
Super User

It's telling you that you have two variables that are the same between the two datasets. Considering you're joining on that field, you'd think it would be smart enough to know better.

It's preferable practice to specify the join type and use on rather than where, and also faster by about 5-10% in my tests. You'll still get the warning though, unless you explicitly list one set of variables.

proc sql noprint;

create table all as

select a.*, s.* from

appended a,

inner join survey s

on a.id=s.id;

PGStats
Opal | Level 21

When you say a.*, s.* , you want all the variables from appended and all the variables from survey. But the variable id exists in both tables, and it cannot exist twice with the same name in table all. If id is the only variable in both datasets, you can try this instead :

proc sql;

create table all as select * from appended natural join survey;

quit;

PG

PG
Reeza
Super User

What happens in a natural join if you have more than 1 matching variable name?

PGStats
Opal | Level 21

The join is made on all matching (name, type) variables.

PG
stat_sas
Ammonite | Level 13

Thanks PG!

It is working without any error message. As you suggested if there are more than one matching variables in two datasets then SAS will join 2 datasets based on all matching variables. Just for my understanding how natural join works for more than one matching variables?

Naeem

PGStats
Opal | Level 21

Say you have dataset T1 with variables A, B, C, X and dataset T2 with variables A, B, C, Y then the query

select * from T1 natural join T2;

is equivalent to

select T1.A, T1.B, T1,C, X, Y

from T1 inner join T2

     on T1.A=T2.A and T1.B=T2.B and T1.C=T2.C;

PG

PG
stat_sas
Ammonite | Level 13

Thanks PG!

I got it. SAS is taking name and type of the variables to declare them matching. If matching variables contain duplicate or missing values how these will be handled by SAS.

Naeem

PGStats
Opal | Level 21

This is a vast subject. I recommend reading about all the available types of joins in SQL at :

SAS(R) 9.3 SQL Procedure User's Guide

and take some time to experiment on your own. The core concept (where you should start) is the cartesian product join.

PG

PG
stat_sas
Ammonite | Level 13

Thanks all for the helpful suggestions on this topic.

Regards,

Naeem

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 5092 views
  • 2 likes
  • 3 in conversation