Hello,
This is probably basic question for many but I am struggling to understand why this is happening. I see that the output using CREATE TABLE and not using it are different. My understanding is create table is just to create a dataset, hence I am curious why I see these differences.
Sample datasets:
dataset three:
X A
1 a1
1 a2
2 b1
2 b2
4 d
dataset four:
X A
2 x1
2 x2
3 y
5 v
The output is as expected when I run
proc sql;
select *
from three full join four
on three.x = four.x;
quit;
However, it is different when I run the same with
proc sql;
create table five as
select *
from three full join four
on three.x = four.x;
quit;
With create table I do not see a repetitive X variable which should be the case with proc sql joins (ie. both x columns from both datasets should be seen). The other thing I noticed is the values 3 and 5 from dataset four are missing. Please help understanding this!!
Hi @Tommer
If you read the log associated with the second PROC SQL, you will see that SAS put a Warning. Indeed, it is not a problem to create a report with two columns with the same name, but it is completely different when you want to create a dataset because SAS cannot manage to have several columns with identical names.
The COLAESCE function is designed to handle such cases:
create table five as
select coalesce (three.x, four.x) as x, coalesce (three.a, four.a) as a
-> it will retrieve all values coming from each x variable in the same column.
Best,
When you are not explicitly creating a table, SAS/SQL chooses to create the new column names silently since all you are going to see anyway are column labels (which do not have to be unique). But when you explicitly create a table, SAS chooses not to assign column names (and generate an error) because, I suppose, you might not get what you expected. You can capture the underlying table created when you don't explicitly create a table with ODS OUTPUT, like this:
data three;
input X A $;
datalines;
1 a1
1 a2
2 b1
2 b2
4 d
;
data four;
input X A $;
datalines;
2 x1
2 x2
3 y
5 v
;
ods output Sql_Results=six;
proc sql;
select *
from three full join four
on three.x = four.x;
describe table six;
quit;
Look at the log.
To avoid duplicate column names, give some of them new names :
proc sql;
create table five as
select three.*, four.x as xx, four.a as aa
from three full join four
on three.x = four.x;
quit;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.