BookmarkSubscribeRSS Feed
Tommer
Obsidian | Level 7

Hello,

 

This is probably basic question for many but I am struggling to understand why this is happening. I see that the output using CREATE TABLE and not using it are different. My understanding is create table is just to create a dataset, hence I am curious why I see these differences.

 

Sample datasets: 

dataset three:

X A

1 a1

1 a2

2 b1

2 b2

4 d

 

dataset four:

X A

2 x1

2 x2

3 y

5 v

 

The output is as expected when I run

proc sql;
select *
from three full join four
on three.x = four.x;
quit;

 

However, it is different when I run the same with

proc sql;
 create table five as 
select *
from three full join four
on three.x = four.x;
quit;

 

With create table I do not see a repetitive X variable which should be the case with proc sql joins (ie. both x columns from both datasets should be seen). The other thing I noticed is the values 3 and 5 from dataset four are missing. Please help understanding this!!

4 REPLIES 4
ed_sas_member
Meteorite | Level 14

Hi @Tommer 

If you read the log associated with the second PROC SQL, you will see that SAS put a Warning. Indeed, it is not a problem to create a report with two columns with the same name, but it is completely different when you want to create a dataset because SAS cannot manage to have several columns with identical names.


The COLAESCE function is designed to handle such cases:

create table five as 
select coalesce (three.x, four.x) as x, coalesce (three.a, four.a) as a

-> it will retrieve all values coming from each x variable in the same column.

 

Best,

Tommer
Obsidian | Level 7
Understood, so that's where the difference comes in when I am creating a table vs not creating one. When I don't use CREATE TABLE the output was as I thought it would be with additional columns etc, but once I used CREATE TABLE the output seemed very different as you explained creating dataset with multiple similar names.
PGStats
Opal | Level 21

When you are not explicitly creating a table, SAS/SQL chooses to create the new column names silently since all you are going to see anyway are column labels (which do not have to be unique). But when you explicitly create a table, SAS chooses not to assign column names (and generate an error) because, I suppose, you might not get what you expected. You can capture the underlying table created when you don't explicitly create a table with ODS OUTPUT, like this:

 

data three;
input X A $;
datalines;
1 a1
1 a2
2 b1
2 b2
4 d
;
 
data four;
input X A $;
datalines;
2 x1
2 x2
3 y
5 v
;
 
ods output Sql_Results=six;
proc sql;
select *
from three full join four
on three.x = four.x;
describe table six;
quit;

Look at the log.

PG
PGStats
Opal | Level 21

To avoid duplicate column names, give some of them new names :

 

proc sql;
create table five as 
select three.*, four.x as xx, four.a as aa
from three full join four
on three.x = four.x;
quit;
PG

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1097 views
  • 1 like
  • 3 in conversation