SanKH1
Obsidian | Level 7

Hi, 

I would like to merge more than two datasets using PROC SQL. I want to merge them by ID (all datasets have this variable in common). These datasets (around 10) all have a different number of columns. I've only found a way to join two datasets (example below).

PROC SQL;
SELECT A.*,  B.*
FROM STATES AS A, CITYS AS B
WHERE A.ID=B.ID;
QUIT;

Accepted Solution

Tom
Super User

Why PROC SQL?

If you want to merge multiple datasets, it is much easier to use a DATA step with a MERGE statement instead.

data want;
  merge one two three ;
  by id;
run;
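
One caveat worth noting: MERGE with a BY statement expects every input dataset to be sorted (or indexed) by ID. Below is a minimal sketch, assuming the three datasets are named one, two, and three as above. The IN= flag names (in1, in2, in3) are made up for illustration. The subsetting IF keeps only the IDs present in all three inputs, which behaves much like an inner join when ID is unique within each dataset.

```sas
/* Sort each input by the key; MERGE ... BY requires sorted input. */
/* Note: without OUT=, PROC SORT replaces the dataset in place.    */
proc sort data=one;   by id; run;
proc sort data=two;   by id; run;
proc sort data=three; by id; run;

data want;
  /* IN= creates temporary 0/1 flags marking which inputs contributed */
  merge one(in=in1) two(in=in2) three(in=in3);
  by id;
  /* Keep only IDs found in all three datasets */
  if in1 and in2 and in3;
run;
```

Dropping the IF statement keeps every ID found in any of the inputs. If the datasets share variable names other than ID, the value from the dataset listed later in the MERGE statement generally wins, so it is worth checking for such overlaps first.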

If you want to "join" in SQL, then you should probably be explicit about the type of join you want to do.

So, assuming you only want the observations that have data in all three datasets, use an INNER join.


proc sql;
create table want as
select *
from one a 
inner join two b 
  on a.id = b.id
inner join three c
  on a.id = c.id
;
quit;

But you probably also need to be careful about which variables you select. Using the * shortcut to select all variables will generate warnings that ID already exists in the dataset, since the selection includes A.ID, B.ID, and C.ID. Because the dataset WANT can have only one variable named ID, the values from the first one (A.ID) are the ones that are kept. With an INNER join this does not matter, since you are only selecting the rows where the values of the ID variable are the same.
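
One way to avoid the duplicate-ID message is to list the columns explicitly instead of using the bare * shortcut. This is only a sketch: the variables city and population are hypothetical and stand in for whatever non-key columns the datasets two and three actually contain.

```sas
proc sql;
  create table want as
  select a.*,           /* every column of ONE, including ID once  */
         b.city,        /* hypothetical non-key column from TWO    */
         c.population   /* hypothetical non-key column from THREE  */
  from one a
  inner join two b
    on a.id = b.id
  inner join three c
    on a.id = c.id
  ;
quit;
```

With an outer join, where a.id can be missing for some rows, a common alternative is to build the key with coalesce(a.id, b.id, c.id) as id and list the remaining columns by name.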

SanKH1
Obsidian | Level 7

Thank you! I used the first solution you mentioned.

Ksharp
Diamond | Level 26
proc sql;
create table want as
select *
from
one
natural join
two
natural join
three
;
quit;
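
A note on the NATURAL JOIN approach: it matches on every column that has the same name in the joined tables, not only ID, so it gives the same result as joining on ID only when ID is the sole shared column. A quick way to check this is to query DICTIONARY.COLUMNS. The sketch below assumes the three datasets live in the WORK library under the names one, two, and three. The aliases column_name and n_tables are made up for illustration.

```sas
proc sql;
  /* List column names that appear in more than one of the tables. */
  /* LIBNAME and MEMNAME are stored in uppercase in the dictionary. */
  select upcase(name) as column_name,
         count(*)     as n_tables
  from dictionary.columns
  where libname = 'WORK'
    and memname in ('ONE', 'TWO', 'THREE')
  group by calculated column_name
  having count(*) > 1
  ;
quit;
```

If anything other than ID shows up in the output, an explicit ON a.id = b.id join is probably the safer choice.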