Hi,
I would like to merge more than two datasets using PROC SQL. I want to merge them by ID (all datasets have this variable in common). These datasets (around 10) all have different number of columns. I've only found a way to join two datasets (example below).
PROC SQL;
SELECT A.*, B.*
FROM STATES AS A, CITYS AS B
WHERE A.ID=B.ID;
@SanKH1 wrote:
Hi,
I would like to merge more than two datasets using PROC SQL. I want to merge them by ID (all datasets have this variable in common). These datasets (around 10) all have different number of columns. I've only found a way to join two datasets (example below).
PROC SQL;
SELECT A.*, B.*
FROM STATES AS A, CITYS AS B
WHERE A.ID=B.ID;
Why PROC SQL?
If you want to merge multiple data sets it is much easier in SAS syntax instead.
data want;
merge one two three ;
by id;
run;
If you want to "join" in SQL then probably should be explicit about the type of join you want to do.
So assuming you only want that observations that have data in all three dataset then use INNER join.
proc sql;
create table want as
select *
from one a
inner join two b
on a.id = b.id
inner join three c
on a.id = c.id
;
quit;
But you probably need to also be careful about which variables you select. Using the * shortcut to select all variables will generate notes that ID already exists in the dataset since it will include A.ID and B.ID and C.ID. Since the dataset WANT can only have one variable named ID the first one will be the values that are kept. With an INNER join it does not matter since you are only selecting the joins where the values of the ID variable are the same.
@SanKH1 wrote:
Hi,
I would like to merge more than two datasets using PROC SQL. I want to merge them by ID (all datasets have this variable in common). These datasets (around 10) all have different number of columns. I've only found a way to join two datasets (example below).
PROC SQL;
SELECT A.*, B.*
FROM STATES AS A, CITYS AS B
WHERE A.ID=B.ID;
Why PROC SQL?
If you want to merge multiple data sets it is much easier in SAS syntax instead.
data want;
merge one two three ;
by id;
run;
If you want to "join" in SQL then probably should be explicit about the type of join you want to do.
So assuming you only want that observations that have data in all three dataset then use INNER join.
proc sql;
create table want as
select *
from one a
inner join two b
on a.id = b.id
inner join three c
on a.id = c.id
;
quit;
But you probably need to also be careful about which variables you select. Using the * shortcut to select all variables will generate notes that ID already exists in the dataset since it will include A.ID and B.ID and C.ID. Since the dataset WANT can only have one variable named ID the first one will be the values that are kept. With an INNER join it does not matter since you are only selecting the joins where the values of the ID variable are the same.
Thank you! I used the first solution you mentioned.
Hi,
There is a link with a similar questions, the examples shown here, looks like what you want,
Hope it will be useful
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.