Hi,
I would like to merge more than two datasets using PROC SQL. I want to merge them by ID (all datasets have this variable in common). These datasets (around 10) each have a different number of columns. I've only found a way to join two datasets (example below).
PROC SQL;
SELECT A.*, B.*
FROM STATES AS A, CITYS AS B
WHERE A.ID=B.ID;
Accepted Solutions
Why PROC SQL?
If you want to merge multiple data sets, a DATA step MERGE is much easier than SQL. Each data set must already be sorted (or indexed) by ID.
data want;
merge one two three ;
by id;
run;
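A minimal sketch of the full workflow, assuming the inputs are not yet sorted by ID. The names ONE, TWO and THREE are the placeholders from the example above, and the IN= flags (in1, in2, in3) are illustrative names; the same pattern extends to ten data sets by listing them all on the MERGE statement.

```sas
/* MERGE with a BY statement expects each input sorted (or indexed) by the key */
proc sort data=one;   by id; run;
proc sort data=two;   by id; run;
proc sort data=three; by id; run;

data want;
  /* IN= creates temporary 0/1 flags showing which inputs contributed to the row */
  merge one(in=in1) two(in=in2) three(in=in3);
  by id;
  /* Optional: keep only IDs present in every input, similar to an inner join. */
  /* Remove this line to keep every ID found in any input.                     */
  if in1 and in2 and in3;
run;
```

Two caveats: if the data sets share variable names other than ID, values from the data sets listed later overwrite those from earlier ones, and when an ID repeats in more than one input the result differs from what an SQL join would produce.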
If you want to "join" in SQL, you should be explicit about the type of join you want.
Assuming you only want the observations that have data in all three data sets, use an INNER join.
proc sql;
create table want as
select *
from one a
inner join two b
on a.id = b.id
inner join three c
on a.id = c.id
;
quit;
But you should also be careful about which variables you select. Using the * shortcut to select all variables will generate warnings that ID already exists in the data set, since the selection includes A.ID, B.ID, and C.ID. Because the data set WANT can have only one variable named ID, the values from the first one are kept. With an INNER join it does not matter, since you are only selecting the rows where the values of the ID variable are the same.
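One way to avoid the duplicate-ID message is to list the columns explicitly instead of using *. This is only a sketch: CITY and POPULATION are hypothetical column names standing in for whatever non-key variables TWO and THREE actually contain.

```sas
proc sql;
  create table want as
  select a.*            /* ID plus every other column of ONE */
       , b.city         /* hypothetical non-key column from TWO */
       , c.population   /* hypothetical non-key column from THREE */
  from one a
  inner join two b
    on a.id = b.id
  inner join three c
    on a.id = c.id
  ;
quit;
```

For an outer join, where the ID may be missing on one side, selecting coalesce(a.id, b.id, c.id) as id in place of a.id keeps the key populated on every row.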
Thank you! I used the first solution you mentioned.
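proc sql;
/* NATURAL JOIN matches on every column the tables share by name, */
/* so this assumes ID is the only variable common to all three.   */
create table want as
select *
from
one
natural join
two
natural join
three
;
quit;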
Hi,
There is a link with a similar question; the examples shown there look like what you want.
Hope it is useful.