I have a ten dataset with columns firmid and size. I want to full join them via firmid. I wrote the following code, but give me undesirable results. Please see below.
dataset 1
firmid size
1 100
2 200
3 250
dataset 2
firmid size
2 210
4 150
I want to get
desired
firmid size1 size2
1 100 .
2 200 210
3 250 .
4 . 150
But with the following codes
proc sql;
create table desired as
select *
from dataset1(rename=(size=size1)) full join dataset2(rename=(size=size2))
on dataset1.firmid = dataset2.firmid;
quit;
I got
firmid size1 size2
1 100 .
2 200 210
3 250 .
. . 150
Please help.
SAS SQL has a hard time when many columns have the same name in the column list. Be more explicit as to what columns you want :
proc sql;
create table desired as
select
coalesce(d1.firmId, d2.firmId) as firmId,
d1.size as size1,
d2.size as size2
from
dataset1 as d1 full join
dataset2 as d2 on d1.firmid = d2.firmid;
quit;
PG
Thank you for your help. This works when I only have two dataset. Now the problem would be when I have more than two columns to merge and I use a loop to emerge them. When I merge dataset1 and dataset2 I got correct dataset with firmId that combines those of dataset1 and dataset2. When I tried to merge dataset with dataset3, the column firmId was replaced and there are still missing firmId. How to solve this?
No need for a loop. Here is how to generalize for three datasets.
data dataset1;
input
firmid size;
datalines;
1 100
2 200
3 250
;
data dataset2;
input
firmid size;
datalines;
2 210
4 150
;
data dataset3;
input
firmid size;
datalines;
2 220
5 155
;
proc sql;
create table desired as
select
coalesce(d1.firmId, d2.firmId, d3.firmId) as firmId,
d1.size as size1,
d2.size as size2,
d3.size as size3
from
dataset1 as d1 full join
dataset2 as d2 on d1.firmid = d2.firmid full join
dataset3 as d3 on d1.firmid = d3.firmid;
quit;
or you could also use a datastep and proc transpose :
data desiredList;
set dataset1 dataset2 dataset3 indsname=dsname;
vname = cats('size', compress(dsname,,"dk"));
run;
proc sort data=desiredList; by firmId vname; run;
proc transpose data=desiredList out=desiredTable(drop=_name_);
by firmId;
var size;
id vname;
run;
PG
Or, if your datasets are not really named dataset1, dataset2 etc., you could use :
data desiredList(drop=count);
set dataset1 dataset2 dataset3 indsname=dsname;
if dsname ne lag(dsname) then count+1;
vname = cats('size', count);
run;
proc sort data=desiredList; by firmId vname; run;
proc transpose data=desiredList out=desiredTable(drop=_name_);
by firmId;
var size;
id vname;
run;
PG
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.