I have a ten dataset with columns firmid and size. I want to full join them via firmid. I wrote the following code, but give me undesirable results. Please see below.
dataset 1
firmid size
1 100
2 200
3 250
dataset 2
firmid size
2 210
4 150
I want to get
desired
firmid size1 size2
1 100 .
2 200 210
3 250 .
4 . 150
But with the following codes
proc sql;
create table desired as
select *
from dataset1(rename=(size=size1)) full join dataset2(rename=(size=size2))
on dataset1.firmid = dataset2.firmid;
quit;
I got
firmid size1 size2
1 100 .
2 200 210
3 250 .
. . 150
Please help.
SAS SQL has a hard time when many columns have the same name in the column list. Be more explicit as to what columns you want :
proc sql;
create table desired as
select
coalesce(d1.firmId, d2.firmId) as firmId,
d1.size as size1,
d2.size as size2
from
dataset1 as d1 full join
dataset2 as d2 on d1.firmid = d2.firmid;
quit;
PG
Thank you for your help. This works when I only have two dataset. Now the problem would be when I have more than two columns to merge and I use a loop to emerge them. When I merge dataset1 and dataset2 I got correct dataset with firmId that combines those of dataset1 and dataset2. When I tried to merge dataset with dataset3, the column firmId was replaced and there are still missing firmId. How to solve this?
No need for a loop. Here is how to generalize for three datasets.
data dataset1;
input
firmid size;
datalines;
1 100
2 200
3 250
;
data dataset2;
input
firmid size;
datalines;
2 210
4 150
;
data dataset3;
input
firmid size;
datalines;
2 220
5 155
;
proc sql;
create table desired as
select
coalesce(d1.firmId, d2.firmId, d3.firmId) as firmId,
d1.size as size1,
d2.size as size2,
d3.size as size3
from
dataset1 as d1 full join
dataset2 as d2 on d1.firmid = d2.firmid full join
dataset3 as d3 on d1.firmid = d3.firmid;
quit;
or you could also use a datastep and proc transpose :
data desiredList;
set dataset1 dataset2 dataset3 indsname=dsname;
vname = cats('size', compress(dsname,,"dk"));
run;
proc sort data=desiredList; by firmId vname; run;
proc transpose data=desiredList out=desiredTable(drop=_name_);
by firmId;
var size;
id vname;
run;
PG
Or, if your datasets are not really named dataset1, dataset2 etc., you could use :
data desiredList(drop=count);
set dataset1 dataset2 dataset3 indsname=dsname;
if dsname ne lag(dsname) then count+1;
vname = cats('size', count);
run;
proc sort data=desiredList; by firmId vname; run;
proc transpose data=desiredList out=desiredTable(drop=_name_);
by firmId;
var size;
id vname;
run;
PG
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.