DATA Step, Macro, Functions and more

Full join in sql and keep the same joining variable

Reply
Frequent Contributor
Posts: 122

Full join in sql and keep the same joining variable

I have a ten dataset with columns firmid and size. I want to full join them via firmid. I wrote the following code, but give me undesirable results. Please see below.

dataset 1

firmid          size

1                100

2                200

3                250

dataset 2

firmid          size

2                210

4                150

I want to get

desired

firmid          size1          size2

1                100               .

2                200               210

3                250               .

4                .                    150

But with the following codes

proc sql;

     create table desired as

     select *

     from dataset1(rename=(size=size1)) full join dataset2(rename=(size=size2))

     on dataset1.firmid = dataset2.firmid;

quit;

I got

firmid          size1          size2

1                100               .

2                200               210

3                250               .

.                .                    150

Please help.

Respected Advisor
Posts: 4,646

Re: Full join in sql and keep the same joining variable

SAS SQL has a hard time when many columns have the same name in the column list. Be more explicit as to what columns you want :

proc sql;

create table desired as

select

     coalesce(d1.firmId, d2.firmId) as firmId,

     d1.size as size1,

     d2.size as size2

from

     dataset1 as d1 full join

     dataset2 as d2 on d1.firmid = d2.firmid;

quit;

PG

PG
Frequent Contributor
Posts: 122

Re: Full join in sql and keep the same joining variable

Thank you for your help. This works when I only have two dataset. Now the problem would be when I have more than two columns to merge and I use a loop to emerge them. When I merge dataset1 and dataset2 I got correct dataset with firmId that combines those of dataset1 and dataset2. When I tried to merge dataset with dataset3, the column firmId was replaced and there are still missing firmId. How to solve this?

Respected Advisor
Posts: 4,646

Re: Full join in sql and keep the same joining variable

No need for a loop. Here is how to generalize for three datasets.

data dataset1;

input

firmid          size;

datalines;

1                100

2                200

3                250

;

data dataset2;

input

firmid          size;

datalines;

2                210

4                150

;

data dataset3;

input

firmid          size;

datalines;

2                220

5                155

;

proc sql;

create table desired as

select

  coalesce(d1.firmId, d2.firmId, d3.firmId) as firmId,

  d1.size as size1,

  d2.size as size2,

  d3.size as size3

from

  dataset1 as d1 full join

  dataset2 as d2 on d1.firmid = d2.firmid full join

  dataset3 as d3 on d1.firmid = d3.firmid;

quit;

or you could also use a datastep and proc transpose :

data desiredList;

set dataset1 dataset2 dataset3 indsname=dsname;

vname = cats('size', compress(dsname,,"dk"));

run;

proc sort data=desiredList; by firmId vname; run;

proc transpose data=desiredList out=desiredTable(drop=_name_);

by firmId;

var size;

id vname;

run;

PG

PG
Respected Advisor
Posts: 4,646

Re: Full join in sql and keep the same joining variable

Or, if your datasets are not really named dataset1, dataset2 etc., you could use :

data desiredList(drop=count);

set dataset1 dataset2 dataset3 indsname=dsname;

if dsname ne lag(dsname) then count+1;

vname = cats('size', count);

run;


proc sort data=desiredList; by firmId vname; run;


proc transpose data=desiredList out=desiredTable(drop=_name_);

by firmId;

var size;

id vname;

run;

PG

PG
Ask a Question
Discussion stats
  • 4 replies
  • 1367 views
  • 0 likes
  • 2 in conversation