SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

how to eliminate the cartesian result in joins

Reply
Frequent Contributor
Posts: 89

how to eliminate the cartesian result in joins

how to eliminate the cartesian result in joins if two datasets are having more than 1 record in 2 datasets (no chance to differentiate uniquely with the existed variables ).

For ex:

data a;

input pid age sex $;
cards;
1 21 m
1 22 f
1 23 m
2 34 f
2 33 f
;
data b;
input pid age Drug $;
cards;
1 21 Placebo
1 22 aceclo
2 34 Placebo
2 33 Diclofe
2 34 notavailable
;

kindly note that ,in the output Need to report all records without missing from either of the datasets.

Super User
Posts: 19,867

Re: how to eliminate the cartesian result in joins

Posted in reply to venkatnaveen

What would your expected output be for the above data?

Occasional Contributor
Posts: 14

Re: how to eliminate the cartesian result in joins

Posted in reply to venkatnaveen

A proc sql with joins on id and age should work as long as there are no null values in id and age.

PROC SQL;

   CREATE TABLE want AS

   SELECT t1.pid,

          t1.age,

          t1.sex,

          t2.Drug

      FROM A t1

           FULL JOIN B t2 ON (t1.pid = t2.pid) AND (t1.age = t2.age);

QUIT;

Respected Advisor
Posts: 4,932

Re: how to eliminate the cartesian result in joins

Posted in reply to venkatnaveen

To "report all records without missing from either of the datasets", you need a full join and the coalesce function on the joined fields:

data a;

input pid age Sex $;

cards;

1 21 m

1 22 f

1 23 m

2 34 f

2 33 f

;

data b;

length Drug $16;

input pid age Drug $;

cards;

1 21 Placebo

1 22 aceclo

2 34 Placebo

2 33 Diclofe

2 34 notavailable

3 27 newDrug

;

proc sql;

select

    coalesce(a.pid, b.pid) as Pid,

    coalesce(a.age, b.age) as Age,

    a.sex,

    b.drug

from

    a full join

    b on a.pid=b.pid and a.age=b.age;

quit;

In a case like above where the joined fields have the same names in both tables and all non-matching fields have distinct names, you can also use a natural full join and save some typing:

proc sql;

select *

from a natural full join b;

quit;

Note, I added a non-matching record in table b to show the full effect.

PG

PG
Ask a Question
Discussion stats
  • 3 replies
  • 308 views
  • 0 likes
  • 4 in conversation