how to eliminate the cartesian result in joins if two datasets are having more than 1 record in 2 datasets (no chance to differentiate uniquely with the existed variables ).
For ex:
data a;
input pid age sex $;
cards;
1 21 m
1 22 f
1 23 m
2 34 f
2 33 f
;
data b;
input pid age Drug $;
cards;
1 21 Placebo
1 22 aceclo
2 34 Placebo
2 33 Diclofe
2 34 notavailable
;
kindly note that ,in the output Need to report all records without missing from either of the datasets.
What would your expected output be for the above data?
A proc sql with joins on id and age should work as long as there are no null values in id and age.
PROC SQL;
CREATE TABLE want AS
SELECT t1.pid,
t1.age,
t1.sex,
t2.Drug
FROM A t1
FULL JOIN B t2 ON (t1.pid = t2.pid) AND (t1.age = t2.age);
QUIT;
To "report all records without missing from either of the datasets", you need a full join and the coalesce function on the joined fields:
data a;
input pid age Sex $;
cards;
1 21 m
1 22 f
1 23 m
2 34 f
2 33 f
;
data b;
length Drug $16;
input pid age Drug $;
cards;
1 21 Placebo
1 22 aceclo
2 34 Placebo
2 33 Diclofe
2 34 notavailable
3 27 newDrug
;
proc sql;
select
coalesce(a.pid, b.pid) as Pid,
coalesce(a.age, b.age) as Age,
a.sex,
b.drug
from
a full join
b on a.pid=b.pid and a.age=b.age;
quit;
In a case like above where the joined fields have the same names in both tables and all non-matching fields have distinct names, you can also use a natural full join and save some typing:
proc sql;
select *
from a natural full join b;
quit;
Note, I added a non-matching record in table b to show the full effect.
PG
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.