Desktop productivity for business analysts and programmers

Left Join with Enterprise Guide

Reply
Occasional Contributor
Posts: 9

Left Join with Enterprise Guide

I am trying to put together some output displaying all subjects in the database with their medications and adverse events side by side, e.g. Subj, Med, AE. I need to use a join to pull the data from the CM and AE tables so I used the SUBJID in a left join (thinking this is correct).

The problem is that in the output, I am getting a mashup of every med lnked with every AE for each subject, which appears to be a cartesian join and not what I was expecting in the output.

The code is below, any suggestions in correcting this? I've plyed around with various other joins that Enterprise Guide allows but all seem to come up with a massive cartesian output.

PROC SQL;
CREATE TABLE SASUSER.QUERY_FOR_CM_SAS7BDAT_0002 AS
SELECT t1.SUBJID,
t1.CMTRT,
t2.AETRT
FROM EC100045.cm AS t1 LEFT JOIN EC100047.ae AS t2 ON (t1.SUBJID = t2.SUBJID)
WHERE t1.SUBJID = '201-008' AND t1.CMTRT NOT = '' AND t2.AETRT NOT = ''
ORDER BY t1.SUBJID, t1.CMTRT;
QUIT;
Occasional Contributor
Posts: 9

Re: Left Join with Enterprise Guide

To further clarify, here are the two example tables and how I'd like to see the output. I now believe this should be an outer join but again I seem to keep getting cartesian output.

CM
subjid cmtrt
001 ASIPIRIN
001 ZOFRAN
002 DIGOXIN
002 CARDIZEM

AE
subjid aetrt
001 HEADACHE
001 NAUSEA
002 VOMITING
003 INDIGESTION

Desired output:
subjid cmtrt aetrt
001 ASPIRIN HEADACHE
001 ZOFRAN NAUSEA
002 DIGOXIN VOMITING
002 CARDIZEM (null)
003 (null) INDIGESTION
Super Contributor
Posts: 359

Re: Left Join with Enterprise Guide

What associates Asprin with Headache? You must have some ID to add to your join or you will get a cartesian join.
Occasional Contributor
Posts: 9

Re: Left Join with Enterprise Guide

So you're saying I need to add something else other than joining on the SUBJID?

It really doesn't matter to me how the AEs get linked to the meds, I just wanted to produce a list that displayed the results for each subject from both tables.
Super Contributor
Posts: 359

Re: Left Join with Enterprise Guide

In EG you would need to do :

data CM;
input subjid $ cmtrt $;
cards;
001 ASIPIRIN
001 ZOFRAN
002 DIGOXIN
002 CARDIZEM
run;

data AE;
input subjid $ aetrt $;
cards;
001 HEADACHE
001 NAUSEA
002 VOMITING
003 INDIGESTION
run;

proc sort data = cm;
by subjid;
run;

proc sort data = ae;
by subjid;
run;

data cm;
set cm;
by subjid;
if first.subjid then cnt = 0;
cnt + 1;
run;


data ae;
set ae;
by subjid;
if first.subjid then cnt = 0;
cnt + 1;
run;

to get an identifyer on each row.
Do an OUTER join to get the matching (on subjid and cnt)
create a calculated column for SubjID "coalescec(CM.subjid ,AE.subjid )"
Occasional Contributor
Posts: 9

Re: Left Join with Enterprise Guide

Thank you all for your help!
Trusted Advisor
Posts: 2,114

Re: Left Join with Enterprise Guide

PROC SQL did what you asked it to do. To get the dataset you want with that data, you are going to have to code it with a DATA step. To use PROC SQL, you will need to add a 'row' number in addition to the subject ID to use in the LEFT JOIN.
SAS Employee
Posts: 149

Re: Left Join with Enterprise Guide

As my father in law would say, PROC SQL in this case is doomed as doomed can be. You're talking about a many-to-many combination, which for PROC SQL always means an internal cartesian product.

As Doc suggested, you could do this with DATA step code although you'd have to really know what you're doing to come up with a way to get 002 CARDIZEM (null) instead of 002 CARDIZEM VOMITING.

One alternative if the data is small: you could go ahead with the Cartesian product, then on the back end use a sort data task with no duplicates based on the values of subjid cmtrt. Note this ability is only built into the sort data task in EG 4.2 -- in previous versions, you'd have to manually add the NODUPKEY option to the code yourself. This still doesn't get exactly what you want, in that you would see 002 CARDIZEM VOMITING instead of 002 CARDIZEM (null). And if your data is large and/or in a database, your system administrator / database administrator is going to kill me for even mentioning this. IT IS MASSIVELY INEFFICIENT!! You've been warned. Smiley Happy
Ask a Question
Discussion stats
  • 7 replies
  • 154 views
  • 0 likes
  • 4 in conversation