Hi,
I thought these two codes would return same results, however, the results were different with the duplicated records in the second query ( the original query data are all one patient having one record) .
I am wondering the reason. Thank you.
proc sql;
create table cohort as
select a.*
from lot2_2 as a
where a.patientid in
( select b.patientid
from &dblib..Lineoftherapy(where= (LineNumber = 1)) as b , &dblib..Demographics as c
where b.PatientID = c.PatientID and (year(b.StartDate) - c.BirthYear) >= 18 );
quit;
proc sql;
create table cohort as
select a.*
from lot2_2 as a , &dblib..Lineoftherapy(where= (LineNumber = 1)) as b , &dblib..Demographics as c
where ( a.patientid = b.PatientID = c.PatientID and (year(b.StartDate) - c.BirthYear) >= 18 );
quit;
The first query is a subquery, and the result is Boolean (true or false).
The second is a join, the result is not restricted to true/false on the left table.
The condition a.patientid = b.PatientID = c.PatientID isn't equivalent to (a.patientid = b.PatientID) and (b.PatientID = c.PatientID). It evaluates as (a.patientid = b.PatientID) = c.PatientID. Now, a.patientid = b.PatientID elaluates as TRUE or FALSE (1 or 0) which is then compared to c.PatientID. Unless c.PatientID is 0 or 1, the condition will be FALSE.
Replace a.patientid = b.PatientID = c.PatientID with (a.patientid = b.PatientID) and (b.PatientID = c.PatientID) and both queries should yield the same result.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.