Hi,
I thought these two codes would return same results, however, the results were different with the duplicated records in the second query ( the original query data are all one patient having one record) .
I am wondering the reason. Thank you.
proc sql;
create table cohort as
select a.*
from lot2_2 as a
where a.patientid in
( select b.patientid
from &dblib..Lineoftherapy(where= (LineNumber = 1)) as b , &dblib..Demographics as c
where b.PatientID = c.PatientID and (year(b.StartDate) - c.BirthYear) >= 18 );
quit;
proc sql;
create table cohort as
select a.*
from lot2_2 as a , &dblib..Lineoftherapy(where= (LineNumber = 1)) as b , &dblib..Demographics as c
where ( a.patientid = b.PatientID = c.PatientID and (year(b.StartDate) - c.BirthYear) >= 18 );
quit;
The first query is a subquery, and the result is Boolean (true or false).
The second is a join, the result is not restricted to true/false on the left table.
The condition a.patientid = b.PatientID = c.PatientID isn't equivalent to (a.patientid = b.PatientID) and (b.PatientID = c.PatientID). It evaluates as (a.patientid = b.PatientID) = c.PatientID. Now, a.patientid = b.PatientID elaluates as TRUE or FALSE (1 or 0) which is then compared to c.PatientID. Unless c.PatientID is 0 or 1, the condition will be FALSE.
Replace a.patientid = b.PatientID = c.PatientID with (a.patientid = b.PatientID) and (b.PatientID = c.PatientID) and both queries should yield the same result.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.