Hi,
I thought these two codes would return same results, however, the results were different with the duplicated records in the second query ( the original query data are all one patient having one record) .
I am wondering the reason. Thank you.
proc sql;
create table cohort as
select a.*
from lot2_2 as a
where a.patientid in
( select b.patientid
from &dblib..Lineoftherapy(where= (LineNumber = 1)) as b , &dblib..Demographics as c
where b.PatientID = c.PatientID and (year(b.StartDate) - c.BirthYear) >= 18 );
quit;
proc sql;
create table cohort as
select a.*
from lot2_2 as a , &dblib..Lineoftherapy(where= (LineNumber = 1)) as b , &dblib..Demographics as c
where ( a.patientid = b.PatientID = c.PatientID and (year(b.StartDate) - c.BirthYear) >= 18 );
quit;
The first query is a subquery, and the result is Boolean (true or false).
The second is a join, the result is not restricted to true/false on the left table.
The condition a.patientid = b.PatientID = c.PatientID isn't equivalent to (a.patientid = b.PatientID) and (b.PatientID = c.PatientID). It evaluates as (a.patientid = b.PatientID) = c.PatientID. Now, a.patientid = b.PatientID elaluates as TRUE or FALSE (1 or 0) which is then compared to c.PatientID. Unless c.PatientID is 0 or 1, the condition will be FALSE.
Replace a.patientid = b.PatientID = c.PatientID with (a.patientid = b.PatientID) and (b.PatientID = c.PatientID) and both queries should yield the same result.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.