Hi,
I thought these two codes would return same results, however, the results were different with the duplicated records in the second query ( the original query data are all one patient having one record) .
I am wondering the reason. Thank you.
proc sql;
create table cohort as
select a.*
from lot2_2 as a
where a.patientid in
( select b.patientid
from &dblib..Lineoftherapy(where= (LineNumber = 1)) as b , &dblib..Demographics as c
where b.PatientID = c.PatientID and (year(b.StartDate) - c.BirthYear) >= 18 );
quit;
proc sql;
create table cohort as
select a.*
from lot2_2 as a , &dblib..Lineoftherapy(where= (LineNumber = 1)) as b , &dblib..Demographics as c
where ( a.patientid = b.PatientID = c.PatientID and (year(b.StartDate) - c.BirthYear) >= 18 );
quit;
The first query is a subquery, and the result is Boolean (true or false).
The second is a join, the result is not restricted to true/false on the left table.
The condition a.patientid = b.PatientID = c.PatientID isn't equivalent to (a.patientid = b.PatientID) and (b.PatientID = c.PatientID). It evaluates as (a.patientid = b.PatientID) = c.PatientID. Now, a.patientid = b.PatientID elaluates as TRUE or FALSE (1 or 0) which is then compared to c.PatientID. Unless c.PatientID is 0 or 1, the condition will be FALSE.
Replace a.patientid = b.PatientID = c.PatientID with (a.patientid = b.PatientID) and (b.PatientID = c.PatientID) and both queries should yield the same result.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.