DATA Step, Macro, Functions and more

Two queries

Reply
Frequent Contributor
Frequent Contributor
Posts: 78

Two queries

Hi, 

 

I thought these two codes would return same results, however, the results were different with the duplicated records in the second query ( the original query data are all one patient having one record) .  

I am wondering the reason.  Thank you.

 

proc sql;
create table cohort as
select a.*
from lot2_2 as a
where a.patientid in
( select b.patientid
from &dblib..Lineoftherapy(where= (LineNumber = 1)) as b , &dblib..Demographics as c
where b.PatientID = c.PatientID and (year(b.StartDate) - c.BirthYear) >= 18 );
quit;


proc sql;
create table cohort as
select a.*
from lot2_2 as a , &dblib..Lineoftherapy(where= (LineNumber = 1)) as b , &dblib..Demographics as c
where ( a.patientid = b.PatientID = c.PatientID and (year(b.StartDate) - c.BirthYear) >= 18 );
quit;

Super User
Posts: 5,429

Re: Two queries

The first query is a subquery, and the result is Boolean (true or false).

 

The second is a join, the result is not restricted to true/false on the left table.

Data never sleeps
Frequent Contributor
Frequent Contributor
Posts: 78

Re: Two queries

Thank you, LinusH !
Respected Advisor
Posts: 4,923

Re: Two queries

The condition a.patientid = b.PatientID = c.PatientID isn't equivalent to (a.patientid = b.PatientID) and (b.PatientID = c.PatientID). It evaluates as (a.patientid = b.PatientID) = c.PatientID. Now, a.patientid = b.PatientID elaluates as TRUE or FALSE (1 or 0) which is then compared to c.PatientID. Unless c.PatientID is 0 or 1, the condition will be FALSE. 

 

Replace a.patientid = b.PatientID = c.PatientID with (a.patientid = b.PatientID) and (b.PatientID = c.PatientID) and both queries should yield the same result. 

PG
Ask a Question
Discussion stats
  • 3 replies
  • 212 views
  • 0 likes
  • 3 in conversation