I am trying to merge two dataset with multiple conditions, in proc sql but when both conditions fail it is giving me a cartisian product of the first variables in dataset with all other variables in second dataset. if any of the below two conditions doesnt match then i dont want to have cartesian product for that. i am getting a cartesian product with all the non matching record in first dataset getting merged with all the records for that subject in second dataset. I dont want that to happen is there any better to merge.
proc sql;
create table pk_st_atr1 as
select st_atrx.*,pk_1x.adt,atm,pkstresc,pkstresu, case
when adt^= . then 'Y' end as pk_tr_fl
from st_atrx as X full join pk_1x as Y
on x.logdt ^= . and y.adt ^= . and x.logtm ^= . and y.atm ^= . and
(x.subject=y.subject and x.visitnum=y.visitnum and x.pktpt_std= y.pktpt_std and x.accession_number= y.accession_number and x.sample_id= y.sample_id)
or (x.subject=y.subject and x.visitnum=y.visitnum and x.accession_number= y.accession_number and x.sample_id= y.sample_id and x.logdt=y.adt and x.logtm=y.atm )
;
quit;
run;
Your code is a mess.
Write legible code and the issue with your OR clause will become obvious.
You might something like:
proc sql;
create table PK_ST_ATR1 as
select ST_ATRX.*
,PK_1X.ADT
,ATM
,PKSTRESC
,PKSTRESU
,case when ADT^= . then 'Y' end as PK_TR_FL
from ST_ATRX as X
full join
PK_1X as Y
on x.LOGDT ^= .
and y.ADT ^= .
and x.LOGTM ^= .
and y.ATM ^= .
and x.SUBJECT = y.SUBJECT
and x.VISITNUM = y.VISITNUM
and x.ACCESSION_NUMBER= y.ACCESSION_NUMBER
and x.SAMPLE_ID = y.SAMPLE_ID
and ( (x.PKTPT_STD = y.PKTPT_STD )
or(x.LOGDT=y.ADT and x.LOGTM=y.ATM ) ) ;
Code should be pretty. This will make your life much easier.
Also, verify if:
1. Do you really want a full join?
2. Should the test on missing values be in a WHERE clause?
Your code is a mess.
Write legible code and the issue with your OR clause will become obvious.
You might something like:
proc sql;
create table PK_ST_ATR1 as
select ST_ATRX.*
,PK_1X.ADT
,ATM
,PKSTRESC
,PKSTRESU
,case when ADT^= . then 'Y' end as PK_TR_FL
from ST_ATRX as X
full join
PK_1X as Y
on x.LOGDT ^= .
and y.ADT ^= .
and x.LOGTM ^= .
and y.ATM ^= .
and x.SUBJECT = y.SUBJECT
and x.VISITNUM = y.VISITNUM
and x.ACCESSION_NUMBER= y.ACCESSION_NUMBER
and x.SAMPLE_ID = y.SAMPLE_ID
and ( (x.PKTPT_STD = y.PKTPT_STD )
or(x.LOGDT=y.ADT and x.LOGTM=y.ATM ) ) ;
Code should be pretty. This will make your life much easier.
Also, verify if:
1. Do you really want a full join?
2. Should the test on missing values be in a WHERE clause?
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.