I am merging two tables together by date ranges, so I have to use SQL to join them. However, I want to use the IN= data set option to create additional flags in the data. Is there an SQL equivalent to this method?
Below are two snippets of code. The first one is what I am doing where I merge the 2 datasets by a date range. But the second one is the concept I would like to incorporate into the SQL query. I lose that in=a/in=b information once I leave SQL.
proc sql;
create table overlap as
select a.*, b.aestdt, b.aeendt
from cfuhosp as a left join ae as b
on a.subject=b.subject
where b.aestdt<=a.uhstdt<=b.aeendt
order by subject;
quit;
data overlap;
merge cfuhosp (in=a) ae (in=b);
by ...; /* can't be done in data step */
if a;
if b=1 then discrep="Discrepancy Note #1";
if b=0 then discrep="Discrepancy Note #2";
run;
No. But you could test if one of the variables from the other table has a missing value. Make sure to pick a variable where a missing value is not an expected value. Perhaps one of the key variables.
proc sql;
create table overlap as
select a.*
, b.aestdt, b.aeendt
, case when (missing(b.subject)) then "Discrepancy Note #2"
else "Discrepancy Note #1"
end as discrep
from cfuhosp as a left join ae as b
on a.subject=b.subject
and b.aestdt<=a.uhstdt<=b.aeendt
order by subject
;
quit;
When you do a LEFT join and the join does not find a match, then the variables from the "right" dataset will be missing. You can check for that.
I don't think I have an issue with the join. I just want to flag which part of the attached Venn diagram the observations fall in. My SQL join is keeping data that fall in Area1 and Area3. I want to create a flag that if the data is in Area1, then output one message; if the data is in Area3, then output a different message.
See this:
proc sql;
create table overlap as
select
a.*,
b.aestdt,
b.aeendt,
case
when b.subject is missing
then "Discrepancy Note #2"
else "Discrepancy Note #1"
end as discrep
from cfuhosp as a left join ae as b
on a.subject=b.subject
where b.aestdt<=a.uhstdt<=b.aeendt
order by subject
;
quit;
This is a re-engineering of your data step code in SQL.
Kurt,
I believe your method will work, but I just realized a flaw in my logic. All subjects in my CFUHOSP table are also in AE, so when using the case statement, it didn't give me 2 different notes. To test, I went back to my AE dataset and deleted a subject. Now that subject doesn't appear at all. But this is because of my WHERE statement. Since there are no AE start/end dates, the where condition is not satisfied for that subject to be retained. I'll have to rethink how to handle these cases (because these are the ones that I want to retain). But thank you for your help. I think this will be a good place to start.
What @Tom posted, you need to move the date condition into the ON clause. Should have seen that on my own.
No. But you could test if one of the variables from the other table has a missing value. Make sure to pick a variable where a missing value is not an expected value. Perhaps one of the key variables.
proc sql;
create table overlap as
select a.*
, b.aestdt, b.aeendt
, case when (missing(b.subject)) then "Discrepancy Note #2"
else "Discrepancy Note #1"
end as discrep
from cfuhosp as a left join ae as b
on a.subject=b.subject
and b.aestdt<=a.uhstdt<=b.aeendt
order by subject
;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.