Greeting all,
I need some help with checking if a transaction date falls between a begin_date and end_date column and flagging the record
The transaction table for simplicity contain 2 columns an id and transaction date with potential duplicates while a second table (no duplicate rows) contain the following information, the begin_date and end_date but can be repeated for different groups as shown below (I only shown 2 groups here -approx 50 groups in total)
2016 PERIOD 1 G '10Jan2016'd '28Jan2016'd
2016 PERIOD 1 H '10Jan2016'd '28Jan2016'd
2016 PERIOD 2 G '04Jul2016'd '19Jul2016'd
2016 PERIOD 2 H '04Jul2016'd '19Jul2016'd
2016 PERIOD 3 G '26Sep2016'd '11Oct2016'd
2016 PERIOD 3 H '26Sep2016'd '11Oct2016'd
2016 PERIOD 4 G '18Dec2016'd '31Jan2017'd
2016 PERIOD 4 H '19Dec2016'd '28Jan2017'd
My final table would contain the 2 columns (id & transaction date) plus the indicator variable 1/0 if the transaction date is indeed between those 2 dates.
TIA
Peter
Just do a left join using a between - and condition. Set the flag if you got a match (or not).
Is there an ID in the second table? What are you joining the tables on? After you join the table you can do:
Data want;
set have;
if begin_date < transaction_date < end_date then flag = 1;
else flag = 0;
run;
No need for a second step.
Just test on the occurrence of the id variable in the right table.
Linus,
Without an ID in the second table how would you be joining?
Hi Mark and Linus the problem is that the id in the second table.is different to that of the first i.e.
PERIOD_IDENTIFIER YEAR PERIOD GROUP
1111 2016 PERIOD 1 G '10Jan2016'd '28Jan2016'd
1112 2016 PERIOD 1 H '10Jan2016'd '28Jan2016'd
1113 2016 PERIOD 2 G '04Jul2016'd '19Jul2016'd
1114 2016 PERIOD 2 H '04Jul2016'd '19Jul2016'd
1115 2016 PERIOD 3 G '26Sep2016'd '11Oct2016'd
1116 2016 PERIOD 3 H '26Sep2016'd '11Oct2016'd
1117 2016 PERIOD 4 G '18Dec2016'd '31Jan2017'd
1118 2016 PERIOD 4 H '19Dec2016'd '28Jan2017'd
I mean we can try to append all the date that are between the date range with the PERIOD_IDENTIFIER appended to the first (transaction) table but we first need to join these 2 tables somehow.
The first tables will have duplicates but the second table the identifier is unique.
The WANT data set look something like this
id transaction_date group PERIOD_START PERIOD_END FLAG
1 19/03/2016 G . . 0
2 21/01/2016 H '10Jan2016'd '28Jan2016'd 1
3 05/07/2016 H '04Jul2016'd '19Jul2016'd 1
4 27/09/2016 G '26Sep2016'd '11Oct2016'd 1
5 25/03/2016 G . . 0
6 02/01/2017 H '19Dec2016'd '28Jan2017'd 1
So far I have attempted the following codes which returns duplicates
proc sql;
create table test as
select
a.id
, a.transaction_date
, a.group
, b.period_start
, b.period_end
, flag
from transaction_table a
left outer join
period_table b
on a.transaction_date between b.period_start and b.period_end
and a.group = b.group
quit;
Unless I'm missing out something or proc sql might not be good for this task.
The join will produce duplicates, if date intervals in the second dataset overlap.
On the other hand, if you decide not to include detailed information from this second table, you can aggregate the result of join:
proc sql;
create table test as
select
a.id
, a.transaction_date
, a.group
, max(b.group is not missing) as flag
from transaction_table a
left outer join
period_table b
on a.transaction_date between b.period_start and b.period_end
and a.group = b.group
group by 1,2,3
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.