I misspoke on my previous ask so I created a new posting to better characterize my question.
I have a dataset below as follows:
data have1; infile cards truncover expandtabs; input STORE_ID $ customer_id $ transaction_id $ transaction_date :date9.; format transaction_date date8.; cards; A0001 aa 1 22jan2010 A0001 aa 2 23jan2010 A0001 ay 3 24jan2010 A0001 ab 4 11mar2010 A0001 cc 5 11jun2011 A0001 dd 6 12jun2012 A0001 ee 7 13jul2012 A0001 ee 8 10aug2012 A0001 ef 9 10aug2012 A0001 ff 10 10aug2012 A0001 ff 11 07dec2012 A0001 gg 12 22dec2012 A0001 ds 13 01jan2013 A0002 fz 1 07dec2012 A0002 gb 2 22dec2012 A0002 dw 3 01jan2013 A0003 mg 1 22dec2012 A0003 sf 2 01jan2013 ;
I want to accomplish two tasks:
The code below I have generated gives me relevant information but I have trouble with the first and last rows so perhaps I've gone about this incorrectly. Any suggestions would be helpful. Thank you.
data want;
set have1;
by store_id;
difdate=ifn(first.store_id,.,dif(transaction_date));
if first.store_id or transaction_date>cutoff then do;
cutoff=transaction_date+30;
end;
last_transaction=lag(transaction_date);
if first.store_id then last_transaction=.;
if last.store_id then do;
if transaction_date le '31-DEC-2013'd then closure=1;
end;
if difdate gt 30 then closure=1;
retain cutoff;
format cutoff last_transaction date9.;
run;
I think you are quite close. Try this:
data groups;
set have1;
by store_id;
if not first.store_id and dif(transaction_date)>30 then
group+1;
run;
data want;
do until(last.group);
set groups;
by store_id group;
if first.group then
open=transaction_date;
end;
close=transaction_date;
closed=close<'31DEC2012'd;
format open close date8.;
keep store_id open close closed;
run;
Hello,
I think you can adjust below code to get exactly what you want.
It uses the "look ahead one observation"-trick with firstobs=2.
proc sort data=have1;
by STORE_ID transaction_date ;
run;
data want(drop = STORE_ID2 transaction_date2);
set have1;
set have1(firstobs=2
drop = customer_id transaction_id
rename = (STORE_ID = STORE_ID2 transaction_date = transaction_date2) );
if STORE_ID ^= STORE_ID2 then do; STORE_ID2=''; transaction_date2=.; end;
gap = INTCK('DAY',transaction_date,transaction_date2,'CONTINUOUS');
closure = 0;
if gap > 30 then closure=1;
run;
/* end of program */
Thanks,
Koen
I still don't know how to change this information to construct open/close dates using the gaps. This information provides the gap periods > 30 days; however, my main question still persists--how to construct open and close dates with the closure=1 periods too.
I want to essentially create for each store_id, open and close dates.
So, for example, store1 has 6 rows; 4 for the gaps but also the from Jan 22-Jan 24; and again from Dec 7-Jan1 2013
Hello @A_Swoosh ,
I was just about to write the exact same thing as @maguiremq .
Create a data step with data want; and cards (datalines); statements to show us what your want dataset looks like which should be the result of processing your have dataset.
Thanks,
Koen
data want;
infile cards truncover expandtabs;
input STORE_ID $ open :date9. close :date9. closed $;
format open close date8.;
cards;
A0001 22jan2010 24jan2010 1
A0001 11mar2010 11mar2010 1
A0001 11jun2011 11jun2011 1
A0001 12jun2012 13jul2012 1
A0001 13jul2012 10aug2012 1
A0001 07dec2012 01jan2013 0
A0002 07dec2012 01jan2013 0
A0003 22dec2012 01jan2013 0
;
The last two and the final one for the first store are not defined as closed because the data goes all the way to the end of the study period. So, anything before '31-DEC-2012'd would be defined as a closure, and anything with a gap >30 days would be defined as a closure.
Maybe I'm just getting lost at this point, but why are transaction number 6 and 7 for store_id A0001 not in your `want` data set?
Thanks for posting your `want` data, by the way.
STORE_ID transaction_id transaction_date lag_tx_date diff A0001 1 22JAN2010 . . A0001 2 23JAN2010 22JAN2010 1 A0001 3 24JAN2010 23JAN2010 1 A0001 4 11MAR2010 24JAN2010 46 A0001 5 11JUN2011 11MAR2010 457 A0001 6 12JUN2012 11JUN2011 367 A0001 7 13JUL2012 12JUN2012 31 A0001 8 10AUG2012 13JUL2012 28 A0001 9 10AUG2012 10AUG2012 0 A0001 10 10AUG2012 10AUG2012 0 A0001 11 07DEC2012 10AUG2012 119 A0001 12 22DEC2012 07DEC2012 15 A0001 13 01JAN2013 22DEC2012 10
Okay, so why is 6 not it's own record? And 7, too? Am I misunderstanding something? I'm using the INTCK function like @sbxkoenk.
Gaps of >30 days or more indicates a closure.
So for the first store_id:
Open: Jan 22-2010; Close: Jan 24-2010
Open: Mar-11-2010; Close: Mar-11-2010
Open: Jun-11-2011; Close: Jun-11-2010
Open: Jun 12-2012; Close: Jul-13-2012
Open: Jul-13-2012; Close: Aug-10-2012
Open: Dec-7-2012; Close Jan 1-2013
I stand corrected--I missed a row.
@maguiremq --I saw your website and know you work with claims data and this is an example test case for it.
This is what I've come up with so far:
data test;
set have1;
by store_id transaction_date;
if first.store_id or dif(transaction_date) gt 30 then n+1;
run;
I now want to generate open date using the first of the group and last as the close date, but if only one in the group then it's both open and close.
I think you are quite close. Try this:
data groups;
set have1;
by store_id;
if not first.store_id and dif(transaction_date)>30 then
group+1;
run;
data want;
do until(last.group);
set groups;
by store_id group;
if first.group then
open=transaction_date;
end;
close=transaction_date;
closed=close<'31DEC2012'd;
format open close date8.;
keep store_id open close closed;
run;
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.