BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
A_Swoosh
Quartz | Level 8

I misspoke on my previous ask so I created a new posting to better characterize my question.

 

I have a dataset below as follows:

data have1;
  infile cards truncover expandtabs;
  input STORE_ID $ customer_id $ transaction_id $ transaction_date :date9.;
  format transaction_date date8.;
  cards;
A0001 aa 1 22jan2010  
A0001 aa 2 23jan2010  
A0001 ay 3 24jan2010  
A0001 ab 4 11mar2010  
A0001 cc 5 11jun2011  
A0001 dd 6 12jun2012  
A0001 ee 7 13jul2012  
A0001 ee 8 10aug2012  
A0001 ef 9 10aug2012  
A0001 ff 10 10aug2012  
A0001 ff 11 07dec2012  
A0001 gg 12 22dec2012  
A0001 ds 13 01jan2013  
A0002 fz 1 07dec2012  
A0002 gb 2 22dec2012  
A0002 dw 3 01jan2013  
A0003 mg 1 22dec2012  
A0003 sf 2 01jan2013  
;

I want to accomplish two tasks:

  1. I want to essentially create an open and close period for each store and treat gaps of 30 days in the transaction date as a closure flagged by an indicator (closure=1 for example).
    • Basically, I want to create a dataset that has opened and closed based on transaction_date and gaps of >30 days are treated as closures.
  2. I want to also generate a closure=1 if the date falls before a date (e.g., 31-DEC-2013)

The code below I have generated gives me relevant information but I have trouble with the first and last rows so perhaps I've gone about this incorrectly. Any suggestions would be helpful. Thank you.

 

data want;
  set have1;
  by store_id;
  difdate=ifn(first.store_id,.,dif(transaction_date));
  if first.store_id or transaction_date>cutoff then do;
    cutoff=transaction_date+30;
  end;
 last_transaction=lag(transaction_date);
if first.store_id then last_transaction=.;
if last.store_id then do;
if transaction_date le '31-DEC-2013'd then closure=1; 
end;
if difdate gt 30 then closure=1;
  retain cutoff;
  format cutoff last_transaction date9.;
 run;
1 ACCEPTED SOLUTION

Accepted Solutions
s_lassen
Meteorite | Level 14

I think you are quite close. Try this:

data groups;                                                
  set have1;                                                
  by store_id;                                              
  if not first.store_id and dif(transaction_date)>30 then   
    group+1;                                                
run;                                                        
data want;                                                  
  do until(last.group);                                     
    set groups;                                             
    by store_id group;                                      
    if first.group then                                     
      open=transaction_date;                                
    end;                                                    
  close=transaction_date;                                   
  closed=close<'31DEC2012'd;                                
  format open close date8.;                                 
  keep store_id open close closed;                          
run;

View solution in original post

11 REPLIES 11
sbxkoenk
SAS Super FREQ

Hello,

 

I think you can adjust below code to get exactly what you want.

It uses the "look ahead one observation"-trick with firstobs=2.

 

proc sort data=have1;
 by STORE_ID transaction_date ;
run;

data want(drop = STORE_ID2 transaction_date2);
 set have1;
 set have1(firstobs=2 
           drop = customer_id transaction_id
           rename = (STORE_ID = STORE_ID2 transaction_date = transaction_date2) );
 if STORE_ID ^= STORE_ID2 then do; STORE_ID2=''; transaction_date2=.; end; 
 gap = INTCK('DAY',transaction_date,transaction_date2,'CONTINUOUS');
 closure = 0;
 if gap > 30 then closure=1;
run;
/* end of program */

Thanks,

Koen

A_Swoosh
Quartz | Level 8

I still don't know how to change this information to construct open/close dates using the gaps. This information provides the gap periods > 30 days; however, my main question still persists--how to construct open and close dates with the closure=1 periods too.

I want to essentially create for each store_id, open and close dates.

So, for example,  store1 has 6 rows; 4 for the gaps but also the from Jan 22-Jan 24; and again from Dec 7-Jan1 2013

maguiremq
SAS Super FREQ
Can you post what you want your data to look like? Basically edit your `have1` data set with the CARDS statement to show us exactly what you want?
sbxkoenk
SAS Super FREQ

Hello @A_Swoosh ,

 

I was just about to write the exact same thing as @maguiremq .


Create a data step with data want; and cards (datalines); statements to show us what your want dataset looks like which should be the result of processing your have dataset.

 

Thanks,

Koen

A_Swoosh
Quartz | Level 8

@maguiremq @sbxkoenk 

data want;
  infile cards truncover expandtabs;
  input STORE_ID $ open :date9. close :date9. closed $;
  format open close date8.;
  cards;
A0001 22jan2010 24jan2010 1
A0001 11mar2010 11mar2010 1
A0001 11jun2011 11jun2011 1
A0001 12jun2012 13jul2012 1
A0001 13jul2012 10aug2012 1
A0001 07dec2012 01jan2013 0
A0002 07dec2012 01jan2013 0
A0003 22dec2012 01jan2013 0
;

The last two and the final one for the first store are not defined as closed because the data goes all the way to the end of the study period. So, anything before '31-DEC-2012'd would be defined as a closure, and anything with a gap >30 days would be defined as a closure.

maguiremq
SAS Super FREQ

Maybe I'm just getting lost at this point, but why are transaction number 6 and 7 for store_id A0001 not in your `want` data set?

 

Thanks for posting your `want` data, by the way.

A_Swoosh
Quartz | Level 8
Transaction 6 is 4th row in the 'want' dataset. Transaction 7 is rolled up into the 4th row because there wasn't a gap of >30 days.
maguiremq
SAS Super FREQ
STORE_ID 	transaction_id 	transaction_date 	lag_tx_date 	diff
A0001 	1 	22JAN2010 	. 	.
A0001 	2 	23JAN2010 	22JAN2010 	1
A0001 	3 	24JAN2010 	23JAN2010 	1
A0001 	4 	11MAR2010 	24JAN2010 	46
A0001 	5 	11JUN2011 	11MAR2010 	457
A0001 	6 	12JUN2012 	11JUN2011 	367
A0001 	7 	13JUL2012 	12JUN2012 	31
A0001 	8 	10AUG2012 	13JUL2012 	28
A0001 	9 	10AUG2012 	10AUG2012 	0
A0001 	10 	10AUG2012 	10AUG2012 	0
A0001 	11 	07DEC2012 	10AUG2012 	119
A0001 	12 	22DEC2012 	07DEC2012 	15
A0001 	13 	01JAN2013 	22DEC2012 	10

Okay, so why is 6 not it's own record? And 7, too? Am I misunderstanding something? I'm using the INTCK function like @sbxkoenk.

 

A_Swoosh
Quartz | Level 8

Gaps of >30 days or more indicates a closure.
So for the first store_id:
Open: Jan 22-2010; Close: Jan 24-2010
Open: Mar-11-2010; Close: Mar-11-2010
Open: Jun-11-2011; Close: Jun-11-2010
Open: Jun 12-2012; Close: Jul-13-2012
Open: Jul-13-2012; Close: Aug-10-2012
Open: Dec-7-2012; Close Jan 1-2013

I stand corrected--I missed a row.

A_Swoosh
Quartz | Level 8

@maguiremq --I saw your website and know you work with claims data and this is an example test case for it.

@sbxkoenk 

 

This is what I've come up with so far:

data test;
 set have1;
 by store_id transaction_date;
 if first.store_id or dif(transaction_date) gt 30 then n+1;
run;

I now want to generate open date using the first of the group and last as the close date, but if only one in the group then it's both open and close.

s_lassen
Meteorite | Level 14

I think you are quite close. Try this:

data groups;                                                
  set have1;                                                
  by store_id;                                              
  if not first.store_id and dif(transaction_date)>30 then   
    group+1;                                                
run;                                                        
data want;                                                  
  do until(last.group);                                     
    set groups;                                             
    by store_id group;                                      
    if first.group then                                     
      open=transaction_date;                                
    end;                                                    
  close=transaction_date;                                   
  closed=close<'31DEC2012'd;                                
  format open close date8.;                                 
  keep store_id open close closed;                          
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 2433 views
  • 4 likes
  • 4 in conversation