Hi Everyone,
I have a dataset with transaction dates; I need to find ANY date within the dataset where 7 transactions were made in the (immediate) prior 6 months, and output that date, with how many transactions made in the prior 6 months.
Is there an efficient way to do that in SQL or datastep?
thanks
data have;
input trans mmddyy10.;
format trans mmddyy10.;
cards;
03/25/2007
04/15/2007
05/13/2007
06/10/2007
07/08/2007
08/12/2007
09/16/2007
10/14/2007
11/11/2007
12/08/2007
01/01/2008
03/02/2008
03/30/2008
06/02/2008
07/21/2008
08/20/2008
10/15/2008
12/10/2008
01/11/2009
03/15/2009
05/17/2009
07/05/2009
08/23/2009
10/18/2009
11/15/2009
12/13/2009
01/10/2010
02/28/2010
05/16/2010
06/13/2010
07/04/2010
08/01/2010
09/27/2010
10/25/2010
08/29/2010
11/15/2010
12/18/2010
02/12/2011
03/23/2011
04/25/2011
05/21/2011
06/18/2011
08/22/2011
09/17/2011
10/22/2011
11/26/2011
;
run;
OK. Here is :
proc sql; create table want as select *,(select count(*) from have where id=a.id and trans between intnx('month',a.trans,-6,'b') and intnx('month',a.trans,-1,'e') ) as how_many, (select min(trans) from have where id=a.id and trans between intnx('month',a.trans,-6,'b') and intnx('month',a.trans,-1,'e') ) as earliest_day format=date9. from have as a where calculated how_many gt 6; quit;
Xia Keshan
proc sql ;
Create table wanted as
select c.trans, count( d.trans) as tnum
From have c
join have d
On d.trans between
intnx('month', c.trans,-1,'ending')
and (intnx('month',c.trans,-6,'beginning'))
Group by c.trans
having tnum >= 7
;
Quit;
Should work but there might be more efficient ways
Thank you very much Peter!
Would you be able to modify to include by client ID and add the (prior) date when the 7 transactions was met?
data have;
input id trans mmddyy10.;
format trans mmddyy10.;
cards;
1 03/25/2007
1 04/15/2007
1 05/13/2007
1 06/10/2007
1 07/08/2007
1 08/12/2007
1 09/16/2007
1 10/14/2007
1 11/11/2007
1 12/08/2007
1 01/01/2008
1 03/02/2008
1 03/30/2008
1 06/02/2008
1 07/21/2008
1 08/20/2008
1 10/15/2008
1 12/10/2008
1 01/11/2009
1 03/15/2009
1 05/17/2009
1 07/05/2009
1 08/23/2009
1 10/18/2009
1 11/15/2009
1 12/13/2009
1 01/10/2010
1 02/28/2010
1 05/16/2010
1 06/13/2010
1 07/04/2010
1 08/01/2010
1 09/27/2010
1 10/25/2010
1 08/29/2010
1 11/15/2010
1 12/18/2010
1 02/12/2011
1 03/23/2011
1 04/25/2011
1 05/21/2011
1 06/18/2011
1 08/22/2011
1 09/17/2011
1 10/22/2011
1 11/26/2011
2 11/09/2007
2 12/13/2007
2 01/05/2008
2 02/08/2008
2 03/07/2008
2 04/05/2008
2 05/30/2008
2 07/20/2008
2 08/15/2008
2 09/19/2008
2 11/10/2008
2 12/12/2008
2 02/13/2009
2 03/30/2009
2 06/01/2009
2 07/20/2009
2 09/21/2009
2 11/13/2009
2 01/15/2010
2 03/29/2010
2 06/18/2010
2 07/16/2010
2 09/03/2010
2 09/25/2010
2 10/29/2010
2 11/20/2010
2 12/11/2010
2 01/14/2011
2 03/18/2011
2 04/22/2011
2 06/03/2011
2 06/30/2011
2 07/28/2011
2 08/25/2011
2 09/29/2011
2 11/19/2011
2 12/16/2011
2 01/13/2012
2 02/09/2012
2 03/02/2012
2 04/05/2012
2 05/03/2012
2 06/28/2012
2 08/10/2012
2 10/19/2012
2 12/29/2012
;
run;
Adding ID below.
however i'm not sure about generating the date at which the count goes to 7.
I don't think the rank() function would work in that context
proc sql ;
Create table wanted as
select c.id, c.trans, count( d.trans) as tnum
From have c
join have d
On d.trans between
intnx('month', c.trans,-1,'ending')
and (intnx('month',c.trans,-6,'beginning'))
and c.id=d.id
Group by c.id, c.trans
having tnum >= 7
;
Quit;
Thanks very much, maybe I'll try to get the prior date in a separate data step.
At least this preliminary step provides the sub-setting list to reduce the data that needs to be investigated.
You mean greater than and equal 7 transactions were made ?
data have; input id trans mmddyy10.; format trans mmddyy10.; cards; 1 03/25/2007 1 04/15/2007 1 05/13/2007 1 06/10/2007 1 07/08/2007 1 08/12/2007 1 09/16/2007 1 10/14/2007 1 11/11/2007 1 12/08/2007 1 01/01/2008 1 03/02/2008 1 03/30/2008 1 06/02/2008 1 07/21/2008 1 08/20/2008 1 10/15/2008 1 12/10/2008 1 01/11/2009 1 03/15/2009 1 05/17/2009 1 07/05/2009 1 08/23/2009 1 10/18/2009 1 11/15/2009 1 12/13/2009 1 01/10/2010 1 02/28/2010 1 05/16/2010 1 06/13/2010 1 07/04/2010 1 08/01/2010 1 09/27/2010 1 10/25/2010 1 08/29/2010 1 11/15/2010 1 12/18/2010 1 02/12/2011 1 03/23/2011 1 04/25/2011 1 05/21/2011 1 06/18/2011 1 08/22/2011 1 09/17/2011 1 10/22/2011 1 11/26/2011 2 11/09/2007 2 12/13/2007 2 01/05/2008 2 02/08/2008 2 03/07/2008 2 04/05/2008 2 05/30/2008 2 07/20/2008 2 08/15/2008 2 09/19/2008 2 11/10/2008 2 12/12/2008 2 02/13/2009 2 03/30/2009 2 06/01/2009 2 07/20/2009 2 09/21/2009 2 11/13/2009 2 01/15/2010 2 03/29/2010 2 06/18/2010 2 07/16/2010 2 09/03/2010 2 09/25/2010 2 10/29/2010 2 11/20/2010 2 12/11/2010 2 01/14/2011 2 03/18/2011 2 04/22/2011 2 06/03/2011 2 06/30/2011 2 07/28/2011 2 08/25/2011 2 09/29/2011 2 11/19/2011 2 12/16/2011 2 01/13/2012 2 02/09/2012 2 03/02/2012 2 04/05/2012 2 05/03/2012 2 06/28/2012 2 08/10/2012 2 10/19/2012 2 12/29/2012 ; run; proc sql; create table want as select *,(select count(*) from have where id=a.id and trans between intnx('month',a.trans,-6,'b') and intnx('month',a.trans,-1,'e') ) as how_many from have as a where calculated how_many gt 6; quit;
Xia Keshan
Thanks Xia; any idea how to identify and output the (prior) date (trans) when the wanted criterion was met (for example, for the 11/15/2017, also output the prior earliest date when 7 transactions occurred within 6 months)?
OK. Here is :
proc sql; create table want as select *,(select count(*) from have where id=a.id and trans between intnx('month',a.trans,-6,'b') and intnx('month',a.trans,-1,'e') ) as how_many, (select min(trans) from have where id=a.id and trans between intnx('month',a.trans,-6,'b') and intnx('month',a.trans,-1,'e') ) as earliest_day format=date9. from have as a where calculated how_many gt 6; quit;
Xia Keshan
Xua
Would the earliest of 8 prior trans within 6 months be the wrong date?
I'm trying to imagine a constraint for the query that returns the 7th most recent transaction before each trans date:
Sounds like a HAVING
where the restriction is the number of rows that satisfy the join
what follows is not the whole query but just thoughts on obtaining the 7th earlier trans
(ubtedted thoughts)
Select min(b.trans), a.id
from have a join have b on a.id=b.id and
b.trans <a.trans and
b.trans >= intnx( 'month', a.trans, -6, 'sameday')
group by a.id, a.trans
having count(*) = 7
Hi Peter.C,
I don't understand what you mean totally.
Why would you think that would be wrong date ?
I think you shouldn't use ON in your sql statement, that will not eliminate obs when you are making Cartesian Product , but WHERE will, therefore I suggest you use WHERE instead of ON.
And Yes, The Cartesian Product you are using is more efficient than mine.
Best.
Xia Keshan
Thank you Peter and Xia very much for taking the time to help with this question!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.