I have this data and i have to find out which customer purchased more than 40 items during any 60 day. I am unable to find the logic.. please help.
Thanks in advance!
customer ID | date | no of items purchased |
2333A | 24-May-07 | 3 |
2333A | 27-Jun-07 | 5 |
2333A | 8-Jul-07 | 3 |
2333A | 9-Jul-07 | 8 |
2333A | 20-Jul-07 | 9 |
2333A | 25-Jun-07 | 9 |
2333A | 30-Jun-07 | 7 |
2333A | 5-Jul-07 | 6 |
2333A | 7-Jul-07 | 1 |
5643D | 24-May-07 | 5 |
5643D | 27-Jun-07 | 5 |
5643D | 8-Jul-07 | 5 |
5643D | 9-Jul-07 | 5 |
5643D | 20-Jul-07 | 5 |
5643D | 25-Jun-07 | 6 |
5643D | 30-Jun-07 | 7 |
5643D | 5-Jul-07 | 5 |
5643D | 7-Jul-07 | 1 |
5643D | 18-May-07 | 2 |
5643D | 20-Jun-07 | 2 |
2287C | 25-Jun-07 | 4 |
2287C | 30-Jun-07 | 2 |
2287C | 5-Jul-07 | 3 |
How big is your data?
Is the 60 day interval any 60 days? That does make the problem a lot harder...
Essentially you do a SELF JOIN using the date interval and checking 60 day interval in either direction each time.
Is the 60 day interval any 60 days? That does make the problem a lot harder...
Impeccable and can't agree more.
@Pooja2 does it have to be in proc sql? Coz in datastep it's far easier to take each row date as begin date for a 60 day period
I m gonna try out the solutions mentioned here and would post whether i got the results or not. I do have 36000 obs.
thanks!
it doesn't have to be sql. it could be datastep. So far none of the solutions suggested worked.
@Pooja2 wrote:
it doesn't have to be sql. it could be datastep. So far none of the solutions suggested worked.
How do they not work?
my dataset has almost 36k obs and it would be helpful if you could give me the logic in he form of code so that I can test first before running my original data.
thanks.
also it is any 60 day period.
Below code should return what you're asking for. If it will perform sufficiently depends very much on the data volumes you're dealing with as well as the place where the data resides (SAS tables or data base).
proc sql;
create table want as
select
customer_id,
max(n_items_sum60) as max_n_items_sum60
from
(
select
o.customer_id,
(
select sum(i.n_items_purchased)
from have i
where i.customer_id=o.customer_id and o.date between i.date-60 and i.date+60
)
as n_items_sum60
from have as o
)
group by customer_id
having max_n_items_sum60>40
;
quit;
This should be working. I will update in a day or two.
thanks!
Here is a way to do it. Not sure how well it scales. If you have large amounts of data, a data step solution may be faster:
data have;
input customer_ID $/date date9./NoOfItems;
format date date9.;
cards;
2333A
24-May-07
3
2333A
27-Jun-07
5
2333A
8-Jul-07
3
2333A
9-Jul-07
8
2333A
20-Jul-07
9
2333A
25-Jun-07
9
2333A
30-Jun-07
7
2333A
5-Jul-07
6
2333A
7-Jul-07
1
5643D
24-May-07
5
5643D
27-Jun-07
5
5643D
8-Jul-07
5
5643D
9-Jul-07
5
5643D
20-Jul-07
5
5643D
25-Jun-07
6
5643D
30-Jun-07
7
5643D
5-Jul-07
5
5643D
7-Jul-07
1
5643D
18-May-07
2
5643D
20-Jun-07
2
2287C
25-Jun-07
4
2287C
30-Jun-07
2
2287C
5-Jul-07
3
;run;
proc sql;
create table Sixty_sum
as select Customer_id,date,(select sum(NoOfItems) from have
where Customer_id=outer.Customer_ID
and date between outer.Date and outer.Date+59) as NoOfItems60
from have as outer
having calculated NoOfItems60>40
;
This solution represents an important recognition of the requirements. You don't need to look backwards. You only need to look forwards (date + 59). The reason: if looking backwards would find a set of days that totaled 40+ items, you would find that same set of days by looking forward only. You would just find it based on an earlier observation for the same customer.
In a DATA step, you wouldn't need the full Cartesian product, although the programming becomes a bit messy. If needed:
proc sort data=have;
by custid date;
run;
Then take advantage of the order to the observations. (This does not necessarily mean that it runs faster than SQL ... only testing would reveal that):
data want (where=(bought_40_flag = 1));
set have nobs=n_obs;
by custid date;
if first.custid then do;
total_items = n_items;
if n_items < 40 then bought_40_flag = 0;
else do;
bought_40_flag=1;
output;
end;
end;
retain bought_40_flag;
if bought_40_flag=0 and _n_ < n_obs then do next_obs = _n_+1 to n_obs;
set have (rename=(custid = next_custid date = next_date)) point=next_obs;
if custid = next_custid and (next_date - date) < 60 then do;
total_items + n_items;
if total_items >= 40 then do;
bought_40_flag=1;
output;
return;
end;
end;
else return; /* now at next customer or beyond 60 day limit */
end;
run;
Like I said, it's messy. It's also untested, so I hope I got it right but it might need a little tweaking.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.