BookmarkSubscribeRSS Feed
Pooja2
Fluorite | Level 6

I have this data and i have to find out which customer purchased more than 40 items during any 60 day. I am unable to find the logic.. please help.

 

Thanks in advance!

 

 

customer IDdateno of items purchased
2333A24-May-073
2333A27-Jun-075
2333A8-Jul-073
2333A9-Jul-078
2333A20-Jul-079
2333A25-Jun-079
2333A30-Jun-077
2333A5-Jul-076
2333A7-Jul-071
5643D24-May-075
5643D27-Jun-075
5643D8-Jul-075
5643D9-Jul-075
5643D20-Jul-075
5643D25-Jun-076
5643D30-Jun-077
5643D5-Jul-075
5643D7-Jul-071
5643D18-May-072
5643D20-Jun-072
2287C25-Jun-074
2287C30-Jun-072
2287C5-Jul-073
12 REPLIES 12
Reeza
Super User

How big is your data?

Is the 60 day interval any 60 days? That does make the problem a lot harder...

 

Essentially you do a SELF JOIN using the date interval and checking 60 day interval in either direction each time. 

novinosrin
Tourmaline | Level 20

Is the 60 day interval any 60 days? That does make the problem a lot harder...

 

Impeccable and can't agree more. 

 

@Pooja2 does it have to be in proc sql? Coz in datastep it's far easier to take each row date as begin date for a 60 day period

 

Pooja2
Fluorite | Level 6

I m gonna try out the solutions mentioned here and would post whether i got the results or not. I do have 36000 obs.

 

thanks!

Pooja2
Fluorite | Level 6

it doesn't have to be sql. it could be datastep. So far none of the solutions suggested worked.

Reeza
Super User

@Pooja2 wrote:

it doesn't have to be sql. it could be datastep. So far none of the solutions suggested worked.


How do they not work?

Pooja2
Fluorite | Level 6

my dataset has almost 36k obs and it would be helpful if you could give me the logic in he form of code so that I can test first before running my original data.

 

thanks.

Pooja2
Fluorite | Level 6

also it is any 60 day period.

LinusH
Tourmaline | Level 20
Here's an idea (that might kill your machine if your data is too big):
Create a multilabel format with all possible 60 day intervals that can reside in your data set.
Run PROC SUMMARY using that format
Filter out your top consumers.
Data never sleeps
Patrick
Opal | Level 21

@Pooja2

Below code should return what you're asking for. If it will perform sufficiently depends very much on the data volumes you're dealing with as well as the place where the data resides (SAS tables or data base).


proc sql;
  create table want as
  select
    customer_id,
    max(n_items_sum60) as max_n_items_sum60
    from
      (
        select
          o.customer_id,
          (
            select sum(i.n_items_purchased)
            from have i
            where i.customer_id=o.customer_id and o.date between i.date-60 and i.date+60
          )
            as n_items_sum60
        from have as o
      )
    group by customer_id
    having max_n_items_sum60>40
  ;
quit;
Pooja2
Fluorite | Level 6

This should be working. I will update in a day or two.

 

thanks!

s_lassen
Meteorite | Level 14

Here is a way to do it. Not sure how well it scales. If you have large amounts of data, a data step solution may be faster:

data have;
input customer_ID $/date date9./NoOfItems;
format date date9.;
cards;
2333A
24-May-07
3
2333A
27-Jun-07
5
2333A
8-Jul-07
3
2333A
9-Jul-07
8
2333A
20-Jul-07
9
2333A
25-Jun-07
9
2333A
30-Jun-07
7
2333A
5-Jul-07
6
2333A
7-Jul-07
1
5643D
24-May-07
5
5643D
27-Jun-07
5
5643D
8-Jul-07
5
5643D
9-Jul-07
5
5643D
20-Jul-07
5
5643D
25-Jun-07
6
5643D
30-Jun-07
7
5643D
5-Jul-07
5
5643D
7-Jul-07
1
5643D
18-May-07
2
5643D
20-Jun-07
2
2287C
25-Jun-07
4
2287C
30-Jun-07
2
2287C
5-Jul-07
3
;run;
proc sql;
  create table Sixty_sum
    as select Customer_id,date,(select sum(NoOfItems) from have
                                where Customer_id=outer.Customer_ID
                                  and date between outer.Date and outer.Date+59) as NoOfItems60
    from have as outer
    having calculated NoOfItems60>40
     ;

Astounding
PROC Star

This solution represents an important recognition of the requirements.  You don't need to look backwards.  You only need to look forwards (date + 59).  The reason:  if looking backwards would find a set of days that totaled 40+ items, you would find that same set of days by looking forward only.  You would just find it based on an earlier observation for the same customer.

 

In a DATA step, you wouldn't need the full Cartesian product, although the programming becomes a bit messy.  If needed:

 

proc sort data=have;

by custid date;

run;

 

Then take advantage of the order to the observations.  (This does not necessarily mean that it runs faster than SQL ... only testing would reveal that):

 

data want (where=(bought_40_flag = 1));

set have nobs=n_obs;

by custid date;

if first.custid then do;

   total_items = n_items;

   if n_items < 40 then bought_40_flag = 0;

  else do;

      bought_40_flag=1;

      output;

   end;

end;

retain bought_40_flag;

if bought_40_flag=0 and _n_ < n_obs then do next_obs = _n_+1 to n_obs;

   set have (rename=(custid = next_custid date = next_date)) point=next_obs;

   if custid = next_custid and (next_date - date) < 60 then do;

      total_items + n_items;

      if total_items >= 40 then do;

         bought_40_flag=1;

         output;

         return;

      end;

   end;

   else return;  /* now at next customer or beyond 60 day limit */

end;

run;

 

Like I said, it's messy.  It's also untested, so I hope I got it right but it might need a little tweaking.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1291 views
  • 4 likes
  • 7 in conversation