DATA Step, Macro, Functions and more

Proc sql: conditional programming

Reply
Occasional Contributor
Posts: 15

Proc sql: conditional programming

I have this data and i have to find out which customer purchased more than 40 items during any 60 day. I am unable to find the logic.. please help.

 

Thanks in advance!

 

 

customer IDdateno of items purchased
2333A24-May-073
2333A27-Jun-075
2333A8-Jul-073
2333A9-Jul-078
2333A20-Jul-079
2333A25-Jun-079
2333A30-Jun-077
2333A5-Jul-076
2333A7-Jul-071
5643D24-May-075
5643D27-Jun-075
5643D8-Jul-075
5643D9-Jul-075
5643D20-Jul-075
5643D25-Jun-076
5643D30-Jun-077
5643D5-Jul-075
5643D7-Jul-071
5643D18-May-072
5643D20-Jun-072
2287C25-Jun-074
2287C30-Jun-072
2287C5-Jul-073
Super User
Posts: 23,958

Re: Proc sql: conditional programming

How big is your data?

Is the 60 day interval any 60 days? That does make the problem a lot harder...

 

Essentially you do a SELF JOIN using the date interval and checking 60 day interval in either direction each time. 

Super User
Posts: 2,046

Re: Proc sql: conditional programming

[ Edited ]

Is the 60 day interval any 60 days? That does make the problem a lot harder...

 

Impeccable and can't agree more. 

 

@Pooja2 does it have to be in proc sql? Coz in datastep it's far easier to take each row date as begin date for a 60 day period

 

Occasional Contributor
Posts: 15

Re: Proc sql: conditional programming

Posted in reply to novinosrin

I m gonna try out the solutions mentioned here and would post whether i got the results or not. I do have 36000 obs.

 

thanks!

Occasional Contributor
Posts: 15

Re: Proc sql: conditional programming

Posted in reply to novinosrin

it doesn't have to be sql. it could be datastep. So far none of the solutions suggested worked.

Super User
Posts: 23,958

Re: Proc sql: conditional programming


@Pooja2 wrote:

it doesn't have to be sql. it could be datastep. So far none of the solutions suggested worked.


How do they not work?

Occasional Contributor
Posts: 15

Re: Proc sql: conditional programming

my dataset has almost 36k obs and it would be helpful if you could give me the logic in he form of code so that I can test first before running my original data.

 

thanks.

Occasional Contributor
Posts: 15

Re: Proc sql: conditional programming

also it is any 60 day period.

Super User
Posts: 5,914

Re: Proc sql: conditional programming

Here's an idea (that might kill your machine if your data is too big):
Create a multilabel format with all possible 60 day intervals that can reside in your data set.
Run PROC SUMMARY using that format
Filter out your top consumers.
Data never sleeps
Respected Advisor
Posts: 4,779

Re: Proc sql: conditional programming

@Pooja2

Below code should return what you're asking for. If it will perform sufficiently depends very much on the data volumes you're dealing with as well as the place where the data resides (SAS tables or data base).


proc sql;
  create table want as
  select
    customer_id,
    max(n_items_sum60) as max_n_items_sum60
    from
      (
        select
          o.customer_id,
          (
            select sum(i.n_items_purchased)
            from have i
            where i.customer_id=o.customer_id and o.date between i.date-60 and i.date+60
          )
            as n_items_sum60
        from have as o
      )
    group by customer_id
    having max_n_items_sum60>40
  ;
quit;
Occasional Contributor
Posts: 15

Re: Proc sql: conditional programming

This should be working. I will update in a day or two.

 

thanks!

PROC Star
Posts: 269

Re: Proc sql: conditional programming

Here is a way to do it. Not sure how well it scales. If you have large amounts of data, a data step solution may be faster:

data have;
input customer_ID $/date date9./NoOfItems;
format date date9.;
cards;
2333A
24-May-07
3
2333A
27-Jun-07
5
2333A
8-Jul-07
3
2333A
9-Jul-07
8
2333A
20-Jul-07
9
2333A
25-Jun-07
9
2333A
30-Jun-07
7
2333A
5-Jul-07
6
2333A
7-Jul-07
1
5643D
24-May-07
5
5643D
27-Jun-07
5
5643D
8-Jul-07
5
5643D
9-Jul-07
5
5643D
20-Jul-07
5
5643D
25-Jun-07
6
5643D
30-Jun-07
7
5643D
5-Jul-07
5
5643D
7-Jul-07
1
5643D
18-May-07
2
5643D
20-Jun-07
2
2287C
25-Jun-07
4
2287C
30-Jun-07
2
2287C
5-Jul-07
3
;run;
proc sql;
  create table Sixty_sum
    as select Customer_id,date,(select sum(NoOfItems) from have
                                where Customer_id=outer.Customer_ID
                                  and date between outer.Date and outer.Date+59) as NoOfItems60
    from have as outer
    having calculated NoOfItems60>40
     ;

Super User
Posts: 6,903

Re: Proc sql: conditional programming

This solution represents an important recognition of the requirements.  You don't need to look backwards.  You only need to look forwards (date + 59).  The reason:  if looking backwards would find a set of days that totaled 40+ items, you would find that same set of days by looking forward only.  You would just find it based on an earlier observation for the same customer.

 

In a DATA step, you wouldn't need the full Cartesian product, although the programming becomes a bit messy.  If needed:

 

proc sort data=have;

by custid date;

run;

 

Then take advantage of the order to the observations.  (This does not necessarily mean that it runs faster than SQL ... only testing would reveal that):

 

data want (where=(bought_40_flag = 1));

set have nobs=n_obs;

by custid date;

if first.custid then do;

   total_items = n_items;

   if n_items < 40 then bought_40_flag = 0;

  else do;

      bought_40_flag=1;

      output;

   end;

end;

retain bought_40_flag;

if bought_40_flag=0 and _n_ < n_obs then do next_obs = _n_+1 to n_obs;

   set have (rename=(custid = next_custid date = next_date)) point=next_obs;

   if custid = next_custid and (next_date - date) < 60 then do;

      total_items + n_items;

      if total_items >= 40 then do;

         bought_40_flag=1;

         output;

         return;

      end;

   end;

   else return;  /* now at next customer or beyond 60 day limit */

end;

run;

 

Like I said, it's messy.  It's also untested, so I hope I got it right but it might need a little tweaking.

Ask a Question
Discussion stats
  • 12 replies
  • 267 views
  • 4 likes
  • 7 in conversation