Frequency in one year range

Accepted Solution Solved
Reply
SAS Employee
Posts: 20
Accepted Solution

Frequency in one year range

[ Edited ]

Hi everyone,

 

I have a data set consisting of Customer_id and date columns.

I want to calculate which customers existing 3 times in one year period?

 

I want to catch customers like this:

 

Customer_ID   Date

100                12/08/2014

100                30/01/2015

100                09/06/2015

 

(There may be many rows with different dates for one customer)

Thank you very much for your support!

 

Onur


Accepted Solutions
Solution
‎01-29-2016 03:36 AM
Trusted Advisor
Posts: 1,114

Re: Frequency in one year range

[ Edited ]

Try this:

proc sort data=have; /* only if necessary */
by customer_id date;
run;

data want;
do until(last.customer_id);
  set have;
  where n(date);
  by customer_id;
  n=sum(n,1);
  lag2d=lag2(date);
  if n<=2 then lag2d=.;
  else if intck('year', lag2d, date, 'c')<1 & ~flag then do;
    output;
    flag=1;
  end;
end;
keep customer_id;
run;

This assumes, that, for the time being, it's sufficient to select only the IDs of the qualifying customers.

 

Edit: Unlike @Haikuo's approach, the above suggestion includes duplicate dates in the count. Another difference to @Haikuo's solution is that dates with a difference of exactly 1 year, like 12/08/2014 and 12/08/2015, would not be regarded as being within one year.

 

View solution in original post


All Replies
Occasional Contributor
Posts: 18

Re: Frequency in one year range

Hi,

 

Could you please post sample data so that some assumtption can be made?

 

 

Esteemed Advisor
Esteemed Advisor
Posts: 7,189

Re: Frequency in one year range

First step would be to create a baseline for each by group, so customer_id:

proc sql;
  create table WANT as
  select  HAVE.*,
            BASE.DATE,
            HAVE.DATE - BASE.DATE / 365 as NUM_YEARS
  from    HAVE HAVE
  left join (select distinct CUSTOMER_ID,min(DATE) as DATE
               from   HAVE
               group by CUSTOMER_ID) BASE
  on       HAVE.CUSTOMER_ID=BASE.CUSTOMER_ID;
quit;

This will add the minimum date onto your data, and work out years based on 365 days per year.

From that you can then frequency on the NUM_YEARS variable.

SAS Employee
Posts: 20

Re: Frequency in one year range

Thank you for your answer, but what if I have a customer like this:

 

Customer_ID  Date            Gap with min(date)

101                10/04/2013   0

101                16/06/2013   0.17 years

101                22/08/2014   1.3 years

101                07/09/2014   1.4 years

101                28/03/2015   1.9 years

 

This customer is the one which I want to choose but how can I select (calculate the difference between all the 3-adjacent dates) for all customers like this from the entire data?

 

Your code always compares each date with the min date.

 

Many thanks

Esteemed Advisor
Esteemed Advisor
Posts: 7,189

Re: Frequency in one year range

Sorry, you will want a floor in there too:

proc sql;
  create table WANT as
  select  HAVE.*,
            BASE.DATE,
            floor(HAVE.DATE - BASE.DATE / 365) as NUM_YEARS
  from    HAVE HAVE
  left join (select distinct CUSTOMER_ID,min(DATE) as DATE
               from   HAVE
               group by CUSTOMER_ID) BASE
  on       HAVE.CUSTOMER_ID=BASE.CUSTOMER_ID;
quit;

So everything from min date to min date + 365 is year 0, from there + 365 is year 1 etc.  Then you can use that year variable to frequency on.  So what I am doing is blocking out each range of data based on its distance from the minimum.  So not sure I understand your question.  Post some more example data, and what the output should look like? 

Respected Advisor
Posts: 3,124

Re: Frequency in one year range

Data step may provide a better, more efficient solution, but SQL is really easy to code for this kind of rolling computation:

DATA HAVE;
	INPUT ID   Date:DDMMYY10.;
	CARDS;
100                30/01/2015
100                09/06/2015
100                09/06/2017
101                10/04/2013  
101                16/06/2013  
101                22/08/2014  
101                07/09/2014  
101                28/03/2015
;

PROC SQL;
	SELECT ID, (SELECT COUNT(DISTINCT DATE) FROM HAVE WHERE ID=A.ID AND DATE BETWEEN A.DATE AND INTNX('YEAR',A.DATE,1,'S')) AS CNT_DATE
		FROM HAVE A
			WHERE CALCULATED CNT_DATE>=3
	;
QUIT;
Solution
‎01-29-2016 03:36 AM
Trusted Advisor
Posts: 1,114

Re: Frequency in one year range

[ Edited ]

Try this:

proc sort data=have; /* only if necessary */
by customer_id date;
run;

data want;
do until(last.customer_id);
  set have;
  where n(date);
  by customer_id;
  n=sum(n,1);
  lag2d=lag2(date);
  if n<=2 then lag2d=.;
  else if intck('year', lag2d, date, 'c')<1 & ~flag then do;
    output;
    flag=1;
  end;
end;
keep customer_id;
run;

This assumes, that, for the time being, it's sufficient to select only the IDs of the qualifying customers.

 

Edit: Unlike @Haikuo's approach, the above suggestion includes duplicate dates in the count. Another difference to @Haikuo's solution is that dates with a difference of exactly 1 year, like 12/08/2014 and 12/08/2015, would not be regarded as being within one year.

 

Occasional Contributor
Posts: 12

Re: Frequency in one year range

depend on what you class as a "year".

A calendar year or 365 day period or 12months.
with 365 you could have overlapping periods which were valid.

Do you need to identify these distinct periods or are you only looking for a binary flag to indicate they do have 3 data points within a 1 calendar / 12 month / 365 day period?

Clear requirements Smiley Happy
Occasional Contributor
Posts: 12

Re: Frequency in one year range

[ Edited ]

Here is one possible solution (there are always more than one).

 

The input dataset is called "sample1"

"sample2" contains a row for each distinct customer_id and 3 dates that fall within a 365 day period. You will get multiple outputs for each customer if they have several groups of dates that fall within 365 days.

"sample3" is the reduced list of customer_id-s that trigger the above rule at least once.

 

Apologies if there are any syntax errors, I haven't run it to check it Smiley Wink

 

 

proc sort sample1 ;
by customer_id Date;
run;


data sample2 (drop= date);
set sample1;
by customer_id;
format date1 date2 date3 date9.;
retain date1 date2 date3;
if first.customer_id then
do;
date1 = date;
date2 = .;
date3 = .;
end;
else do;
date3 = date2;
date2 = date1;
date1 = date;
end;
if date1 ne . and date2 ne . and date3 ne .
and intck('days',date1,date3) <= 365 then output;
run;

proc sql;
create table sample3 as
select distinct customer_id
from sample2
;
quit;

SAS Employee
Posts: 20

Re: Frequency in one year range

Thank you very much!! This code worked only in 1 minute for 700K rows

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 633 views
  • 1 like
  • 6 in conversation