DATA Step, Macro, Functions and more

Selecting "customer start date" based on a specific rule

Reply
Frequent Contributor
Posts: 90

Selecting "customer start date" based on a specific rule

[ Edited ]

 

Hello, I'd appreciate your advice on this problem.
 
Here's a simplified data set which displays all the policies of one customer:
  
customer_id policy_id policy_effective_dt policy_expiration_dt active_policy_flag
127055 135418 27.04.2000 27.11.2004 0
127055 642156 27.11.2002 20.10.2003 0
127055 548874 20.05.2011 21.03.2013 0
127055 561321 20.05.2011 20.03.2016 0
127055 854876 25.04.2012 08.04.2013 0
127055 231554 12.07.2012 03.08.2013 0
127055 564498 13.08.2013 08.05.2015 0
127055 548879 16.06.2016 15.06.2017 1
 
I need to select the date in which a person became a new customer. However there's a special rule to bear in mind, so I cannot simply use min(policy_effective_dt).
 
The rule says that if an existing customer cancels all his policies, and signs a new policy over 3 months later, then he should be considered a new customer. If however he signs a new policy within 3 months (of his former policies having expired), then he's not considered a new customer.
 
Based on this rule, the customer in the example data became a new customer on 20.05.2011. So that's the date I need to select. The reasons why 20.05.2011 is the correct date are:
 
- The policies on the top two rows expired many years before the customer signed new policies (not within 3 months).
- The only active policy has the effective date 16.06.2016. However this is within 3 months of the former policy that was canceled (564498, 08.05.2015), and that policy's effective-date overlaps with earlier policies, with the earliest effective date being 20.05.2011.
 

 Thoughts on how to handle it so far

1. Select the min(policy_effective_dt) of active policies (16.06.2016)

2. Establish a new variable that's loaded with the above value minus the 3 month period (16.03.2016)

3. Check each row in the policy_expiration_dt column to see if any date is higher than the variable from step

     - If no, then 16.03.2016 is selected

     - If yes (which is the case) then select that rows policy_effective_from_dt, subtract 3 months, and restart step 3 until there's no longer any new matches.

 

Do you have any suggestion on how to write this kind of code? I mostly just work in EG & DIS, so my SAS coding skills are very limited.

 

Thanks for your time!

 

Super User
Posts: 7,854

Re: Selecting "customer start date" based on a specific rule

Posted in reply to EinarRoed

Try this:

data have;
input
  customer_id
  policy_id
  policy_effective_dt :ddmmyy10.
  policy_expiration_dt :ddmmyy10.
  active_policy_flag
;
format 
  policy_effective_dt
  policy_expiration_dt
    ddmmyyp10.
;
cards;
127055 135418 27.04.2000 27.11.2004 0
127055 642156 27.11.2002 20.10.2003 0
127055 548874 20.05.2011 21.03.2013 0
127055 561321 20.05.2011 20.03.2016 0
127055 854876 25.04.2012 08.04.2013 0
127055 231554 12.07.2012 03.08.2013 0
127055 564498 13.08.2013 08.05.2015 0
127055 548879 16.06.2016 15.06.2017 1
;
run;

proc sort data=have;
by customer_id policy_effective_dt;
run;

data ranges (keep=customer_id range_start range_end);
set have;
by customer_id;
retain range_start range_end;
format range_start range_end ddmmyyp10.;
if first.customer_id
then do;
  range_start = policy_effective_dt;
  range_end = policy_expiration_dt;
end;
else do;
  if policy_effective_dt <= intnx('month',range_end,3)
  then do;
    range_end = max(policy_expiration_dt,range_end);
    if last.customer_id then output;
  end;
  else do;
    output;
    range_start = policy_effective_dt;
    range_end = policy_expiration_dt;
  end;
end;
run;

data want;
merge
  have (in=a)
  ranges (
    in=b
    keep=customer_id range_start
    rename=(range_start=policy_effective_dt)
  )
;
by customer_id policy_effective_dt;
if a;
if b
then new_customer = 1;
else new_customer = 0;
run;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 10,044

Re: Selecting "customer start date" based on a specific rule

Posted in reply to EinarRoed

OK. If I know what you are talking about .

 

 

data have;
infile cards expandtabs truncover;
input customer_id	policy_id	policy_effective_dt : ddmmyy10.	policy_expiration_dt : ddmmyy10.;
format policy_effective_dt policy_expiration_dt ddmmyyp10.;
cards;
127055	135418	27.04.2000	27.11.2004	0
127055	642156	27.11.2002	20.10.2003	0
127055	548874	20.05.2011	21.03.2013	0
127055	561321	20.05.2011	20.03.2016	0
127055	854876	25.04.2012	08.04.2013	0
127055	231554	12.07.2012	03.08.2013	0
127055	564498	13.08.2013	08.05.2015	0
127055	548879	16.06.2016	15.06.2017
;
run;
data want;
array x{9999} _temporary_;

 do i=1 by 1 until(last.customer_id);
  set have;
  by customer_id;
  x{i}=policy_expiration_dt;
 end;
max_date=.; 
 do j=1 by 1 until(last.customer_id);
  set have;
  by customer_id;
  new_customer=0;
  if intck('month',max_date,policy_effective_dt,'c') gt 3 then new_customer=1;
  max_date=max(max_date,x{j}); 
  output;
 end;
 
drop i j  max_date;
run;
Frequent Contributor
Posts: 90

Re: Selecting "customer start date" based on a specific rule

Thank you very much, @Ksharp & @KurtBremser. I'm in the process of testing the logic for a much larger dataset, and it's looking very good so far. Smiley Happy

Ask a Question
Discussion stats
  • 3 replies
  • 218 views
  • 2 likes
  • 3 in conversation