BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
EinarRoed
Pyrite | Level 9

 

Hello, I'd appreciate your advice on this problem.
 
Here's a simplified data set which displays all the policies of one customer:
  
customer_id policy_id policy_effective_dt policy_expiration_dt active_policy_flag
127055 135418 27.04.2000 27.11.2004 0
127055 642156 27.11.2002 20.10.2003 0
127055 548874 20.05.2011 21.03.2013 0
127055 561321 20.05.2011 20.03.2016 0
127055 854876 25.04.2012 08.04.2013 0
127055 231554 12.07.2012 03.08.2013 0
127055 564498 13.08.2013 08.05.2015 0
127055 548879 16.06.2016 15.06.2017 1
 
I need to select the date in which a person became a new customer. However there's a special rule to bear in mind, so I cannot simply use min(policy_effective_dt).
 
The rule says that if an existing customer cancels all his policies, and signs a new policy over 3 months later, then he should be considered a new customer. If however he signs a new policy within 3 months (of his former policies having expired), then he's not considered a new customer.
 
Based on this rule, the customer in the example data became a new customer on 20.05.2011. So that's the date I need to select. The reasons why 20.05.2011 is the correct date are:
 
- The policies on the top two rows expired many years before the customer signed new policies (not within 3 months).
- The only active policy has the effective date 16.06.2016. However this is within 3 months of the former policy that was canceled (564498, 08.05.2015), and that policy's effective-date overlaps with earlier policies, with the earliest effective date being 20.05.2011.
 

 Thoughts on how to handle it so far

1. Select the min(policy_effective_dt) of active policies (16.06.2016)

2. Establish a new variable that's loaded with the above value minus the 3 month period (16.03.2016)

3. Check each row in the policy_expiration_dt column to see if any date is higher than the variable from step

     - If no, then 16.03.2016 is selected

     - If yes (which is the case) then select that rows policy_effective_from_dt, subtract 3 months, and restart step 3 until there's no longer any new matches.

 

Do you have any suggestion on how to write this kind of code? I mostly just work in EG & DIS, so my SAS coding skills are very limited.

 

Thanks for your time!

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

OK. If I know what you are talking about .

 

 

data have;
infile cards expandtabs truncover;
input customer_id	policy_id	policy_effective_dt : ddmmyy10.	policy_expiration_dt : ddmmyy10.;
format policy_effective_dt policy_expiration_dt ddmmyyp10.;
cards;
127055	135418	27.04.2000	27.11.2004	0
127055	642156	27.11.2002	20.10.2003	0
127055	548874	20.05.2011	21.03.2013	0
127055	561321	20.05.2011	20.03.2016	0
127055	854876	25.04.2012	08.04.2013	0
127055	231554	12.07.2012	03.08.2013	0
127055	564498	13.08.2013	08.05.2015	0
127055	548879	16.06.2016	15.06.2017
;
run;
data want;
array x{9999} _temporary_;

 do i=1 by 1 until(last.customer_id);
  set have;
  by customer_id;
  x{i}=policy_expiration_dt;
 end;
max_date=.; 
 do j=1 by 1 until(last.customer_id);
  set have;
  by customer_id;
  new_customer=0;
  if intck('month',max_date,policy_effective_dt,'c') gt 3 then new_customer=1;
  max_date=max(max_date,x{j}); 
  output;
 end;
 
drop i j  max_date;
run;

View solution in original post

3 REPLIES 3
Kurt_Bremser
Super User

Try this:

data have;
input
  customer_id
  policy_id
  policy_effective_dt :ddmmyy10.
  policy_expiration_dt :ddmmyy10.
  active_policy_flag
;
format 
  policy_effective_dt
  policy_expiration_dt
    ddmmyyp10.
;
cards;
127055 135418 27.04.2000 27.11.2004 0
127055 642156 27.11.2002 20.10.2003 0
127055 548874 20.05.2011 21.03.2013 0
127055 561321 20.05.2011 20.03.2016 0
127055 854876 25.04.2012 08.04.2013 0
127055 231554 12.07.2012 03.08.2013 0
127055 564498 13.08.2013 08.05.2015 0
127055 548879 16.06.2016 15.06.2017 1
;
run;

proc sort data=have;
by customer_id policy_effective_dt;
run;

data ranges (keep=customer_id range_start range_end);
set have;
by customer_id;
retain range_start range_end;
format range_start range_end ddmmyyp10.;
if first.customer_id
then do;
  range_start = policy_effective_dt;
  range_end = policy_expiration_dt;
end;
else do;
  if policy_effective_dt <= intnx('month',range_end,3)
  then do;
    range_end = max(policy_expiration_dt,range_end);
    if last.customer_id then output;
  end;
  else do;
    output;
    range_start = policy_effective_dt;
    range_end = policy_expiration_dt;
  end;
end;
run;

data want;
merge
  have (in=a)
  ranges (
    in=b
    keep=customer_id range_start
    rename=(range_start=policy_effective_dt)
  )
;
by customer_id policy_effective_dt;
if a;
if b
then new_customer = 1;
else new_customer = 0;
run;
Ksharp
Super User

OK. If I know what you are talking about .

 

 

data have;
infile cards expandtabs truncover;
input customer_id	policy_id	policy_effective_dt : ddmmyy10.	policy_expiration_dt : ddmmyy10.;
format policy_effective_dt policy_expiration_dt ddmmyyp10.;
cards;
127055	135418	27.04.2000	27.11.2004	0
127055	642156	27.11.2002	20.10.2003	0
127055	548874	20.05.2011	21.03.2013	0
127055	561321	20.05.2011	20.03.2016	0
127055	854876	25.04.2012	08.04.2013	0
127055	231554	12.07.2012	03.08.2013	0
127055	564498	13.08.2013	08.05.2015	0
127055	548879	16.06.2016	15.06.2017
;
run;
data want;
array x{9999} _temporary_;

 do i=1 by 1 until(last.customer_id);
  set have;
  by customer_id;
  x{i}=policy_expiration_dt;
 end;
max_date=.; 
 do j=1 by 1 until(last.customer_id);
  set have;
  by customer_id;
  new_customer=0;
  if intck('month',max_date,policy_effective_dt,'c') gt 3 then new_customer=1;
  max_date=max(max_date,x{j}); 
  output;
 end;
 
drop i j  max_date;
run;
EinarRoed
Pyrite | Level 9

Thank you very much, @Ksharp & @Kurt_Bremser. I'm in the process of testing the logic for a much larger dataset, and it's looking very good so far. 🙂

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 803 views
  • 2 likes
  • 3 in conversation