## Selecting "customer start date" based on a specific rule

Frequent Contributor
Posts: 100

# Selecting "customer start date" based on a specific rule

[ Edited ]

Here's a simplified data set which displays all the policies of one customer:

 customer_id policy_id policy_effective_dt policy_expiration_dt active_policy_flag 127055 135418 27.04.2000 27.11.2004 0 127055 642156 27.11.2002 20.10.2003 0 127055 548874 20.05.2011 21.03.2013 0 127055 561321 20.05.2011 20.03.2016 0 127055 854876 25.04.2012 08.04.2013 0 127055 231554 12.07.2012 03.08.2013 0 127055 564498 13.08.2013 08.05.2015 0 127055 548879 16.06.2016 15.06.2017 1

I need to select the date in which a person became a new customer. However there's a special rule to bear in mind, so I cannot simply use min(policy_effective_dt).

The rule says that if an existing customer cancels all his policies, and signs a new policy over 3 months later, then he should be considered a new customer. If however he signs a new policy within 3 months (of his former policies having expired), then he's not considered a new customer.

Based on this rule, the customer in the example data became a new customer on 20.05.2011. So that's the date I need to select. The reasons why 20.05.2011 is the correct date are:

- The policies on the top two rows expired many years before the customer signed new policies (not within 3 months).
- The only active policy has the effective date 16.06.2016. However this is within 3 months of the former policy that was canceled (564498, 08.05.2015), and that policy's effective-date overlaps with earlier policies, with the earliest effective date being 20.05.2011.

Thoughts on how to handle it so far

1. Select the min(policy_effective_dt) of active policies (16.06.2016)

2. Establish a new variable that's loaded with the above value minus the 3 month period (16.03.2016)

3. Check each row in the policy_expiration_dt column to see if any date is higher than the variable from step

- If no, then 16.03.2016 is selected

- If yes (which is the case) then select that rows policy_effective_from_dt, subtract 3 months, and restart step 3 until there's no longer any new matches.

Do you have any suggestion on how to write this kind of code? I mostly just work in EG & DIS, so my SAS coding skills are very limited.

Super User
Posts: 10,211

## Re: Selecting "customer start date" based on a specific rule

Try this:

``````data have;
input
customer_id
policy_id
policy_effective_dt :ddmmyy10.
policy_expiration_dt :ddmmyy10.
active_policy_flag
;
format
policy_effective_dt
policy_expiration_dt
ddmmyyp10.
;
cards;
127055 135418 27.04.2000 27.11.2004 0
127055 642156 27.11.2002 20.10.2003 0
127055 548874 20.05.2011 21.03.2013 0
127055 561321 20.05.2011 20.03.2016 0
127055 854876 25.04.2012 08.04.2013 0
127055 231554 12.07.2012 03.08.2013 0
127055 564498 13.08.2013 08.05.2015 0
127055 548879 16.06.2016 15.06.2017 1
;
run;

proc sort data=have;
by customer_id policy_effective_dt;
run;

data ranges (keep=customer_id range_start range_end);
set have;
by customer_id;
retain range_start range_end;
format range_start range_end ddmmyyp10.;
if first.customer_id
then do;
range_start = policy_effective_dt;
range_end = policy_expiration_dt;
end;
else do;
if policy_effective_dt <= intnx('month',range_end,3)
then do;
range_end = max(policy_expiration_dt,range_end);
if last.customer_id then output;
end;
else do;
output;
range_start = policy_effective_dt;
range_end = policy_expiration_dt;
end;
end;
run;

data want;
merge
have (in=a)
ranges (
in=b
keep=customer_id range_start
rename=(range_start=policy_effective_dt)
)
;
by customer_id policy_effective_dt;
if a;
if b
then new_customer = 1;
else new_customer = 0;
run;
``````
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Super User
Posts: 10,766

## Re: Selecting "customer start date" based on a specific rule

OK. If I know what you are talking about .

``````data have;
infile cards expandtabs truncover;
input customer_id	policy_id	policy_effective_dt : ddmmyy10.	policy_expiration_dt : ddmmyy10.;
format policy_effective_dt policy_expiration_dt ddmmyyp10.;
cards;
127055	135418	27.04.2000	27.11.2004	0
127055	642156	27.11.2002	20.10.2003	0
127055	548874	20.05.2011	21.03.2013	0
127055	561321	20.05.2011	20.03.2016	0
127055	854876	25.04.2012	08.04.2013	0
127055	231554	12.07.2012	03.08.2013	0
127055	564498	13.08.2013	08.05.2015	0
127055	548879	16.06.2016	15.06.2017
;
run;
data want;
array x{9999} _temporary_;

do i=1 by 1 until(last.customer_id);
set have;
by customer_id;
x{i}=policy_expiration_dt;
end;
max_date=.;
do j=1 by 1 until(last.customer_id);
set have;
by customer_id;
new_customer=0;
if intck('month',max_date,policy_effective_dt,'c') gt 3 then new_customer=1;
max_date=max(max_date,x{j});
output;
end;

drop i j  max_date;
run;``````
Frequent Contributor
Posts: 100

## Re: Selecting "customer start date" based on a specific rule

Thank you very much, @Ksharp & @KurtBremser. I'm in the process of testing the logic for a much larger dataset, and it's looking very good so far.

Discussion stats
• 3 replies
• 264 views
• 2 likes
• 3 in conversation