I have a sorted list of policies. Each time a policy is changed, a new policy version is created.
The policy versions can indicate the policy being cancelled. But policies can be 'saved' and later cancelled again.
I want a new variable that counts each new cancellation. It should only count the first rows with cancelled=1 after having had cancelled=0.
Here's an example with 4 policies. The 4th column is what I want:
Policy | Policy version | Cancelled | Want |
123 | 1 | 0 | |
123 | 2 | 1 | 1 |
123 | 3 | 0 | |
123 | 4 | 0 | |
123 | 5 | 0 | |
123 | 6 | 1 | 2 |
123 | 7 | 1 | |
157 | 1 | 0 | |
157 | 2 | 1 | 1 |
157 | 3 | 1 | |
157 | 4 | 1 | |
157 | 5 | 0 | |
157 | 6 | 1 | 2 |
157 | 7 | 0 | |
188 | 1 | 0 | |
188 | 2 | 1 | 1 |
997 | 1 | 0 | |
997 | 2 | 1 | 1 |
997 | 3 | 0 | |
997 | 4 | 1 | 2 |
997 | 5 | 0 | |
997 | 6 | 0 | |
997 | 7 | 1 | 3 |
997 | 8 | 1 |
Can someone please advice me on how to create the 4th column in a data step?
Hello @EinarRoed,
Try this:
data have;
input policy version cancelled;
cards;
123 1 0
123 2 1
123 3 0
123 4 0
123 5 0
123 6 1
123 7 1
157 1 0
157 2 1
157 3 1
157 4 1
157 5 0
157 6 1
157 7 0
188 1 0
188 2 1
997 1 0
997 2 1
997 3 0
997 4 1
997 5 0
997 6 0
997 7 1
997 8 1
;
data want(drop=_:);
set have;
by policy version;
_i=(dif(cancelled)>0);
_s+_i;
if first.policy then _s=0;
want=ifn(first.policy | ~_i,.,_s);
run;
Hello @EinarRoed,
Try this:
data have;
input policy version cancelled;
cards;
123 1 0
123 2 1
123 3 0
123 4 0
123 5 0
123 6 1
123 7 1
157 1 0
157 2 1
157 3 1
157 4 1
157 5 0
157 6 1
157 7 0
188 1 0
188 2 1
997 1 0
997 2 1
997 3 0
997 4 1
997 5 0
997 6 0
997 7 1
997 8 1
;
data want(drop=_:);
set have;
by policy version;
_i=(dif(cancelled)>0);
_s+_i;
if first.policy then _s=0;
want=ifn(first.policy | ~_i,.,_s);
run;
@FreelanceReinh : that ifn is for increasing job-security, isn't it 😉
@andreas_lds wrote:
@FreelanceReinh : that ifn is for increasing job-security, isn't it 😉
I don't think it's too cryptic (if that's what you mean). Variable WANT is to be set to missing if the observation is the first in the POLICY BY-group or there isn't an increase of CANCELLED. Otherwise it is assigned the value of _S, the cumulative count of increases per BY-group.
Hi,
Try this, want2 should be your desired output.
data have;
input policy version cancelled;
cards;
123 1 0
123 2 1
123 3 0
123 4 0
123 5 0
123 6 1
123 7 1
157 1 0
157 2 1
157 3 1
157 4 1
157 5 0
157 6 1
157 7 0
188 1 0
188 2 1
997 1 0
997 2 1
997 3 0
997 4 1
997 5 0
997 6 0
997 7 1
997 8 1
;
data want;
set have;
by policy;
if first.policy then do;
count=0;
end;
if not last.policy and cancelled=1 then count+1;
if last.policy and cancelled=1 then count+1;
else count=count;
run;
data want2;
set want;
if cancelled=0 then count=0;
run;
data have;
input policy version cancelled;
cards;
123 1 0
123 2 1
123 3 0
123 4 0
123 5 0
123 6 1
123 7 1
157 1 0
157 2 1
157 3 1
157 4 1
157 5 0
157 6 1
157 7 0
188 1 0
188 2 1
997 1 0
997 2 1
997 3 0
997 4 1
997 5 0
997 6 0
997 7 1
997 8 1
;
data want;
set have;
by policy cancelled notsorted;
if first.policy then temp=0;
if first.cancelled and cancelled=1 then do;temp+1;want=temp;end;
drop temp;
run;
The first suggestion, which was tagged as the solution, worked well in all cases except for one.
When the input consisted of 1 single policy row with Cancelled=1, then Want was NULL.
This scenario wasn't covered by my example scenarios though, as I didn't predict it could happen.
In any case, Ksharp's suggestion covers this scenario as well.
Thanks for all the feedback!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.