I have a sorted list of policies. Each time a policy is changed, a new policy version is created.
The policy versions can indicate the policy being cancelled. But policies can be 'saved' and later cancelled again.
I want a new variable that counts each new cancellation. It should only count the first rows with cancelled=1 after having had cancelled=0.
Here's an example with 4 policies. The 4th column is what I want:
| Policy | Policy version | Cancelled | Want |
| 123 | 1 | 0 | |
| 123 | 2 | 1 | 1 |
| 123 | 3 | 0 | |
| 123 | 4 | 0 | |
| 123 | 5 | 0 | |
| 123 | 6 | 1 | 2 |
| 123 | 7 | 1 | |
| 157 | 1 | 0 | |
| 157 | 2 | 1 | 1 |
| 157 | 3 | 1 | |
| 157 | 4 | 1 | |
| 157 | 5 | 0 | |
| 157 | 6 | 1 | 2 |
| 157 | 7 | 0 | |
| 188 | 1 | 0 | |
| 188 | 2 | 1 | 1 |
| 997 | 1 | 0 | |
| 997 | 2 | 1 | 1 |
| 997 | 3 | 0 | |
| 997 | 4 | 1 | 2 |
| 997 | 5 | 0 | |
| 997 | 6 | 0 | |
| 997 | 7 | 1 | 3 |
| 997 | 8 | 1 |
Can someone please advice me on how to create the 4th column in a data step?
Hello @EinarRoed,
Try this:
data have;
input policy version cancelled;
cards;
123 1 0
123 2 1
123 3 0
123 4 0
123 5 0
123 6 1
123 7 1
157 1 0
157 2 1
157 3 1
157 4 1
157 5 0
157 6 1
157 7 0
188 1 0
188 2 1
997 1 0
997 2 1
997 3 0
997 4 1
997 5 0
997 6 0
997 7 1
997 8 1
;
data want(drop=_:);
set have;
by policy version;
_i=(dif(cancelled)>0);
_s+_i;
if first.policy then _s=0;
want=ifn(first.policy | ~_i,.,_s);
run;
Hello @EinarRoed,
Try this:
data have;
input policy version cancelled;
cards;
123 1 0
123 2 1
123 3 0
123 4 0
123 5 0
123 6 1
123 7 1
157 1 0
157 2 1
157 3 1
157 4 1
157 5 0
157 6 1
157 7 0
188 1 0
188 2 1
997 1 0
997 2 1
997 3 0
997 4 1
997 5 0
997 6 0
997 7 1
997 8 1
;
data want(drop=_:);
set have;
by policy version;
_i=(dif(cancelled)>0);
_s+_i;
if first.policy then _s=0;
want=ifn(first.policy | ~_i,.,_s);
run;
@FreelanceReinh : that ifn is for increasing job-security, isn't it 😉
@andreas_lds wrote:
@FreelanceReinh : that ifn is for increasing job-security, isn't it 😉
I don't think it's too cryptic (if that's what you mean). Variable WANT is to be set to missing if the observation is the first in the POLICY BY-group or there isn't an increase of CANCELLED. Otherwise it is assigned the value of _S, the cumulative count of increases per BY-group.
Hi,
Try this, want2 should be your desired output.
data have;
input policy version cancelled;
cards;
123 1 0
123 2 1
123 3 0
123 4 0
123 5 0
123 6 1
123 7 1
157 1 0
157 2 1
157 3 1
157 4 1
157 5 0
157 6 1
157 7 0
188 1 0
188 2 1
997 1 0
997 2 1
997 3 0
997 4 1
997 5 0
997 6 0
997 7 1
997 8 1
;
data want;
set have;
by policy;
if first.policy then do;
count=0;
end;
if not last.policy and cancelled=1 then count+1;
if last.policy and cancelled=1 then count+1;
else count=count;
run;
data want2;
set want;
if cancelled=0 then count=0;
run;
data have;
input policy version cancelled;
cards;
123 1 0
123 2 1
123 3 0
123 4 0
123 5 0
123 6 1
123 7 1
157 1 0
157 2 1
157 3 1
157 4 1
157 5 0
157 6 1
157 7 0
188 1 0
188 2 1
997 1 0
997 2 1
997 3 0
997 4 1
997 5 0
997 6 0
997 7 1
997 8 1
;
data want;
set have;
by policy cancelled notsorted;
if first.policy then temp=0;
if first.cancelled and cancelled=1 then do;temp+1;want=temp;end;
drop temp;
run;
The first suggestion, which was tagged as the solution, worked well in all cases except for one.
When the input consisted of 1 single policy row with Cancelled=1, then Want was NULL.
This scenario wasn't covered by my example scenarios though, as I didn't predict it could happen.
In any case, Ksharp's suggestion covers this scenario as well.
Thanks for all the feedback!
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.