I have a claims dataset that looks like this:
person_id claim_date claim_id payment
1 01/01/2019 100 50
1 01/03/2019 101 40
1 02/09/2019 102 0
1 12/20/2019 103 0
2 10/20/2019 201 50
2 10/21/2019 202 40
2 11/30/2019 203 30
3 04/02/2019 301 20
3 04/05/2019 302 20
3 05/20/2019 304 0
3 12/30/2019 305 0
3 12/31/2019 306 50
Note that there is only one claim allowed per date in this dataset, the dataset is already sorted by claim_date, and the dataset only contains claims from 2019.
The goal is to identify the first claim_date, if any, for which both of the following are true: 1) payment equals 0; and 2) all subsequent observations for the person_id also had payment equal to 0. So for person_id = 1, the date would be 02/09/2019; for person_id = 2, no such date exists because all of their claims had non-zero payments; and for person_id = 3, no such date exists because of the positive payment on 12/31/2019.
It's straightforward to identify the first claim-date with payment = 0 for each person_id (the first criteria), but I'm struggling with how to operationalize the second criteria. I'd appreciate any help the community can provide.
HI @chuakp Please post the expected output for the sample too, ty! I mean in other words what the output should look like?
How about:
data want;
set have;
by person_id claim_date;
if first.person_id or not missing(payment) then first_0 = .;
retain first_0;
if first_0=. and payment = 0 then first_0 = claim_date;
keep person_id first_0;
if last.person_id and not missing(first_0);
run;
It's untested, but looks about right.
Thank you. I believe I have to add this code
if payment NE 0 then first_0 = .;
This way, the program finds the first instance of a payment = 0 for a person_id. The program looks at the next observation and determines if payment = 0. If so, then first_0 stays the same. If not, that means that first_0 cannot be the first date in 2019 in which all subsequent claims have zero payments, so we set first_0 back to missing. Without this piece of code, I believe the program would just output the last observation in which payment equaled zero; if there is no such observation, first_0 will be missing.
data want;
set have;
by person_id claim_date;
if first.person_id or not missing(payment) then first_0 = .;
retain first_0;
if first_0=. and payment = 0 then first_0 = claim_date;
if payment NE 0 then first_0 = .;
keep person_id first_0;
if last.person_id and not missing(first_0);
run;
You're right. I was thinking that, but wrote it incorrectly here:
if first.person_id or not missing(payment) then first_0 = .;
Should have been:
if first.person_id or payment ne 0 then first_0 = .;
1. Assuming you want a filtered output to subset only the record that satisfies the stated condition .
You could:
data have;
input person_id claim_date :mmddyy10. claim_id payment;
format claim_date mmddyy10. ;
cards;
1 01/01/2019 100 50
1 01/03/2019 101 40
1 02/09/2019 102 0
1 12/20/2019 103 0
2 10/20/2019 201 50
2 10/21/2019 202 40
2 11/30/2019 203 30
3 04/02/2019 301 20
3 04/05/2019 302 20
3 05/20/2019 304 0
3 12/30/2019 305 0
3 12/31/2019 306 50
;
data want;
do _n_=1 by 1until(last.person_id);
set have;
by person_id;
if t1 then do;
t2=sumabs(payment,t2);
continue;
end;
if payment=0 then t1=claim_date;
end;
do _n_=1 to _n_;
set have;
by person_id;
if not t2 and claim_date=t1 then output;
end;
drop t:;
run;
2. Assuming you want a FLAG that identified that particular record as 1 and other as 0's
data want;
do _n_=1 by 1until(last.person_id);
set have;
by person_id;
if t1 then do;
t2=sumabs(payment,t2);
continue;
end;
if payment=0 then t1=claim_date;
end;
do _n_=1 to _n_;
set have;
by person_id;
Flag= not t2 and claim_date=t1;
output;
end;
drop t:;
run;
If both the assumptions of your expected output isn't what it is, you posting the expected output will greatly help. Ty!
Thanks - I tried this, but it didn't quite work. To answer your question, I am looking for an output dataset with one row per person_id, a column for person_id, and a column for the first date on which all claims had zero payment and on which all subsequent claims had zero payment.
@chuakp wrote:
Thanks - I tried this, but it didn't quite work. To answer your question, I am looking for an output dataset with one row per person_id, a column for person_id, and a column for the first date on which all claims had zero payment and on which all subsequent claims had zero payment.
data have;
input person_id claim_date :mmddyy10. claim_id payment;
format claim_date mmddyy10.;
cards;
1 01/01/2019 100 50
1 01/03/2019 101 40
1 02/09/2019 102 0
1 12/20/2019 103 0
2 10/20/2019 201 50
2 10/21/2019 202 40
2 11/30/2019 203 30
3 04/02/2019 301 20
3 04/05/2019 302 20
3 05/20/2019 304 0
3 12/30/2019 305 0
3 12/31/2019 306 50
;
proc sql;
/* create table want as*/
select l.person_id, min(l.claim_date) as claim_date format=mmddyy10.
from have l
where
payment=0
and not exists
( select *
from have i
where
i.person_id=l.person_id
and i.payment>0
and i.claim_date > l.claim_date
)
group by 1
;
quit;
You have to go through each person_id twice - once to review only the non-zero payments, keeping the last date of this subgroup (last_non_zero). The second time reads ALL the records, setting DUMMY=1 only for the case in which current payment=0 and the single record lag of date matches last_non_zero:
data want (drop=_:);
set have (where=(payment^=0) in=non_zero_payment)
have (in=second_pass);
by person_id;
if non_zero_payment=1 then _last_non_zero=claim_date;
else if first.person_id then _last_non_zero=.;
retain _last_non_zero;
if second_pass;
dummy=ifn(payment=0 and lag(claim_date)=_last_non_zero,1,0);
run;
Edited addition: This program assume data are sorted by person_id/claim_date, but it purposely only used "BY PERSON_ID" in the by-statement.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.