BookmarkSubscribeRSS Feed
chuakp
Obsidian | Level 7

I have a claims dataset that looks like this:

 

 

person_id   claim_date    claim_id    payment

1                 01/01/2019   100            50

1                 01/03/2019   101            40

1                 02/09/2019   102            0

1                 12/20/2019   103            0

2                 10/20/2019   201            50

2                 10/21/2019   202            40

2                 11/30/2019   203            30

3                 04/02/2019   301            20

3                 04/05/2019   302            20

3                 05/20/2019   304            0

3                 12/30/2019   305            0

3                 12/31/2019   306            50

 

Note that there is only one claim allowed per date in this dataset, the dataset is already sorted by claim_date, and the dataset only contains claims from 2019.

 

The goal is to identify the first claim_date, if any, for which both of the following are true: 1) payment equals 0; and 2) all subsequent observations for the person_id also had payment equal to 0. So for person_id = 1, the date would be 02/09/2019; for person_id = 2, no such date exists because all of their claims had non-zero payments; and for person_id = 3, no such date exists because of the positive payment on 12/31/2019.

 

It's straightforward to identify the first claim-date with payment = 0 for each person_id (the first criteria), but I'm struggling with how to operationalize the second criteria. I'd appreciate any help the community can provide. 

 

 

 

8 REPLIES 8
novinosrin
Tourmaline | Level 20

HI @chuakp  Please post the expected output for the sample too, ty! I mean in other words what the output should look like?

Astounding
PROC Star

How about:

 

data want;
   set have;
   by person_id claim_date;
   if first.person_id or not missing(payment) then first_0 = .;
   retain first_0;
   if first_0=. and payment = 0 then first_0 = claim_date;
   keep person_id first_0;
   if last.person_id and not missing(first_0);
run;

It's untested, but looks about right.

chuakp
Obsidian | Level 7

Thank you. I believe I have to add this code

if payment NE 0 then first_0 = .;

This way, the program finds the first instance of a payment = 0 for a person_id. The program looks at the next observation and determines if payment = 0. If so, then first_0 stays the same. If not, that means that first_0 cannot be the first date in 2019 in which all subsequent claims have zero payments, so we set first_0 back to missing. Without this piece of code, I believe the program would just output the last observation in which payment equaled zero; if there is no such observation, first_0 will be missing.

 

 

 

data want;
   set have;
   by person_id claim_date;
   if first.person_id or not missing(payment) then first_0 = .;
   retain first_0;
   if first_0=. and payment = 0 then first_0 = claim_date;
if payment NE 0 then first_0 = .;
keep person_id first_0; if last.person_id and not missing(first_0); run;

 

Astounding
PROC Star

You're right.  I was thinking that, but wrote it incorrectly here:

 

   if first.person_id or not missing(payment) then first_0 = .;

Should have been:

   if first.person_id or payment ne 0 then first_0 = .;
novinosrin
Tourmaline | Level 20

1. Assuming you want a filtered output to subset only the record that satisfies the stated condition .

 

You could:


data have;
input person_id   claim_date :mmddyy10.    claim_id    payment;
format  claim_date mmddyy10. ;
cards;
1                 01/01/2019   100            50
1                 01/03/2019   101            40
1                 02/09/2019   102            0
1                 12/20/2019   103            0
2                 10/20/2019   201            50
2                 10/21/2019   202            40
2                 11/30/2019   203            30
3                 04/02/2019   301            20
3                 04/05/2019   302            20
3                 05/20/2019   304            0
3                 12/30/2019   305            0
3                 12/31/2019   306            50
;
data want;
 do _n_=1 by 1until(last.person_id);
  set have;
  by person_id;
  if t1 then do; 
   t2=sumabs(payment,t2);
   continue;
  end;
  if payment=0 then t1=claim_date;
 end;
 do _n_=1 to _n_;
  set have;
  by person_id;
  if not t2 and claim_date=t1 then output;
 end;
 drop t:;
run;

2. Assuming you want a FLAG that identified that particular record as 1 and other as 0's

data want;
 do _n_=1 by 1until(last.person_id);
  set have;
  by person_id;
  if t1 then do; 
   t2=sumabs(payment,t2);
   continue;
  end;
  if payment=0 then t1=claim_date;
 end;
 do _n_=1 to _n_;
  set have;
  by person_id;
  Flag= not t2 and claim_date=t1;
  output;
 end;
 drop t:;
run;

If both the assumptions of your expected output isn't what it is, you posting the expected output will greatly help. Ty!

 

chuakp
Obsidian | Level 7

Thanks - I tried this, but it didn't quite work. To answer your question, I am looking for an output dataset with one row per person_id, a column for person_id, and a column for the first date on which all claims had zero payment and on which all subsequent claims had zero payment.

Patrick
Opal | Level 21

@chuakp wrote:

Thanks - I tried this, but it didn't quite work. To answer your question, I am looking for an output dataset with one row per person_id, a column for person_id, and a column for the first date on which all claims had zero payment and on which all subsequent claims had zero payment.


data have;
  input person_id claim_date :mmddyy10. claim_id payment;
  format claim_date mmddyy10.;
  cards;
1 01/01/2019 100 50
1 01/03/2019 101 40
1 02/09/2019 102 0
1 12/20/2019 103 0
2 10/20/2019 201 50
2 10/21/2019 202 40
2 11/30/2019 203 30
3 04/02/2019 301 20
3 04/05/2019 302 20
3 05/20/2019 304 0
3 12/30/2019 305 0
3 12/31/2019 306 50
;

proc sql;
/*  create table want as*/
    select l.person_id, min(l.claim_date) as claim_date format=mmddyy10.
    from have l
    where 
      payment=0
      and not exists
        ( select * 
          from have i
          where 
            i.person_id=l.person_id 
            and i.payment>0
            and i.claim_date > l.claim_date
        )
    group by 1
  ;
quit;
mkeintz
PROC Star

You have to go through each person_id twice - once to review only the non-zero payments, keeping the last date of this subgroup (last_non_zero).  The second time reads ALL the records, setting DUMMY=1 only for the case in which current payment=0 and the single record lag of date matches last_non_zero:

 

data want (drop=_:);
  set have (where=(payment^=0) in=non_zero_payment)
      have (in=second_pass);
  by person_id;
  if non_zero_payment=1 then _last_non_zero=claim_date;
  else if first.person_id then _last_non_zero=.;
  retain _last_non_zero;
  if second_pass;
  dummy=ifn(payment=0 and lag(claim_date)=_last_non_zero,1,0);
run;

 

  1. The SET statement reads each person_id twice.  The first pass reads only non-zero payment and the second reads all records.
  2. The _last_non_zero date variable is updated with each non-zero payment.  Since _last_non_zero is a retained variable, then it is reset to missing whenever a person_id has ONLY zero payments.
  3. The "if second_pass" is a subsetting if.
  4. The DUMMY= expression tests for current payment=0 and lag of date matching the _last_non_zero date.

 

Edited addition: This program assume data are sorted by person_id/claim_date, but it purposely only used "BY PERSON_ID" in the by-statement.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 979 views
  • 1 like
  • 5 in conversation