Solved: Re: Delete pair of observations for negative claims

dac_js · Posted 11-17-2023 08:58 AM

I have the data in following format:
data have;
input id date :date9. Supply drug $;
datalines;
1 03JAN19 90 AB
2 02JAN19 -90 AB
2 15FEB2019 30 AB
3 02JAN19 30 AB
3 05JAN19 -30 AB
3 30MAR19 30 AB
4 03APR19 60 AB
5 01JAN19 60 CD
5 02FEB19 60 CD
5 06FEB19 -60 CD
6 01MAR19 30 CD
6 01APR19 30 CD
7 01JAN19 60 CD
7 02FEB19 60 CD
7 06FEB19 -60 XY
8 01MAR19 30 CD
8 01APR19 30 CD
;

I need help with deleting negative supply of same drug in this data (the first and second observation for ID 1, the first and second observation for ID 3, second and third observation of ID 5 in this example, but not for second and third observation of ID 7), considering the second observations are actually a refund or reversal of the earlier observation (reversal within 7days of a visit).

Thanks in advance for the help.

mkeintz · Posted 11-19-2023 12:33 AM

This version of the code includes a date range check.

data have;
input id date :date9. Supply drug $;
format date date9.;
datalines;
1 03JAN19 90 AB
1 02JAN19 -90 AB
2 15FEB2019 30 AB
3 02JAN19 30 AB
3 05JAN19 -30 AB
3 30MAR19 30 AB
4 03APR19 60 AB
5 01JAN19 60 CD
5 02FEB19 60 CD
5 06FEB19 -60 CD
6 01MAR19 30 CD
6 01APR19 30 CD
7 01JAN19 60 CD
7 02FEB19 60 CD
7 06FEB19 -60 XY
8 01MAR19 30 CD
8 01APR19 30 CD
;

data want (drop=_:);
  set have (keep=id drug);
  by id drug  notsorted;
  merge have 
        have (firstobs=2 keep=supply date rename=(date=_nxt_date supply=_nxt_supply));

  retain _delete_count 0 ;
  if last.drug=0 and supply>0 and -1*supply=_nxt_supply and _nxt_date<=date+7 then _delete_count=2;

  if _delete_count=0 then output;
  else _delete_count=_delete_count-1;
run;

This program assumes data are grouped by ID and DRUG, and are sorted by ascending date within each ID/DRUG group. I take it that obs 2, had the wrong date, but I left it there.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

mkeintz · Posted 11-17-2023 07:01 PM

@dac_js wrote:

I have the data in following format:
data have;
input id date :date9. Supply drug $;
datalines;
1 03JAN19 90 AB
2 02JAN19 -90 AB
2 15FEB2019 30 AB
3 02JAN19 30 AB
3 05JAN19 -30 AB
3 30MAR19 30 AB
4 03APR19 60 AB
5 01JAN19 60 CD
5 02FEB19 60 CD
5 06FEB19 -60 CD
6 01MAR19 30 CD
6 01APR19 30 CD
7 01JAN19 60 CD
7 02FEB19 60 CD
7 06FEB19 -60 XY
8 01MAR19 30 CD
8 01APR19 30 CD
;

I need help with deleting negative supply of same drug in this data (the first and second observation for ID 1, the first and second observation for ID 3, second and third observation of ID 5 in this example, but not for second and third observation of ID 7), considering the second observations are actually a refund or reversal of the earlier observation (reversal within 7days of a visit).

Thanks in advance for the help.

But there is no "second observation for ID 1" in your sample data. I presume the second obs in your sample should have ID=1 (not ID=2).

Then:

data have;
input id date :date9. Supply drug $;
format date date9.;
datalines;
1 03JAN19 90 AB
1 02JAN19 -90 AB
2 15FEB2019 30 AB
3 02JAN19 30 AB
3 05JAN19 -30 AB
3 30MAR19 30 AB
4 03APR19 60 AB
5 01JAN19 60 CD
5 02FEB19 60 CD
5 06FEB19 -60 CD
6 01MAR19 30 CD
6 01APR19 30 CD
7 01JAN19 60 CD
7 02FEB19 60 CD
7 06FEB19 -60 XY
8 01MAR19 30 CD
8 01APR19 30 CD
;

data want (drop=_:);
  set have (keep=id drug);
  by id drug  notsorted;
  merge have have (firstobs=2 keep=supply rename=(supply=_nxt_supply));

  retain _matches_prior_obs 0 ;
  /*If _matches prior obs or matches next obs, then do two things*/
  if _matches_prior_obs=1 or (-1*supply=_nxt_supply and last.drug=0) then do;
    delete;
    _matches_prior_obs=ifn(_matches_prior_obs=1,0,1);
  end;

run;

The program assumes that the matching pairs are always adjacent observations.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

dac_js · Posted 11-18-2023 10:53 AM

Thank you for your suggestion. I tried it, but it didn't work as expected. I still have the negative values. Could you possibly offer any further guidance or an alternative solution? Thanks again for your help.

Patrick · Posted 11-18-2023 08:08 PM

@dac_js I suggest to first try and create sample data that's fully representative for your real data.

I assume the first two rows you've posted..

...should actually look like:

Could there also be additional scenarios in your real data? Like a single correction for two wrong entries?

...or is it possible that you've got a single summary correction for two wrong entries?

Are corrections always after the fact or is it possible in your real data that a correction has been entered by mistake (a minus value) that then gets fixed with a positive value at a later date?

....and so on...

In order to provide code logic that will work for all your cases you need to provide sample data that reflects all the possible cases in your real data and of course you also need to tell us how to treat these cases.

Below some sample code that creates data for additional scenarios. Please amend this data as required so it reflects what you're really dealing with.

data have;
  input id date :date9. Supply drug $;
  format date date9.;
datalines;
1 02JAN19 90 AB
1 03JAN19 -90 AB
2 15FEB2019 30 AB
3 02JAN19 30 AB
3 05JAN19 -30 AB
3 30MAR19 30 AB
4 03APR19 60 AB
5 01JAN19 60 CD
5 02FEB19 60 CD
5 06FEB19 -60 CD
6 01MAR19 30 CD
6 01APR19 30 CD
7 01JAN19 60 CD
7 02FEB19 60 CD
7 06FEB19 -60 XY
8 01MAR19 30 CD
8 01APR19 30 CD
09 01apr19 -10 aa
09 02apr19 20  aa
10 01apr19 10  bb
10 02apr19 20  bb
10 03apr19 -10 bb
11 01apr19 10  cc
11 02apr19 10  cc
11 03apr19 -20 cc
12 02jan19 -90 dd
12 03jan19 90  dd
;

dac_js · Posted 11-19-2023 11:38 AM

Thank you very much for your response! You are right about the first two rows of sample data

The cases I provided is more common but it is possible to have cases like this:

11 01apr19 10  cc
11 02apr19 10  cc
11 03apr19 -20 cc

The following type we do not need to deal with:

12 02jan19 -90 dd
12 03jan19 90  dd

Patrick · Posted 11-19-2023 11:28 PM

Test if below does it for you.

data have;
  input id date :date9. Supply drug $;
  format date date9.;
datalines;
1 01JAN19 30 AB
1 02JAN19 90 AB
1 03JAN19 -100 AB
1 04JAN19 -10 AB
2 15FEB2019 30 AB
3 02JAN19 30 AB
3 05JAN19 40 AB
3 30MAR19 30 AB
4 03APR19 60 AB
5 01JAN19 60 CD
5 02FEB19 60 CD
5 06FEB19 -60 CD
6 01MAR19 30 CD
6 01APR19 30 CD
7 01JAN19 60 CD
7 02FEB19 60 CD
7 06FEB19 -60 XY
8 01MAR19 30 CD
8 01APR19 30 CD
10 01apr19 10  bb
10 02apr19 20  bb
10 03apr19 -10 bb
11 01apr19 10  cc
11 02apr19 10  cc
11 03apr19 -20 cc
;

proc sort data=have out=inter;
  by id drug descending date descending Supply;
run;

data want(drop=_:);
  set inter;
  by id drug;
  retain _correction;
  if first.drug then _correction=0;

  if supply<0 then _correction=sum(_correction,supply);
  else
    do;
      _Supply_adjusted=sum(supply,_correction);
      _correction      =min(sum(_correction,supply),0);
      if _Supply_adjusted>0 then 
        do;
          supply=_Supply_adjusted;
          output;
        end;
    end;
run;

proc print data=want;
run;

mkeintz · Posted 11-19-2023 12:33 AM

This version of the code includes a date range check.

data have;
input id date :date9. Supply drug $;
format date date9.;
datalines;
1 03JAN19 90 AB
1 02JAN19 -90 AB
2 15FEB2019 30 AB
3 02JAN19 30 AB
3 05JAN19 -30 AB
3 30MAR19 30 AB
4 03APR19 60 AB
5 01JAN19 60 CD
5 02FEB19 60 CD
5 06FEB19 -60 CD
6 01MAR19 30 CD
6 01APR19 30 CD
7 01JAN19 60 CD
7 02FEB19 60 CD
7 06FEB19 -60 XY
8 01MAR19 30 CD
8 01APR19 30 CD
;

data want (drop=_:);
  set have (keep=id drug);
  by id drug  notsorted;
  merge have 
        have (firstobs=2 keep=supply date rename=(date=_nxt_date supply=_nxt_supply));

  retain _delete_count 0 ;
  if last.drug=0 and supply>0 and -1*supply=_nxt_supply and _nxt_date<=date+7 then _delete_count=2;

  if _delete_count=0 then output;
  else _delete_count=_delete_count-1;
run;

This program assumes data are grouped by ID and DRUG, and are sorted by ascending date within each ID/DRUG group. I take it that obs 2, had the wrong date, but I left it there.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

dac_js · Posted 11-20-2023 12:09 AM

Thank you! Your code is working.

SAS Innovate 2025: Save the Date

Health and Life Sciences Learning