I have the data in following format:
data have;
input id date :date9. Supply drug $;
datalines;
1 03JAN19 90 AB
2 02JAN19 -90 AB
2 15FEB2019 30 AB
3 02JAN19 30 AB
3 05JAN19 -30 AB
3 30MAR19 30 AB
4 03APR19 60 AB
5 01JAN19 60 CD
5 02FEB19 60 CD
5 06FEB19 -60 CD
6 01MAR19 30 CD
6 01APR19 30 CD
7 01JAN19 60 CD
7 02FEB19 60 CD
7 06FEB19 -60 XY
8 01MAR19 30 CD
8 01APR19 30 CD
;
I need help with deleting negative supply of same drug in this data (the first and second observation for ID 1, the first and second observation for ID 3, second and third observation of ID 5 in this example, but not for second and third observation of ID 7), considering the second observations are actually a refund or reversal of the earlier observation (reversal within 7days of a visit).
Thanks in advance for the help.
This version of the code includes a date range check.
data have;
input id date :date9. Supply drug $;
format date date9.;
datalines;
1 03JAN19 90 AB
1 02JAN19 -90 AB
2 15FEB2019 30 AB
3 02JAN19 30 AB
3 05JAN19 -30 AB
3 30MAR19 30 AB
4 03APR19 60 AB
5 01JAN19 60 CD
5 02FEB19 60 CD
5 06FEB19 -60 CD
6 01MAR19 30 CD
6 01APR19 30 CD
7 01JAN19 60 CD
7 02FEB19 60 CD
7 06FEB19 -60 XY
8 01MAR19 30 CD
8 01APR19 30 CD
;
data want (drop=_:);
set have (keep=id drug);
by id drug notsorted;
merge have
have (firstobs=2 keep=supply date rename=(date=_nxt_date supply=_nxt_supply));
retain _delete_count 0 ;
if last.drug=0 and supply>0 and -1*supply=_nxt_supply and _nxt_date<=date+7 then _delete_count=2;
if _delete_count=0 then output;
else _delete_count=_delete_count-1;
run;
This program assumes data are grouped by ID and DRUG, and are sorted by ascending date within each ID/DRUG group. I take it that obs 2, had the wrong date, but I left it there.
@dac_js wrote:
I have the data in following format:
data have;
input id date :date9. Supply drug $;
datalines;
1 03JAN19 90 AB
2 02JAN19 -90 AB
2 15FEB2019 30 AB
3 02JAN19 30 AB
3 05JAN19 -30 AB
3 30MAR19 30 AB
4 03APR19 60 AB
5 01JAN19 60 CD
5 02FEB19 60 CD
5 06FEB19 -60 CD
6 01MAR19 30 CD
6 01APR19 30 CD
7 01JAN19 60 CD
7 02FEB19 60 CD
7 06FEB19 -60 XY
8 01MAR19 30 CD
8 01APR19 30 CD
;I need help with deleting negative supply of same drug in this data (the first and second observation for ID 1, the first and second observation for ID 3, second and third observation of ID 5 in this example, but not for second and third observation of ID 7), considering the second observations are actually a refund or reversal of the earlier observation (reversal within 7days of a visit).
Thanks in advance for the help.
But there is no "second observation for ID 1" in your sample data. I presume the second obs in your sample should have ID=1 (not ID=2).
Then:
data have;
input id date :date9. Supply drug $;
format date date9.;
datalines;
1 03JAN19 90 AB
1 02JAN19 -90 AB
2 15FEB2019 30 AB
3 02JAN19 30 AB
3 05JAN19 -30 AB
3 30MAR19 30 AB
4 03APR19 60 AB
5 01JAN19 60 CD
5 02FEB19 60 CD
5 06FEB19 -60 CD
6 01MAR19 30 CD
6 01APR19 30 CD
7 01JAN19 60 CD
7 02FEB19 60 CD
7 06FEB19 -60 XY
8 01MAR19 30 CD
8 01APR19 30 CD
;
data want (drop=_:);
set have (keep=id drug);
by id drug notsorted;
merge have have (firstobs=2 keep=supply rename=(supply=_nxt_supply));
retain _matches_prior_obs 0 ;
/*If _matches prior obs or matches next obs, then do two things*/
if _matches_prior_obs=1 or (-1*supply=_nxt_supply and last.drug=0) then do;
delete;
_matches_prior_obs=ifn(_matches_prior_obs=1,0,1);
end;
run;
The program assumes that the matching pairs are always adjacent observations.
@dac_js I suggest to first try and create sample data that's fully representative for your real data.
I assume the first two rows you've posted..
...should actually look like:
Could there also be additional scenarios in your real data? Like a single correction for two wrong entries?
...or is it possible that you've got a single summary correction for two wrong entries?
Are corrections always after the fact or is it possible in your real data that a correction has been entered by mistake (a minus value) that then gets fixed with a positive value at a later date?
....and so on...
In order to provide code logic that will work for all your cases you need to provide sample data that reflects all the possible cases in your real data and of course you also need to tell us how to treat these cases.
Below some sample code that creates data for additional scenarios. Please amend this data as required so it reflects what you're really dealing with.
data have;
input id date :date9. Supply drug $;
format date date9.;
datalines;
1 02JAN19 90 AB
1 03JAN19 -90 AB
2 15FEB2019 30 AB
3 02JAN19 30 AB
3 05JAN19 -30 AB
3 30MAR19 30 AB
4 03APR19 60 AB
5 01JAN19 60 CD
5 02FEB19 60 CD
5 06FEB19 -60 CD
6 01MAR19 30 CD
6 01APR19 30 CD
7 01JAN19 60 CD
7 02FEB19 60 CD
7 06FEB19 -60 XY
8 01MAR19 30 CD
8 01APR19 30 CD
09 01apr19 -10 aa
09 02apr19 20 aa
10 01apr19 10 bb
10 02apr19 20 bb
10 03apr19 -10 bb
11 01apr19 10 cc
11 02apr19 10 cc
11 03apr19 -20 cc
12 02jan19 -90 dd
12 03jan19 90 dd
;
Thank you very much for your response! You are right about the first two rows of sample data
The cases I provided is more common but it is possible to have cases like this:
11 01apr19 10 cc
11 02apr19 10 cc
11 03apr19 -20 cc
The following type we do not need to deal with:
12 02jan19 -90 dd
12 03jan19 90 dd
Test if below does it for you.
data have;
input id date :date9. Supply drug $;
format date date9.;
datalines;
1 01JAN19 30 AB
1 02JAN19 90 AB
1 03JAN19 -100 AB
1 04JAN19 -10 AB
2 15FEB2019 30 AB
3 02JAN19 30 AB
3 05JAN19 40 AB
3 30MAR19 30 AB
4 03APR19 60 AB
5 01JAN19 60 CD
5 02FEB19 60 CD
5 06FEB19 -60 CD
6 01MAR19 30 CD
6 01APR19 30 CD
7 01JAN19 60 CD
7 02FEB19 60 CD
7 06FEB19 -60 XY
8 01MAR19 30 CD
8 01APR19 30 CD
10 01apr19 10 bb
10 02apr19 20 bb
10 03apr19 -10 bb
11 01apr19 10 cc
11 02apr19 10 cc
11 03apr19 -20 cc
;
proc sort data=have out=inter;
by id drug descending date descending Supply;
run;
data want(drop=_:);
set inter;
by id drug;
retain _correction;
if first.drug then _correction=0;
if supply<0 then _correction=sum(_correction,supply);
else
do;
_Supply_adjusted=sum(supply,_correction);
_correction =min(sum(_correction,supply),0);
if _Supply_adjusted>0 then
do;
supply=_Supply_adjusted;
output;
end;
end;
run;
proc print data=want;
run;
This version of the code includes a date range check.
data have;
input id date :date9. Supply drug $;
format date date9.;
datalines;
1 03JAN19 90 AB
1 02JAN19 -90 AB
2 15FEB2019 30 AB
3 02JAN19 30 AB
3 05JAN19 -30 AB
3 30MAR19 30 AB
4 03APR19 60 AB
5 01JAN19 60 CD
5 02FEB19 60 CD
5 06FEB19 -60 CD
6 01MAR19 30 CD
6 01APR19 30 CD
7 01JAN19 60 CD
7 02FEB19 60 CD
7 06FEB19 -60 XY
8 01MAR19 30 CD
8 01APR19 30 CD
;
data want (drop=_:);
set have (keep=id drug);
by id drug notsorted;
merge have
have (firstobs=2 keep=supply date rename=(date=_nxt_date supply=_nxt_supply));
retain _delete_count 0 ;
if last.drug=0 and supply>0 and -1*supply=_nxt_supply and _nxt_date<=date+7 then _delete_count=2;
if _delete_count=0 then output;
else _delete_count=_delete_count-1;
run;
This program assumes data are grouped by ID and DRUG, and are sorted by ascending date within each ID/DRUG group. I take it that obs 2, had the wrong date, but I left it there.
Thank you! Your code is working.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Need courses to help you with SAS Life Sciences Analytics Framework, SAS Health Cohort Builder, or other topics? Check out the Health and Life Sciences learning path for all of the offerings.