Hi guys, let's suppose the following scenario.
Clientid Purchase_Date item_name amount
1 01/01/2020 book $20
1 01/01/2020 book $-20
2 03/01/2020 pen $2
3 04/01/2020 game $70
2 03/01/2020 pen $-2
1 06/01/2020 bike $200
Is possible to see that there is some negative values under the [amount] column, these refers to items returned to the store, and it's recorded as a negative purchase.
I would like to know how could I remove transactions (both positive and negative) for items that returned to the store?
Notes: The orders returned have the same [Clientid], [Purchase_Date], [item_name], and [amount]
how about this
data have;
input Clientid Purchase_Date:mmddyy10. item_name $ amount:dollar4.;
format Purchase_Date mmddyy10.
amount dollar4.;
datalines;
1 01/01/2020 book $20
1 01/01/2020 book $-20
2 03/01/2020 pen $2
3 04/01/2020 game $70
2 03/01/2020 pen $-2
1 06/01/2020 bike $200
;
run;
proc sql;
create table want as
select a.* from have as a
left join have as b
on a.Clientid=b.Clientid and
a.Purchase_Date=b.Purchase_Date and
a.item_name=b.item_name and
a.amount=b.amount*(-1)
where b.amount=.;
quit;
A Data Step approach. Sort the data set. Remove the pair of observations that starts with negative AMOUNT.
data have;
input Clientid Purchase_Date:mmddyy10. item_name $ amount:dollar4.;
format Purchase_Date mmddyy10.
amount dollar4.;
datalines;
1 01/01/2020 book $20
1 01/01/2020 book $-20
2 03/01/2020 pen $2
3 04/01/2020 game $70
2 03/01/2020 pen $-2
1 06/01/2020 bike $200
;
run;
proc sort data = have out = have_sorted;
by Clientid Purchase_Date item_name amount;
run;
data want;
do until(last.Clientid);
set have_sorted;
by Clientid;
if amount < 0 then do; prev = 1; delete; end;
else if amount >= 0 and prev = 1 then do; delete; prev = .; end;
end;
drop prev;
run;
Put all the returns (amount<0) in a hash object. Read in all the purchases (amount>0), keeping only those for which there is no offsetting return. This can be done without data sorting.
data have;
input Clientid Purchase_Date:mmddyy10. item_name $ amount:dollar4.;
format Purchase_Date mmddyy10.
amount dollar4.;
datalines;
1 01/01/2020 book $20
1 01/01/2020 book $-20
2 03/01/2020 pen $2
3 04/01/2020 game $70
2 03/01/2020 pen $-2
1 06/01/2020 bike $200
;
run;
data want;
set have (where=(amount>0));
if _n_=1 then do;
declare hash returns (dataset:'have (where=(amount<0))');
returns.definekey(all:'Y');
returns.definedata(all:'Y');
returns.definedone();
end;
if returns.check(key:clientid,key:purchase_date,key:item_name,key:-1*amount)^=0;
run;
This program assumes that no item is purchased more than once by a given client on a given date.
Is there a way of doing it assuming the possibility of items being purchased more then once by the same client?
@macedonataan wrote:
Is there a way of doing it assuming the possibility of items being purchased more then once by the same client?
"more than once by the same client", is ok but "more than once ON THE SAME DAY" would require revisions
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.