BookmarkSubscribeRSS Feed
macedonataan
Fluorite | Level 6

Hi guys, let's suppose the following scenario.

 

Clientid       Purchase_Date         item_name               amount      

1                     01/01/2020                book                        $20                   

1                     01/01/2020                book                        $-20

2                     03/01/2020                pen                          $2

3                     04/01/2020                game                       $70

2                     03/01/2020                pen                          $-2

1                     06/01/2020                bike                         $200

 

Is possible to see that there is some negative values under the [amount] column, these refers to items returned to the store, and it's recorded as a negative purchase.

 

I would like to know how could I remove transactions (both positive and negative) for  items that returned to the store?

 

Notes: The orders returned have the same [Clientid], [Purchase_Date], [item_name], and [amount]               

7 REPLIES 7
japelin
Rhodochrosite | Level 12

how about this 

data have;
input Clientid Purchase_Date:mmddyy10. item_name $ amount:dollar4.;
format Purchase_Date mmddyy10.
       amount dollar4.;
datalines;
1 01/01/2020 book $20
1 01/01/2020 book $-20
2 03/01/2020 pen  $2
3 04/01/2020 game $70
2 03/01/2020 pen  $-2
1 06/01/2020 bike $200
;
run;

proc sql;
  create table want as
  select a.* from have as a
  left join have as b
  on a.Clientid=b.Clientid and 
     a.Purchase_Date=b.Purchase_Date and 
     a.item_name=b.item_name and
     a.amount=b.amount*(-1)
  where b.amount=.;
quit;
macedonataan
Fluorite | Level 6
It works perfectly for the didactic example, however when applied to real data (+10Mi rows) it seems to match more observations than it should....
KachiM
Rhodochrosite | Level 12

A Data Step approach. Sort the data set. Remove the pair of observations that starts with negative AMOUNT.

data have;
input Clientid Purchase_Date:mmddyy10. item_name $ amount:dollar4.;
format Purchase_Date mmddyy10.
       amount dollar4.;
datalines;
1 01/01/2020 book $20
1 01/01/2020 book $-20
2 03/01/2020 pen  $2
3 04/01/2020 game $70
2 03/01/2020 pen  $-2
1 06/01/2020 bike $200
;
run;

proc sort data = have out = have_sorted;
	by Clientid Purchase_Date item_name amount;
run;

data want;
	do until(last.Clientid);
		set have_sorted;
		by Clientid;
		if amount < 0 then do; prev = 1; delete; end;
		else if amount >= 0 and prev = 1 then do; delete; prev = .; end;
	end;
drop prev;
run;
macedonataan
Fluorite | Level 6
For some reason it is keeping the observation for the pen
mkeintz
PROC Star

Put all the returns (amount<0) in a hash object.   Read in all the purchases (amount>0), keeping only those for which there is no offsetting return.  This can be done without data sorting.

 

data have;
input Clientid Purchase_Date:mmddyy10. item_name $ amount:dollar4.;
format Purchase_Date mmddyy10.
       amount dollar4.;
datalines;
1 01/01/2020 book $20
1 01/01/2020 book $-20
2 03/01/2020 pen  $2
3 04/01/2020 game $70
2 03/01/2020 pen  $-2
1 06/01/2020 bike $200
;
run;
data want;
  set have (where=(amount>0));
  if _n_=1 then do;
    declare hash returns (dataset:'have (where=(amount<0))');
      returns.definekey(all:'Y');
      returns.definedata(all:'Y');
      returns.definedone();
  end;
  if returns.check(key:clientid,key:purchase_date,key:item_name,key:-1*amount)^=0;
run;

This program assumes that no item is purchased more than once by a given client on a given date.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
macedonataan
Fluorite | Level 6

Is there a way of doing it assuming the possibility of items being purchased more then once by the same client?

mkeintz
PROC Star

@macedonataan wrote:

Is there a way of doing it assuming the possibility of items being purchased more then once by the same client?


"more than once by the same client", is ok but "more than once ON THE SAME DAY" would require revisions

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 942 views
  • 3 likes
  • 4 in conversation