Matching clients with SAS

macedonataan · Posted 03-14-2022 11:33 PM

Hi guys, let's suppose the following scenario.

Clientid Purchase_Date item_name amount

1 01/01/2020 book $20

1 01/01/2020 book $-20

2 03/01/2020 pen $2

3 04/01/2020 game $70

2 03/01/2020 pen $-2

1 06/01/2020 bike $200

Is possible to see that there is some negative values under the [amount] column, these refers to items returned to the store, and it's recorded as a negative purchase.

I would like to know how could I remove transactions (both positive and negative) for items that returned to the store?

Notes: The orders returned have the same [Clientid], [Purchase_Date], [item_name], and [amount]

japelin · Posted 03-14-2022 11:48 PM

how about this

data have;
input Clientid Purchase_Date:mmddyy10. item_name $ amount:dollar4.;
format Purchase_Date mmddyy10.
       amount dollar4.;
datalines;
1 01/01/2020 book $20
1 01/01/2020 book $-20
2 03/01/2020 pen  $2
3 04/01/2020 game $70
2 03/01/2020 pen  $-2
1 06/01/2020 bike $200
;
run;

proc sql;
  create table want as
  select a.* from have as a
  left join have as b
  on a.Clientid=b.Clientid and 
     a.Purchase_Date=b.Purchase_Date and 
     a.item_name=b.item_name and
     a.amount=b.amount*(-1)
  where b.amount=.;
quit;

macedonataan · Posted 03-15-2022 04:37 PM

It works perfectly for the didactic example, however when applied to real data (+10Mi rows) it seems to match more observations than it should....

KachiM · Posted 03-15-2022 01:10 AM

A Data Step approach. Sort the data set. Remove the pair of observations that starts with negative AMOUNT.

data have;
input Clientid Purchase_Date:mmddyy10. item_name $ amount:dollar4.;
format Purchase_Date mmddyy10.
       amount dollar4.;
datalines;
1 01/01/2020 book $20
1 01/01/2020 book $-20
2 03/01/2020 pen  $2
3 04/01/2020 game $70
2 03/01/2020 pen  $-2
1 06/01/2020 bike $200
;
run;

proc sort data = have out = have_sorted;
	by Clientid Purchase_Date item_name amount;
run;

data want;
	do until(last.Clientid);
		set have_sorted;
		by Clientid;
		if amount < 0 then do; prev = 1; delete; end;
		else if amount >= 0 and prev = 1 then do; delete; prev = .; end;
	end;
drop prev;
run;

macedonataan · Posted 03-15-2022 04:16 PM

For some reason it is keeping the observation for the pen

mkeintz · Posted 03-15-2022 01:20 AM

Put all the returns (amount<0) in a hash object. Read in all the purchases (amount>0), keeping only those for which there is no offsetting return. This can be done without data sorting.

data have;
input Clientid Purchase_Date:mmddyy10. item_name $ amount:dollar4.;
format Purchase_Date mmddyy10.
       amount dollar4.;
datalines;
1 01/01/2020 book $20
1 01/01/2020 book $-20
2 03/01/2020 pen  $2
3 04/01/2020 game $70
2 03/01/2020 pen  $-2
1 06/01/2020 bike $200
;
run;
data want;
  set have (where=(amount>0));
  if _n_=1 then do;
    declare hash returns (dataset:'have (where=(amount<0))');
      returns.definekey(all:'Y');
      returns.definedata(all:'Y');
      returns.definedone();
  end;
  if returns.check(key:clientid,key:purchase_date,key:item_name,key:-1*amount)^=0;
run;

This program assumes that no item is purchased more than once by a given client on a given date.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

macedonataan · Posted 03-15-2022 02:44 AM

Is there a way of doing it assuming the possibility of items being purchased more then once by the same client?

mkeintz · Posted 03-15-2022 12:13 PM

@macedonataan wrote:

Is there a way of doing it assuming the possibility of items being purchased more then once by the same client?

"more than once by the same client", is ok but "more than once ON THE SAME DAY" would require revisions

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Matching clients with SAS

Re: Matching clients with SAS

Re: Matching clients with SAS

Re: Matching clients with SAS

Re: Matching clients with SAS

Re: Matching clients with SAS

Re: Matching clients with SAS

Re: Matching clients with SAS

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away