I have two set of data: Sales and Refund. I need to merge the refund data back to the original purchase by matching customer_id, product_id and transaction date in a FIFO basis.
Is there any existing merging function/procedure on SAS that I can use or I have to build my own code?
The solution i have in mind now is add an unique id and a "matched" indicator to both data set, then loop the two data sets as described below:
< Loop till all refund data is matched.>
1) Left join the refund data to sale data base on customer_id, product_id, transaction date and "matched" indicator is false.
2) Remove any duplicates
3) Set indicator for matched data as true.
<End Loop>
The following could give you a start,but you need be familiar with Hash Table.
data Sales;
input Customer_id Product_Id $ Transaction_Date : $20. Sales_Amt : dollar32. Unit ;
cards;
1 A1 10-Jan-18 $15 1
1 A1 11-Jan-18 $15 2
1 A2 11-Jan-18 $30 1
2 A2 10-Jan-18 $30 1
2 A2 15-Jan-18 $70 2
3 A3 10-Jan-18 $20 1
3 A3 11-Jan-18 $20 1
;
data Refund;
input Customer_id Product_Id $ Transaction_Date : $20. Refund_Amt : dollar32. refund_unit ;
cards;
1 A1 12-Jan-18 $15 1
1 A1 12-Jan-18 $15 1
2 A2 16-Jan-18 $35 1
3 A3 12-Jan-18 $20 1
3 A3 12-Jan-18 $20 1
;
data want;
if _n_=1 then do;
if 0 then set refund;
declare hash h(dataset:'refund',multidata:'y');
h.definekey('Customer_id','Product_Id');
h.definedata('Refund_Amt','refund_unit');
h.definedone();
end;
set sales;
by Customer_id;
if first.Customer_id then cum=0;
cum+Sales_Amt;
call missing(Refund_Amt,refund_unit);
rc=h.find();
do while(rc=0);
if Refund_Amt <= cum then do;h.removedup(); leave;end;
call missing(Refund_Amt,refund_unit);
rc=h.find_next();
end;
if not missing(Refund_Amt) then cum=cum-Refund_Amt;
drop rc cum;
run;
If you have SAS/OR you may be able to leverage PROC BOM but I suspect using a customized SQL or data step is just as good. I'm not sure there's a loop required here though.
Thanks for you advice.
I am only using SAS eg so i doubt I have access to proc BOM,
How can I do these with a customized SQL without loop? A customer can have multiple purchase/refund for the same product so any regular join will probably generate duplicate record.
Provide some sample data with the expected output if you want sample code.
@Fae wrote:
Thanks for you advice.
I am only using SAS eg so i doubt I have access to proc BOM,
How can I do these with a customized SQL without loop? A customer can have multiple purchase/refund for the same product so any regular join will probably generate duplicate record.
Thanks very much.
Sales:
Customer_id | Product_Id | Transaction_Date | Sales_Amt | Unit |
1 | A1 | 10-Jan-18 | $15 | 1 |
1 | A1 | 11-Jan-18 | $15 | 2 |
1 | A2 | 11-Jan-18 | $30 | 1 |
2 | A2 | 10-Jan-18 | $30 | 1 |
2 | A2 | 15-Jan-18 | $70 | 2 |
3 | A3 | 10-Jan-18 | $20 | 1 |
3 | A3 | 11-Jan-18 | $20 | 1 |
Refund:
Customer_id | Product_Id | Transaction_Date | Refund_Amt | Unit |
1 | A1 | 12-Jan-18 | $15 | 1 |
1 | A1 | 12-Jan-18 | $15 | 1 |
2 | A2 | 16-Jan-18 | $35 | 1 |
3 | A3 | 12-Jan-18 | $20 | 1 |
3 | A3 | 12-Jan-18 | $20 | 1 |
Merged:
Customer_id | Product_Id | Transaction_Date | Sales_Amt | Unit | Refund_Amt | refund_unit |
1 | A1 | 10-Jan-18 | $15 | 1 | $15 | 1 |
1 | A1 | 11-Jan-18 | $15 | 2 | $15 | 1 |
1 | A2 | 11-Jan-18 | $30 | 1 | . | . |
2 | A2 | 10-Jan-18 | $30 | 1 | . | . |
2 | A2 | 15-Jan-18 | $70 | 2 | $35 | 1 |
3 | A3 | 10-Jan-18 | $20 | 1 | $20 | 1 |
3 | A3 | 11-Jan-18 | $20 | 2 | $20 | 1 |
The following could give you a start,but you need be familiar with Hash Table.
data Sales;
input Customer_id Product_Id $ Transaction_Date : $20. Sales_Amt : dollar32. Unit ;
cards;
1 A1 10-Jan-18 $15 1
1 A1 11-Jan-18 $15 2
1 A2 11-Jan-18 $30 1
2 A2 10-Jan-18 $30 1
2 A2 15-Jan-18 $70 2
3 A3 10-Jan-18 $20 1
3 A3 11-Jan-18 $20 1
;
data Refund;
input Customer_id Product_Id $ Transaction_Date : $20. Refund_Amt : dollar32. refund_unit ;
cards;
1 A1 12-Jan-18 $15 1
1 A1 12-Jan-18 $15 1
2 A2 16-Jan-18 $35 1
3 A3 12-Jan-18 $20 1
3 A3 12-Jan-18 $20 1
;
data want;
if _n_=1 then do;
if 0 then set refund;
declare hash h(dataset:'refund',multidata:'y');
h.definekey('Customer_id','Product_Id');
h.definedata('Refund_Amt','refund_unit');
h.definedone();
end;
set sales;
by Customer_id;
if first.Customer_id then cum=0;
cum+Sales_Amt;
call missing(Refund_Amt,refund_unit);
rc=h.find();
do while(rc=0);
if Refund_Amt <= cum then do;h.removedup(); leave;end;
call missing(Refund_Amt,refund_unit);
rc=h.find_next();
end;
if not missing(Refund_Amt) then cum=cum-Refund_Amt;
drop rc cum;
run;
Thanks very much for your help.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.