I have two set of data: Sales and Refund. I need to merge the refund data back to the original purchase by matching customer_id, product_id and transaction date in a FIFO basis.
Is there any existing merging function/procedure on SAS that I can use or I have to build my own code?
The solution i have in mind now is add an unique id and a "matched" indicator to both data set, then loop the two data sets as described below:
< Loop till all refund data is matched.>
1) Left join the refund data to sale data base on customer_id, product_id, transaction date and "matched" indicator is false.
2) Remove any duplicates
3) Set indicator for matched data as true.
<End Loop>
The following could give you a start,but you need be familiar with Hash Table.
data Sales;
input Customer_id Product_Id $ Transaction_Date : $20. Sales_Amt : dollar32. Unit ;
cards;
1 A1 10-Jan-18 $15 1
1 A1 11-Jan-18 $15 2
1 A2 11-Jan-18 $30 1
2 A2 10-Jan-18 $30 1
2 A2 15-Jan-18 $70 2
3 A3 10-Jan-18 $20 1
3 A3 11-Jan-18 $20 1
;
data Refund;
input Customer_id Product_Id $ Transaction_Date : $20. Refund_Amt : dollar32. refund_unit ;
cards;
1 A1 12-Jan-18 $15 1
1 A1 12-Jan-18 $15 1
2 A2 16-Jan-18 $35 1
3 A3 12-Jan-18 $20 1
3 A3 12-Jan-18 $20 1
;
data want;
if _n_=1 then do;
if 0 then set refund;
declare hash h(dataset:'refund',multidata:'y');
h.definekey('Customer_id','Product_Id');
h.definedata('Refund_Amt','refund_unit');
h.definedone();
end;
set sales;
by Customer_id;
if first.Customer_id then cum=0;
cum+Sales_Amt;
call missing(Refund_Amt,refund_unit);
rc=h.find();
do while(rc=0);
if Refund_Amt <= cum then do;h.removedup(); leave;end;
call missing(Refund_Amt,refund_unit);
rc=h.find_next();
end;
if not missing(Refund_Amt) then cum=cum-Refund_Amt;
drop rc cum;
run;
If you have SAS/OR you may be able to leverage PROC BOM but I suspect using a customized SQL or data step is just as good. I'm not sure there's a loop required here though.
Thanks for you advice.
I am only using SAS eg so i doubt I have access to proc BOM,
How can I do these with a customized SQL without loop? A customer can have multiple purchase/refund for the same product so any regular join will probably generate duplicate record.
Provide some sample data with the expected output if you want sample code.
@Fae wrote:
Thanks for you advice.
I am only using SAS eg so i doubt I have access to proc BOM,
How can I do these with a customized SQL without loop? A customer can have multiple purchase/refund for the same product so any regular join will probably generate duplicate record.
Thanks very much.
Sales:
Customer_id | Product_Id | Transaction_Date | Sales_Amt | Unit |
1 | A1 | 10-Jan-18 | $15 | 1 |
1 | A1 | 11-Jan-18 | $15 | 2 |
1 | A2 | 11-Jan-18 | $30 | 1 |
2 | A2 | 10-Jan-18 | $30 | 1 |
2 | A2 | 15-Jan-18 | $70 | 2 |
3 | A3 | 10-Jan-18 | $20 | 1 |
3 | A3 | 11-Jan-18 | $20 | 1 |
Refund:
Customer_id | Product_Id | Transaction_Date | Refund_Amt | Unit |
1 | A1 | 12-Jan-18 | $15 | 1 |
1 | A1 | 12-Jan-18 | $15 | 1 |
2 | A2 | 16-Jan-18 | $35 | 1 |
3 | A3 | 12-Jan-18 | $20 | 1 |
3 | A3 | 12-Jan-18 | $20 | 1 |
Merged:
Customer_id | Product_Id | Transaction_Date | Sales_Amt | Unit | Refund_Amt | refund_unit |
1 | A1 | 10-Jan-18 | $15 | 1 | $15 | 1 |
1 | A1 | 11-Jan-18 | $15 | 2 | $15 | 1 |
1 | A2 | 11-Jan-18 | $30 | 1 | . | . |
2 | A2 | 10-Jan-18 | $30 | 1 | . | . |
2 | A2 | 15-Jan-18 | $70 | 2 | $35 | 1 |
3 | A3 | 10-Jan-18 | $20 | 1 | $20 | 1 |
3 | A3 | 11-Jan-18 | $20 | 2 | $20 | 1 |
The following could give you a start,but you need be familiar with Hash Table.
data Sales;
input Customer_id Product_Id $ Transaction_Date : $20. Sales_Amt : dollar32. Unit ;
cards;
1 A1 10-Jan-18 $15 1
1 A1 11-Jan-18 $15 2
1 A2 11-Jan-18 $30 1
2 A2 10-Jan-18 $30 1
2 A2 15-Jan-18 $70 2
3 A3 10-Jan-18 $20 1
3 A3 11-Jan-18 $20 1
;
data Refund;
input Customer_id Product_Id $ Transaction_Date : $20. Refund_Amt : dollar32. refund_unit ;
cards;
1 A1 12-Jan-18 $15 1
1 A1 12-Jan-18 $15 1
2 A2 16-Jan-18 $35 1
3 A3 12-Jan-18 $20 1
3 A3 12-Jan-18 $20 1
;
data want;
if _n_=1 then do;
if 0 then set refund;
declare hash h(dataset:'refund',multidata:'y');
h.definekey('Customer_id','Product_Id');
h.definedata('Refund_Amt','refund_unit');
h.definedone();
end;
set sales;
by Customer_id;
if first.Customer_id then cum=0;
cum+Sales_Amt;
call missing(Refund_Amt,refund_unit);
rc=h.find();
do while(rc=0);
if Refund_Amt <= cum then do;h.removedup(); leave;end;
call missing(Refund_Amt,refund_unit);
rc=h.find_next();
end;
if not missing(Refund_Amt) then cum=cum-Refund_Amt;
drop rc cum;
run;
Thanks very much for your help.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.