BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Fae
Obsidian | Level 7 Fae
Obsidian | Level 7

I have two set of data: Sales and Refund.  I need to merge the refund data back to the original purchase by matching customer_id, product_id and transaction date in a FIFO basis.  

 

 

Is there any existing merging function/procedure on SAS that I can use or I have to build my own code? 

 

The solution i have in mind now is add an unique id and a "matched" indicator to both data set, then loop the two data sets as described below:

 

< Loop till all refund data is matched.>

1)  Left join the refund data to sale data base on customer_id, product_id, transaction date and "matched" indicator is false.

2)  Remove any duplicates

3)  Set indicator for matched data as true.

<End Loop>

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

The following could give you a start,but you need be familiar with Hash Table.

 

 

data Sales;
input Customer_id	Product_Id $	Transaction_Date : $20.	Sales_Amt : dollar32.  Unit ;
cards;
1	A1	10-Jan-18	$15	1
1	A1	11-Jan-18	$15	2
1	A2	11-Jan-18	$30	1
2	A2	10-Jan-18	$30	1
2	A2	15-Jan-18	$70	2
3	A3	10-Jan-18	$20	1
3	A3	11-Jan-18	$20	1
;

data Refund;
input Customer_id  Product_Id $ Transaction_Date : $20. Refund_Amt : dollar32.	refund_unit ;
cards;
1	A1	12-Jan-18	$15	1
1	A1	12-Jan-18	$15	1
2	A2	16-Jan-18	$35	1
3	A3	12-Jan-18	$20	1
3	A3	12-Jan-18	$20	1
;

data want;
 if _n_=1 then do;
  if 0 then set refund;
  declare hash h(dataset:'refund',multidata:'y');
  h.definekey('Customer_id','Product_Id');
  h.definedata('Refund_Amt','refund_unit');
  h.definedone();
 end;
set sales;
by 	Customer_id;
if first.Customer_id then cum=0;
cum+Sales_Amt;
call missing(Refund_Amt,refund_unit);
rc=h.find();
do while(rc=0);
 if Refund_Amt <= cum then do;h.removedup(); leave;end;
 call missing(Refund_Amt,refund_unit);
 rc=h.find_next();
end;

if not missing(Refund_Amt) then cum=cum-Refund_Amt;
drop rc cum;
run;

View solution in original post

6 REPLIES 6
Reeza
Super User

If you have SAS/OR you may be able to leverage PROC BOM but I suspect using a customized SQL or data step is just as good. I'm not sure there's a loop required here though.

Fae
Obsidian | Level 7 Fae
Obsidian | Level 7

Thanks for you advice.  

 

I am only using SAS eg so i doubt I have access to proc BOM, 

 

How can I do these with a customized SQL without loop?  A customer can have multiple purchase/refund for the same product so any regular join will probably generate duplicate record.

Reeza
Super User

Provide some sample data with the expected output if you want sample code.

 


@Fae wrote:

Thanks for you advice.  

 

I am only using SAS eg so i doubt I have access to proc BOM, 

 

How can I do these with a customized SQL without loop?  A customer can have multiple purchase/refund for the same product so any regular join will probably generate duplicate record.


 

Fae
Obsidian | Level 7 Fae
Obsidian | Level 7

Thanks very much.

 

Sales:

Customer_idProduct_IdTransaction_DateSales_AmtUnit
1A110-Jan-18$151
1A111-Jan-18$152
1A211-Jan-18$301
2A210-Jan-18$301
2A215-Jan-18$702
3A310-Jan-18$201
3A311-Jan-18$201

 

Refund:

Customer_idProduct_IdTransaction_DateRefund_AmtUnit
1A112-Jan-18$151
1A112-Jan-18$151
2A216-Jan-18$351
3A312-Jan-18$201
3A312-Jan-18$201

 

Merged:

Customer_idProduct_IdTransaction_DateSales_AmtUnitRefund_Amtrefund_unit
1A110-Jan-18$151$151
1A111-Jan-18$152$151
1A211-Jan-18$301..
2A210-Jan-18$301..
2A215-Jan-18$702$351
3A310-Jan-18$201$201
3A311-Jan-18$202$201
Ksharp
Super User

The following could give you a start,but you need be familiar with Hash Table.

 

 

data Sales;
input Customer_id	Product_Id $	Transaction_Date : $20.	Sales_Amt : dollar32.  Unit ;
cards;
1	A1	10-Jan-18	$15	1
1	A1	11-Jan-18	$15	2
1	A2	11-Jan-18	$30	1
2	A2	10-Jan-18	$30	1
2	A2	15-Jan-18	$70	2
3	A3	10-Jan-18	$20	1
3	A3	11-Jan-18	$20	1
;

data Refund;
input Customer_id  Product_Id $ Transaction_Date : $20. Refund_Amt : dollar32.	refund_unit ;
cards;
1	A1	12-Jan-18	$15	1
1	A1	12-Jan-18	$15	1
2	A2	16-Jan-18	$35	1
3	A3	12-Jan-18	$20	1
3	A3	12-Jan-18	$20	1
;

data want;
 if _n_=1 then do;
  if 0 then set refund;
  declare hash h(dataset:'refund',multidata:'y');
  h.definekey('Customer_id','Product_Id');
  h.definedata('Refund_Amt','refund_unit');
  h.definedone();
 end;
set sales;
by 	Customer_id;
if first.Customer_id then cum=0;
cum+Sales_Amt;
call missing(Refund_Amt,refund_unit);
rc=h.find();
do while(rc=0);
 if Refund_Amt <= cum then do;h.removedup(); leave;end;
 call missing(Refund_Amt,refund_unit);
 rc=h.find_next();
end;

if not missing(Refund_Amt) then cum=cum-Refund_Amt;
drop rc cum;
run;
Fae
Obsidian | Level 7 Fae
Obsidian | Level 7

Thanks very much for your help.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1306 views
  • 2 likes
  • 3 in conversation