BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Fae
Obsidian | Level 7 Fae
Obsidian | Level 7

I have two set of data: Sales and Refund.  I need to merge the refund data back to the original purchase by matching customer_id, product_id and transaction date in a FIFO basis.  

 

 

Is there any existing merging function/procedure on SAS that I can use or I have to build my own code? 

 

The solution i have in mind now is add an unique id and a "matched" indicator to both data set, then loop the two data sets as described below:

 

< Loop till all refund data is matched.>

1)  Left join the refund data to sale data base on customer_id, product_id, transaction date and "matched" indicator is false.

2)  Remove any duplicates

3)  Set indicator for matched data as true.

<End Loop>

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

The following could give you a start,but you need be familiar with Hash Table.

 

 

data Sales;
input Customer_id	Product_Id $	Transaction_Date : $20.	Sales_Amt : dollar32.  Unit ;
cards;
1	A1	10-Jan-18	$15	1
1	A1	11-Jan-18	$15	2
1	A2	11-Jan-18	$30	1
2	A2	10-Jan-18	$30	1
2	A2	15-Jan-18	$70	2
3	A3	10-Jan-18	$20	1
3	A3	11-Jan-18	$20	1
;

data Refund;
input Customer_id  Product_Id $ Transaction_Date : $20. Refund_Amt : dollar32.	refund_unit ;
cards;
1	A1	12-Jan-18	$15	1
1	A1	12-Jan-18	$15	1
2	A2	16-Jan-18	$35	1
3	A3	12-Jan-18	$20	1
3	A3	12-Jan-18	$20	1
;

data want;
 if _n_=1 then do;
  if 0 then set refund;
  declare hash h(dataset:'refund',multidata:'y');
  h.definekey('Customer_id','Product_Id');
  h.definedata('Refund_Amt','refund_unit');
  h.definedone();
 end;
set sales;
by 	Customer_id;
if first.Customer_id then cum=0;
cum+Sales_Amt;
call missing(Refund_Amt,refund_unit);
rc=h.find();
do while(rc=0);
 if Refund_Amt <= cum then do;h.removedup(); leave;end;
 call missing(Refund_Amt,refund_unit);
 rc=h.find_next();
end;

if not missing(Refund_Amt) then cum=cum-Refund_Amt;
drop rc cum;
run;

View solution in original post

6 REPLIES 6
Reeza
Super User

If you have SAS/OR you may be able to leverage PROC BOM but I suspect using a customized SQL or data step is just as good. I'm not sure there's a loop required here though.

Fae
Obsidian | Level 7 Fae
Obsidian | Level 7

Thanks for you advice.  

 

I am only using SAS eg so i doubt I have access to proc BOM, 

 

How can I do these with a customized SQL without loop?  A customer can have multiple purchase/refund for the same product so any regular join will probably generate duplicate record.

Reeza
Super User

Provide some sample data with the expected output if you want sample code.

 


@Fae wrote:

Thanks for you advice.  

 

I am only using SAS eg so i doubt I have access to proc BOM, 

 

How can I do these with a customized SQL without loop?  A customer can have multiple purchase/refund for the same product so any regular join will probably generate duplicate record.


 

Fae
Obsidian | Level 7 Fae
Obsidian | Level 7

Thanks very much.

 

Sales:

Customer_idProduct_IdTransaction_DateSales_AmtUnit
1A110-Jan-18$151
1A111-Jan-18$152
1A211-Jan-18$301
2A210-Jan-18$301
2A215-Jan-18$702
3A310-Jan-18$201
3A311-Jan-18$201

 

Refund:

Customer_idProduct_IdTransaction_DateRefund_AmtUnit
1A112-Jan-18$151
1A112-Jan-18$151
2A216-Jan-18$351
3A312-Jan-18$201
3A312-Jan-18$201

 

Merged:

Customer_idProduct_IdTransaction_DateSales_AmtUnitRefund_Amtrefund_unit
1A110-Jan-18$151$151
1A111-Jan-18$152$151
1A211-Jan-18$301..
2A210-Jan-18$301..
2A215-Jan-18$702$351
3A310-Jan-18$201$201
3A311-Jan-18$202$201
Ksharp
Super User

The following could give you a start,but you need be familiar with Hash Table.

 

 

data Sales;
input Customer_id	Product_Id $	Transaction_Date : $20.	Sales_Amt : dollar32.  Unit ;
cards;
1	A1	10-Jan-18	$15	1
1	A1	11-Jan-18	$15	2
1	A2	11-Jan-18	$30	1
2	A2	10-Jan-18	$30	1
2	A2	15-Jan-18	$70	2
3	A3	10-Jan-18	$20	1
3	A3	11-Jan-18	$20	1
;

data Refund;
input Customer_id  Product_Id $ Transaction_Date : $20. Refund_Amt : dollar32.	refund_unit ;
cards;
1	A1	12-Jan-18	$15	1
1	A1	12-Jan-18	$15	1
2	A2	16-Jan-18	$35	1
3	A3	12-Jan-18	$20	1
3	A3	12-Jan-18	$20	1
;

data want;
 if _n_=1 then do;
  if 0 then set refund;
  declare hash h(dataset:'refund',multidata:'y');
  h.definekey('Customer_id','Product_Id');
  h.definedata('Refund_Amt','refund_unit');
  h.definedone();
 end;
set sales;
by 	Customer_id;
if first.Customer_id then cum=0;
cum+Sales_Amt;
call missing(Refund_Amt,refund_unit);
rc=h.find();
do while(rc=0);
 if Refund_Amt <= cum then do;h.removedup(); leave;end;
 call missing(Refund_Amt,refund_unit);
 rc=h.find_next();
end;

if not missing(Refund_Amt) then cum=cum-Refund_Amt;
drop rc cum;
run;
Fae
Obsidian | Level 7 Fae
Obsidian | Level 7

Thanks very much for your help.

SAS INNOVATE 2024

Innovate_SAS_Blue.png

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Get the $99 certification deal.jpg

 

 

Back in the Classroom!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 689 views
  • 2 likes
  • 3 in conversation