BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Fae
Obsidian | Level 7 Fae
Obsidian | Level 7

I have two set of data: Sales and Refund.  I need to merge the refund data back to the original purchase by matching customer_id, product_id and transaction date in a FIFO basis.  

 

 

Is there any existing merging function/procedure on SAS that I can use or I have to build my own code? 

 

The solution i have in mind now is add an unique id and a "matched" indicator to both data set, then loop the two data sets as described below:

 

< Loop till all refund data is matched.>

1)  Left join the refund data to sale data base on customer_id, product_id, transaction date and "matched" indicator is false.

2)  Remove any duplicates

3)  Set indicator for matched data as true.

<End Loop>

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

The following could give you a start,but you need be familiar with Hash Table.

 

 

data Sales;
input Customer_id	Product_Id $	Transaction_Date : $20.	Sales_Amt : dollar32.  Unit ;
cards;
1	A1	10-Jan-18	$15	1
1	A1	11-Jan-18	$15	2
1	A2	11-Jan-18	$30	1
2	A2	10-Jan-18	$30	1
2	A2	15-Jan-18	$70	2
3	A3	10-Jan-18	$20	1
3	A3	11-Jan-18	$20	1
;

data Refund;
input Customer_id  Product_Id $ Transaction_Date : $20. Refund_Amt : dollar32.	refund_unit ;
cards;
1	A1	12-Jan-18	$15	1
1	A1	12-Jan-18	$15	1
2	A2	16-Jan-18	$35	1
3	A3	12-Jan-18	$20	1
3	A3	12-Jan-18	$20	1
;

data want;
 if _n_=1 then do;
  if 0 then set refund;
  declare hash h(dataset:'refund',multidata:'y');
  h.definekey('Customer_id','Product_Id');
  h.definedata('Refund_Amt','refund_unit');
  h.definedone();
 end;
set sales;
by 	Customer_id;
if first.Customer_id then cum=0;
cum+Sales_Amt;
call missing(Refund_Amt,refund_unit);
rc=h.find();
do while(rc=0);
 if Refund_Amt <= cum then do;h.removedup(); leave;end;
 call missing(Refund_Amt,refund_unit);
 rc=h.find_next();
end;

if not missing(Refund_Amt) then cum=cum-Refund_Amt;
drop rc cum;
run;

View solution in original post

6 REPLIES 6
Reeza
Super User

If you have SAS/OR you may be able to leverage PROC BOM but I suspect using a customized SQL or data step is just as good. I'm not sure there's a loop required here though.

Fae
Obsidian | Level 7 Fae
Obsidian | Level 7

Thanks for you advice.  

 

I am only using SAS eg so i doubt I have access to proc BOM, 

 

How can I do these with a customized SQL without loop?  A customer can have multiple purchase/refund for the same product so any regular join will probably generate duplicate record.

Reeza
Super User

Provide some sample data with the expected output if you want sample code.

 


@Fae wrote:

Thanks for you advice.  

 

I am only using SAS eg so i doubt I have access to proc BOM, 

 

How can I do these with a customized SQL without loop?  A customer can have multiple purchase/refund for the same product so any regular join will probably generate duplicate record.


 

Fae
Obsidian | Level 7 Fae
Obsidian | Level 7

Thanks very much.

 

Sales:

Customer_idProduct_IdTransaction_DateSales_AmtUnit
1A110-Jan-18$151
1A111-Jan-18$152
1A211-Jan-18$301
2A210-Jan-18$301
2A215-Jan-18$702
3A310-Jan-18$201
3A311-Jan-18$201

 

Refund:

Customer_idProduct_IdTransaction_DateRefund_AmtUnit
1A112-Jan-18$151
1A112-Jan-18$151
2A216-Jan-18$351
3A312-Jan-18$201
3A312-Jan-18$201

 

Merged:

Customer_idProduct_IdTransaction_DateSales_AmtUnitRefund_Amtrefund_unit
1A110-Jan-18$151$151
1A111-Jan-18$152$151
1A211-Jan-18$301..
2A210-Jan-18$301..
2A215-Jan-18$702$351
3A310-Jan-18$201$201
3A311-Jan-18$202$201
Ksharp
Super User

The following could give you a start,but you need be familiar with Hash Table.

 

 

data Sales;
input Customer_id	Product_Id $	Transaction_Date : $20.	Sales_Amt : dollar32.  Unit ;
cards;
1	A1	10-Jan-18	$15	1
1	A1	11-Jan-18	$15	2
1	A2	11-Jan-18	$30	1
2	A2	10-Jan-18	$30	1
2	A2	15-Jan-18	$70	2
3	A3	10-Jan-18	$20	1
3	A3	11-Jan-18	$20	1
;

data Refund;
input Customer_id  Product_Id $ Transaction_Date : $20. Refund_Amt : dollar32.	refund_unit ;
cards;
1	A1	12-Jan-18	$15	1
1	A1	12-Jan-18	$15	1
2	A2	16-Jan-18	$35	1
3	A3	12-Jan-18	$20	1
3	A3	12-Jan-18	$20	1
;

data want;
 if _n_=1 then do;
  if 0 then set refund;
  declare hash h(dataset:'refund',multidata:'y');
  h.definekey('Customer_id','Product_Id');
  h.definedata('Refund_Amt','refund_unit');
  h.definedone();
 end;
set sales;
by 	Customer_id;
if first.Customer_id then cum=0;
cum+Sales_Amt;
call missing(Refund_Amt,refund_unit);
rc=h.find();
do while(rc=0);
 if Refund_Amt <= cum then do;h.removedup(); leave;end;
 call missing(Refund_Amt,refund_unit);
 rc=h.find_next();
end;

if not missing(Refund_Amt) then cum=cum-Refund_Amt;
drop rc cum;
run;
Fae
Obsidian | Level 7 Fae
Obsidian | Level 7

Thanks very much for your help.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 861 views
  • 2 likes
  • 3 in conversation