delete corresponding rows based on conditions

Accepted Solution Solved
Reply
Contributor
Posts: 32
Accepted Solution

delete corresponding rows based on conditions

if sales credit=2, then it means returns.

 

The corresponding purchase of any return  can only be identified by same  customer id and same amount.

 

there could be a case of same customer id and same amount more than two times but only one of them is returned.

 

and also there will be some returns whose corresponding purchase could not be found.

 

I want to delete all the returns and corresponding purchases that we could find in SAS.

 

pls go through the sample data which has all the cases I explained above.

 

 

 

 

 

Customer idTransaction_DateTransaction_NoAmountSales_Credit
111113/4/2015 9:0082026805798191
111113/3/2015 13:3883253158975251
111113/27/2015 10:5385258224657251
111115/31/2015 13:3489514285454301
111115/15/2015 13:3390455526236301
111115/15/2015 13:3384043145133302
111112/18/2015 17:1183988568064341
111114/3/2015 13:3182641680012431
111114/3/2015 13:3185600050163432
111113/6/2015 10:1386308331399521
111114/14/2015 13:0790049122320561
111122/9/2015 10:0286378498192171
111122/25/2015 12:4788814535375201
111122/3/2015 13:4987670932129211
111125/25/2015 17:1990702769745251
111125/25/2015 17:1988903692494252
111127/1/2015 13:2488759266978261
111127/1/2015 13:2483500066497262
111126/27/2015 13:1282893592554411
111126/27/2015 13:1288499717353412
111123/30/2015 10:4790952745232432
111121/30/2015 10:2484597833130511
111125/6/2015 12:2690835511287541
111125/6/2015 12:2682806550212542
111125/17/2015 9:1786962330269561
111126/5/2015 17:3285343530406601
111134/20/2015 17:1681882769839121
111135/19/2015 11:0186033694521131
111135/19/2015 11:0187255703509132
111134/22/2015 10:1789466597573141
111134/22/2015 10:1781895233664142
111136/23/2015 11:1790540834184181
111136/23/2015 11:1785855224536182
111136/22/2015 15:5689027902590221
111134/26/2015 16:0183235628559341
111132/19/2015 11:4083464237568411
111132/19/2015 11:4090022634753412
111131/8/2015 17:5582173824456481
111132/6/2015 16:1285957557997551
111135/7/2015 15:5986307619247571
111135/7/2015 15:5982407926355572

Accepted Solutions
Solution
‎01-08-2016 09:26 AM
Respected Advisor
Posts: 4,992

Re: delete corresponding rows based on conditions

Here's one approach:

 

proc sort data=have;

by customer_id amount;

run;

 

data want;

n_purchases=0;

n_amounts=0;

do until (last.amount);

   set have;

   by customer_id amount;

   if sales_credit=1 then n_purchases + 1;

   else n_returns + 1;

end;

matches = min(n_purchases, n_returns);

n_purchases=0;

n_amounts=0;

do until (last.amount);

   set have;

   by customer_id amount;

   if sales_credit=1 then do;

      n_purchases + 1;

      if n_purchases > matches then output;

   end;

   else do;

      n_returns + 1;

      if n_returns > matches then output;

   end;

end;

drop matches n_purchases n_returns;

run;

 

The top loop examines all records for the same customer/amount, and counts how many pairs to expect.  The bottom loop uses that information to examine the same records, and output the proper observations.

 

The code is untested, but should be OK.  Note that one issue is not mentioned in your description, nor is it addressed by the program.  What should happen if a return for $20 is dated before an amount for $20?  Can those be paired?  Requiring the return to fall after the purchase woud add to the complexity of the program.

 

Good luck.

 

View solution in original post


All Replies
Super User
Posts: 1,116

Re: delete corresponding rows based on conditions

Could you please try the below code , want2 dataset is your expected output

 

proc sort data=have;
by Customer_id 	Amount	descending Sales_Credit;
run;

data want1 want2;
set have;
by Customer_id 	Amount	descending Sales_Credit;
retain return;
if first.amount then return=Sales_Credit;
if first.amount and last.amount then single=1;
if return=2 and single eq . then output want1;
else output want2;
run;
Thanks,
Jag
Contributor
Posts: 32

Re: delete corresponding rows based on conditions

Its working well but, It is deleting all purchases with same customer id and same amount. rather it should select only row  of purchase for one row of return.

for example :  in the below sample data, it should delete only row of custid=11111 and amount =30 and sales_credit =1 as there is only one row of return is there. 

 

but in the solution presented both rows of 11111,30,1 are deleted(seperated in to other dataset).

Customer idTransaction_DateTransaction_NoAmountSales_Credit
111113/4/2015 9:0082026805798191
111113/3/2015 13:3883253158975251
111113/27/2015 10:5385258224657251
111115/31/2015 13:3489514285454301
111115/15/2015 13:3390455526236301
111115/15/2015 13:3384043145133302
Super User
Posts: 1,116

Re: delete corresponding rows based on conditions

I updated the code, please try

 

data want1 want2;
set have;
by Customer_id 	Amount	descending Sales_Credit;
retain return;
if first.amount then return=Sales_Credit;
if first.amount and last.amount then single=1;
if return=2 and Sales_Credit=1 and single eq . then output want1;
else output want2;
run;
Thanks,
Jag
Contributor
Posts: 32

Re: delete corresponding rows based on conditions

The added Sales_credit=1, in the code is not showing any improvement and also avoiding all sales_credit=2 in entering into want1.  

 

we need to avoid the repeated same(customer_id, amount, sales credit =1) in entering want1. 

 

and also I have noticed peculiar situations like below. In this type of situations, we as we have two returns, we need to remove only 2 purchases.

 

Customer idTransaction_DateTransaction_NoAmountSales_Credit
111115/31/2015 13:3489514285454302
111115/15/2015 13:3390455526236302
111115/15/2015 13:3384043145133301
111112/18/2015 17:1183988568064301
111114/3/2015 13:3182641680012301

 

 

I appreciate your efforts. I am trying with proc sql "set" operations, think in that terms also if you can.  

 

we just need the answer, method is not important

 

Super User
Posts: 1,116

Re: delete corresponding rows based on conditions

May be if you could mention the expected output, i could try one more time
Thanks,
Jag
Solution
‎01-08-2016 09:26 AM
Respected Advisor
Posts: 4,992

Re: delete corresponding rows based on conditions

Here's one approach:

 

proc sort data=have;

by customer_id amount;

run;

 

data want;

n_purchases=0;

n_amounts=0;

do until (last.amount);

   set have;

   by customer_id amount;

   if sales_credit=1 then n_purchases + 1;

   else n_returns + 1;

end;

matches = min(n_purchases, n_returns);

n_purchases=0;

n_amounts=0;

do until (last.amount);

   set have;

   by customer_id amount;

   if sales_credit=1 then do;

      n_purchases + 1;

      if n_purchases > matches then output;

   end;

   else do;

      n_returns + 1;

      if n_returns > matches then output;

   end;

end;

drop matches n_purchases n_returns;

run;

 

The top loop examines all records for the same customer/amount, and counts how many pairs to expect.  The bottom loop uses that information to examine the same records, and output the proper observations.

 

The code is untested, but should be OK.  Note that one issue is not mentioned in your description, nor is it addressed by the program.  What should happen if a return for $20 is dated before an amount for $20?  Can those be paired?  Requiring the return to fall after the purchase woud add to the complexity of the program.

 

Good luck.

 

Contributor
Posts: 32

Re: delete corresponding rows based on conditions

Yes, Astounding.

 

You are right. We need check with the date also. 

 

Your solution is working fine till here. Could you add the date part also it. 

 

I appreciate your solution, It simple and strong.

 

Could you check into the date part also?

 

Thanks 

 

Grand Advisor
Posts: 9,584

Re: delete corresponding rows based on conditions

You didn't post the output you need yet ? 

Assuming I understand what you need.

 

data have;
infile cards expandtabs truncover;
input (Customerid	Transaction_Date	Transaction_No) (: $40.)	Amount 	Sales_Credit ;
cards;
11111	3/4/20159:00	82026805798	19	1
11111	3/3/201513:38	83253158975	25	1
11111	3/27/201510:53	85258224657	25	1
11111	5/31/201513:34	89514285454	30	1
11111	5/15/201513:33	90455526236	30	1
11111	5/15/201513:33	84043145133	30	2
11111	2/18/201517:11	83988568064	34	1
11111	4/3/201513:31	82641680012	43	1
11111	4/3/201513:31	85600050163	43	2
11111	3/6/201510:13	86308331399	52	1
11111	4/14/201513:07	90049122320	56	1
11112	2/9/201510:02	86378498192	17	1
11112	2/25/201512:47	88814535375	20	1
11112	2/3/201513:49	87670932129	21	1
11112	5/25/201517:19	90702769745	25	1
11112	5/25/201517:19	88903692494	25	2
11112	7/1/201513:24	88759266978	26	1
11112	7/1/201513:24	83500066497	26	2
11112	6/27/201513:12	82893592554	41	1
11112	6/27/201513:12	88499717353	41	2
11112	3/30/201510:47	90952745232	43	2
11112	1/30/201510:24	84597833130	51	1
11112	5/6/201512:26	90835511287	54	1
11112	5/6/201512:26	82806550212	54	2
11112	5/17/20159:17	86962330269	56	1
11112	6/5/201517:32	85343530406	60	1
11113	4/20/201517:16	81882769839	12	1
11113	5/19/201511:01	86033694521	13	1
11113	5/19/201511:01	87255703509	13	2
11113	4/22/201510:17	89466597573	14	1
11113	4/22/201510:17	81895233664	14	2
11113	6/23/201511:17	90540834184	18	1
11113	6/23/201511:17	85855224536	18	2
11113	6/22/201515:56	89027902590	22	1
11113	4/26/201516:01	83235628559	34	1
11113	2/19/201511:40	83464237568	41	1
11113	2/19/201511:40	90022634753	41	2
11113	1/8/201517:55	82173824456	48	1
11113	2/6/201516:12	85957557997	55	1
11113	5/7/201515:59	86307619247	57	1
11113	5/7/201515:59	82407926355	57	2
;
run;
data temp;
if _n_ eq 1 then do;
 if 0 then set have;
 declare hash h(dataset:'have(where=(Sales_Credit=2))');
 h.definekey('Customerid','Amount');
 h.definedone();
end;
 set have;
 if h.check()=0 and Sales_Credit ne 2 then delete;
run;
proc sort data=temp out=want nodupkey;
 by Customerid	Amount 	Sales_Credit ;
run;
Contributor
Posts: 32

Re: delete corresponding rows based on conditions

Thanks, Ksharp,

 

There is a small problem with the code. It is removing all the rows with matching customer_id and amount if there is a return. instead it should remove only once.

 

 

Respected Advisor
Posts: 3,124

Re: delete corresponding rows based on conditions

[ Edited ]

And this one is trying to pair sales-returns first, then delete them.

data have;
	infile cards expandtabs truncover;
	input (Customerid	Transaction_Date	Transaction_No) (: $40.)	Amount 	Sales_Credit;
	cards;
11111	3/4/20159:00	82026805798	19	1
11111	3/3/201513:38	83253158975	25	1
11111	3/27/201510:53	85258224657	25	1
11111	5/31/201513:34	89514285454	30	1
11111	5/15/201513:33	90455526236	30	1
11111	5/15/201513:33	84043145133	30	2
11111	2/18/201517:11	83988568064	34	1
11111	4/3/201513:31	82641680012	43	1
11111	4/3/201513:31	85600050163	43	2
11111	4/3/201513:31	85600050164	43	2
11111	3/6/201510:13	86308331399	52	1
11111	4/14/201513:07	90049122320	56	1
11112	2/9/201510:02	86378498192	17	1
11112	2/25/201512:47	88814535375	20	1
11112	2/3/201513:49	87670932129	21	1
11112	5/25/201517:19	90702769745	25	1
11112	5/25/201517:19	88903692494	25	2
11112	7/1/201513:24	88759266978	26	1
11112	7/1/201513:24	83500066497	26	2
11112	6/27/201513:12	82893592554	41	1
11112	6/27/201513:12	88499717353	41	2
11112	3/30/201510:47	90952745232	43	2
11112	1/30/201510:24	84597833130	51	1
11112	5/6/201512:26	90835511287	54	1
11112	5/6/201512:26	82806550212	54	2
11112	5/17/20159:17	86962330269	56	1
11112	6/5/201517:32	85343530406	60	1
11113	4/20/201517:16	81882769839	12	1
11113	5/19/201511:01	86033694521	13	1
11113	5/19/201511:01	87255703509	13	2
11113	4/22/201510:17	89466597573	14	1
11113	4/22/201510:17	81895233664	14	2
11113	6/23/201511:17	90540834184	18	1
11113	6/23/201511:17	85855224536	18	2
11113	6/22/201515:56	89027902590	22	1
11113	4/26/201516:01	83235628559	34	1
11113	2/19/201511:40	83464237568	41	1
11113	2/19/201511:40	90022634753	41	2
11113	1/8/201517:55	82173824456	48	1
11113	2/6/201516:12	85957557997	55	1
11113	5/7/201515:59	86307619247	57	1
11113	5/7/201515:59	82407926355	57	2
;
run;



data sale return;
	set have;

	if Sales_Credit=1 then
		output sale;
	else output return;
run;

proc sort data=sale;
	by Customerid Amount;
run;

proc sort data=return;
	by Customerid Amount;
run;

data want;
	if 0 then
		set sale return;
	call missing (of _all_);
	merge sale return(rename=sales_credit=_return);
	by Customerid Amount;

	if sales_credit=1 and _return=2 then
		delete;
	sales_credit=coalesce(sales_credit,_return);
	output;
	drop _return;
run;
Contributor
Posts: 32

Re: delete corresponding rows based on conditions

 

Thanks, Haikuo,

 

There is a small problem with the code. It is removing all the rows with matching customer_id and amount if there is a return. Instead, it should remove only once or twice as the number of returns.

Grand Advisor
Posts: 10,223

Re: delete corresponding rows based on conditions

Time to post what you expect the output to look like.

Contributor
Posts: 32

Re: delete corresponding rows based on conditions

Hi Ballardw,

 

let me explain the problem statement once again.

 

you have a set of transactions.

 

sales_credit whether it is a purchase or return. (sales_credit=1 ==> purchase, sales_credit=2 ==> return).

 

The below table defines the problem statement. please look in the above posts for part of the dataset.

 

1Transactions table has no order number but only transaction number so we can only identify if there is a return by matching customer id and amount. 
  you may make an assumption that if there are 2 sales one one return by customer with same amount then consider the earliest bought item as returned  as there is no common code to identify same item
  sales and retruns data is available from Jan to July 
  There is a possibility of returns for some items with out sales , that means sales happened in before January, for which we do not have data in this table
   
2Task

We need only those transaction which bought and not returned

 

 

Respected Advisor
Posts: 3,124

Re: delete corresponding rows based on conditions

[ Edited ]

You are right, the old code overlooked the variable change in 'return', I have advanced the variable 'rename' in the upstream Proc Sort to save some typing. Now it should be good.

 

data have;
	infile cards expandtabs truncover;
	input (Customerid	Transaction_Date	Transaction_No) (: $40.)	Amount 	Sales_Credit;
	cards;
11111	3/4/20159:00	82026805798	19	1
11111	3/3/201513:38	83253158975	25	1
11111	3/27/201510:53	85258224657	25	1
11111	5/31/201513:34	89514285454	30	1
11111	5/15/201513:33	90455526236	30	1
11111	5/15/201513:33	84043145133	30	2
11111	2/18/201517:11	83988568064	34	1
11111	4/3/201513:31	82641680012	43	1
11111	4/3/201513:31	85600050163	43	2
11111	4/3/201513:31	85600050164	43	2
11111	3/6/201510:13	86308331399	52	1
11111	4/14/201513:07	90049122320	56	1
11112	2/9/201510:02	86378498192	17	1
11112	2/25/201512:47	88814535375	20	1
11112	2/3/201513:49	87670932129	21	1
11112	5/25/201517:19	90702769745	25	1
11112	5/25/201517:19	88903692494	25	2
11112	7/1/201513:24	88759266978	26	1
11112	7/1/201513:24	83500066497	26	2
11112	6/27/201513:12	82893592554	41	1
11112	6/27/201513:12	88499717353	41	2
11112	3/30/201510:47	90952745232	43	2
11112	1/30/201510:24	84597833130	51	1
11112	5/6/201512:26	90835511287	54	1
11112	5/6/201512:26	82806550212	54	2
11112	5/17/20159:17	86962330269	56	1
11112	6/5/201517:32	85343530406	60	1
11113	4/20/201517:16	81882769839	12	1
11113	5/19/201511:01	86033694521	13	1
11113	5/19/201511:01	87255703509	13	2
11113	4/22/201510:17	89466597573	14	1
11113	4/22/201510:17	81895233664	14	2
11113	6/23/201511:17	90540834184	18	1
11113	6/23/201511:17	85855224536	18	2
11113	6/22/201515:56	89027902590	22	1
11113	4/26/201516:01	83235628559	34	1
11113	2/19/201511:40	83464237568	41	1
11113	2/19/201511:40	90022634753	41	2
11113	1/8/201517:55	82173824456	48	1
11113	2/6/201516:12	85957557997	55	1
11113	5/7/201515:59	86307619247	57	1
11113	5/7/201515:59	82407926355	57	2
;
run;



data sale return;
	set have;

	if Sales_Credit=1 then
		output sale;
	else output return;
run;

proc sort data=sale;
	by Customerid Amount;
run;

proc sort data=return(rename=sales_credit=_return);
	by Customerid Amount;
run;

data want;
if 0 then set sale return;
	call missing (of _all_);
	merge sale return;
	by Customerid Amount;
	if sales_credit=1 and _return=2 then
		delete;
	sales_credit=coalesce(sales_credit,_return);
	output;
	drop _return;
	
run;
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 19 replies
  • 738 views
  • 3 likes
  • 6 in conversation