HI,
For days now I've been trying to clean up my data set.
Example:
OBS Number amount count
1 1 45 1
2 1 -45 2
3 1 45 3
4 2 30 1
5 2 25 2
6 2 -30 3
7 2 25 4
...
I was thinking of doing a do loop to start from first.number to last.number to exclude the observations that sum up to zero (obs 2, 3). Sometimes the negative amount is a sum of two previous/following numbers. I also tried deduping by squaring the amount and concatenating it with the number, doing count. However it doesn't work in all cases (obs 5,7 would be wrongly excluded too). So my required data set once cleaned would be:
OBS Number amount
1 1 45
5 2 25
7 2 25
Please help!
Thanks in advance!
Alex
There are lots of logic you need to consider about .
data have;
input Number amount ;
cards;
1 45
1 -45
1 45
2 30
2 25
2 -30
2 25
;
run;
data temp1 temp2;
set have;
if amount lt 0 then do;
amount=abs(amount);
output temp2;
end;
else output temp1;
run;
proc sort data=temp1;by Number amount;run;
proc sort data=temp2;by Number amount;run;
data want;
ina=0;inb=0;
merge temp1(in=ina) temp2(in=inb);
by Number amount;
if not ina then amount=-amount;
if not ina or not inb;
run;
Xia Keshan
Hi Xia Keshan,
Thanks for your help.
I do like your approach, however it doesn't work in all cases (I probably should have given different/extra examples). On few occasions I have a partial refund (in this case I'd like the refund to be deleted, the amount of the refund taken away from the closest positive amount within the number), refund which is a sum of previous payments and sum of two refunds is an amount. Oh and where I say refund I mean negative amount. Easier to explain on the below example. Is there a way of amending your code so it considers all below cases? (thanks again!!!)
I don't need to keep obs, it is here only for explanation purposes.
OBS Number amount count
1 1 45 1
2 1 -45 2
3 1 45 3
4 2 30 1
5 2 25 2
6 2 -30 3
7 2 25 4
8 3 50 1
9 3 60 2
10 3 -17 3
11 4 100 1
12 4 100 2
13 4 100 3
14 4 -200 4
15 5 80 1
16 5 40 2
17 5 -20 3
18 5 -20 4
And I'd like the results to be:
OBS Number amount
1 1 45
5 2 25
7 2 25
8 3 50 1
9 3 43 2 /************************************ partial refund 60-17=43************************/
11 4 100 1 /****************************** obs 12,13 cancel out obs 14***************************/
15 5 80 1 /******************************obs 16 cancel out obs 17 and 18***************************/
;
Thanks for you help!
data have;
input Number amount count ;
cards;
1 45 1
1 -45 2
1 45 3
2 30 1
2 25 2
2 -30 3
2 25 4
3 50 1
3 60 2
3 -17 3
4 100 1
4 100 2
4 100 3
4 -200 4
5 80 1
5 40 2
5 -20 3
5 -20 4
;
run;
data temp1 temp2;
set have;
if amount lt 0 then do;
amount=abs(amount);
output temp2;
end;
else output temp1;
run;
proc sort data=temp1;by Number amount;run;
proc sort data=temp2;by Number amount;run;
data x;
ina=0;inb=0;
merge temp1(in=ina) temp2(in=inb);
by Number amount;
if not ina then amount=-amount;
if not ina or not inb;
run;
proc sort data=x;by Number descending count;run;
data want;
set x;
by number;
retain found;
if first.number then do;sum=0;found=0;end;
sum+amount;
if not found and sum gt 0 then do;amount=sum;found=1; end;
if sum gt 0;
drop sum found;
run;
proc sort data=want;by number count;run;
Hi Xia Keshan,
Thanks for your help! Nearly there Although this code works for majority of cases, it still doesn't work for all of them. Especially when there are other transactions within 'number' following the refund. I was testing my data on your code (below). Also, sometimes when it does work and the 'sum' should be zero, there appears to be observation with 1.421085E-14. Is that due to formatting?
Also for number 7:
7 840.42 1
7 14.54 2
7 753.65 3
7 31.6 4
7 -768.19 5
Is there a way of it checking (within a number) for sum = 0, so I'd want
7 840.42 1
7 31.6 4
left (as 14.54+753.65-768.19=0). But if it's not going to work for every single case, then don't worry.
Thanks again!
data have;
input Number amount count ;
cards;
1 45 1
1 -45 2
1 45 3
2 30 1
2 25 2
2 -30 3
2 25 4
3 50 1
3 60 2
3 -17 3
4 100 1
4 100 2
4 100 3
4 -200 4
5 80 1
5 35.9 2
5 188.66 3
5 -224.56 4
5 20 5
6 57.57 1
6 28.38 2
6 57.57 3
6 -85.95 4
6 28.38 5
7 840.42 1
7 14.54 2
7 753.65 3
7 31.6 4
7 -768.19 5
8 116.39 1
8 82.09 2
8 116.39 3
8 -198.48 4
8 82.09 5
8 116.39 6
8 -198.48 7
;
run;
Why it is
14.54+753.65-768.19=0
NOT
31.6+753.65-768.19=0
?
Because it doesn't sum up to 0.
31.6+753.65-768.19=17.06 (not 0)
There must have been another payment before the refund
So what is your output ?
Output can't be ?
7 840.42 1
7 14.54 2
7 17.06 3
Do you want pick up the most likely value for that one ?
NO, You can't do that, unless you can enumerate all the combination of positive value .
what I mean is, where there is a negative value (refund) I want that obs to cancel out with others. In case of obs 7
7 840.42 1
7 14.54 2
7 753.65 3
7 31.6 4
7 -768.19 5
I'd want to have this left:
7 840.42 1
7 31.6 4
But that's not as important. The code you gave me previously works perfect, apart from when there are other obs withing that number. Any ideas?
Idea is enumerate all the combination of position value to see if the sum of combination could refund the negative.
But That is too complicated .
First , check the one value to see if it could refund negative value.
Second, check the all the combination of two to see if its sum could refund
Third, check the all the combination of three to see if its sum could refund
...............
Do you really want this ?
And if all these combination could not refund to zero , what are you going to do ? Negative value minus WHAT ?
If it wouldn't refund to zero then there'd be partial refund. That's why I though to do a loop within a number.
OBS Number amount count
1 1 95 1
2 1 -45 2
3 1 45 3
You want this , right ? You don't need to consider the order ?
OBS Number amount count
1 1 95 1
You want this ? You really want this ??
data have;
input Number amount count ;
cards;
1 45 1
1 -45 2
1 45 3
2 30 1
2 25 2
2 -30 3
2 25 4
3 50 1
3 60 2
3 -17 3
4 100 1
4 100 2
4 100 3
4 -200 4
5 80 1
5 35.9 2
5 188.66 3
5 -224.56 4
5 20 5
6 57.57 1
6 28.38 2
6 57.57 3
6 -85.95 4
6 28.38 5
7 840.42 1
7 14.54 2
7 753.65 3
7 31.6 4
7 -768.19 5
8 116.39 1
8 82.09 2
8 116.39 3
8 -198.48 4
8 82.09 5
8 116.39 6
8 -198.48 7
;
run;
proc sort data=have;by Number descending amount;run;
proc sql;
select max(n) into : n
from (select count(*) as n from have where amount ge 0 group by Number );
quit;
data temp;
if _n_ eq 1 then do;
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('k');
ha.definedata('Number','amount','k');
ha.definedone();
end;
set have;
by Number;
array v{&n};
array value{&n} _temporary_;
array key{&n} _temporary_;
if first.Number then do;
n=0;
ha.clear();
call missing(of value{*},of key{*});
end;
if amount ge 0 then do;
n+1;k=n;
ha.add();
value{n}=amount;
key{n}=n;
end;
else do;
_k=-1;
do i=1 to 2**dim(v);
rc=graycode(_k, of v{*});
sum=0;
do j=1 to dim(v);
sum+v{j}*value{j};
end;
dif=round(sum+amount,0.01);
if Number=8 then put sum= amount= dif= v{*} value{1} value{2} value{3} value{4};
if dif=0 then do;
do k=1 to dim(v);
if v{k}=1 then ha.remove(key:key{k});
end;
found=1;
leave;
end;
end;
if not found then output;
n=0;
do while(hi.next()=0);
n+1;
value{n}=amount;
key{n}=k;
end;
end;
if last.Number then do;
do while(hi.next()=0);
output;
end;
end;
keep Number amount;
run;
data want;
set temp;
by number;
retain found;
if first.number then do;sum=0;found=0;end;
sum+amount;
if not found and sum gt 0 then do;amount=sum;found=1; end;
if sum gt 0;
drop sum found;
run;
Xia Keshan
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Use this tutorial as a handy guide to weigh the pros and cons of these commonly used machine learning algorithms.
Find more tutorials on the SAS Users YouTube channel.