BookmarkSubscribeRSS Feed
Alex_mcc
Calcite | Level 5

HI,

For days now I've been trying to clean up my data set.

Example:

OBS      Number      amount         count

1           1                45                  1         

2           1                -45                 2

3           1                45                  3

4           2                 30                1

5           2                  25               2    

6           2               -30                 3

7           2                25                 4

...

I was thinking of doing a do loop to start from first.number to last.number to exclude the observations that sum up to zero (obs 2, 3). Sometimes the negative amount is a sum of two previous/following numbers. I also tried deduping by squaring the amount and concatenating it with the number, doing count. However it doesn't work in all cases (obs 5,7 would be wrongly excluded too). So my required data set once cleaned would be:

OBS      Number      amount        

1           1                45                           

5           2                  25                  

7           2                25               

Please help!

Thanks in advance!

Alex

22 REPLIES 22
Ksharp
Super User

There are lots of logic you need to consider about .

Code: Program

data have;
input Number   amount ;
cards;  
  1 45 
  1 -45 
  1 45 
  2 30 
  2 25 
  2 -30 
  2 25 
;
run;
data temp1 temp2;
set have;
if amount lt 0 then do;
amount=abs(amount);
output temp2;
end;
else output temp1;
run;
proc sort data=temp1;by Number amount;run;
proc sort data=temp2;by Number amount;run;
data want;
ina=0;inb=0;
merge temp1(in=ina) temp2(in=inb);
by Number amount;
if not ina then amount=-amount;
if not ina or not inb;
run;

Xia Keshan

Alex_mcc
Calcite | Level 5

Hi Xia Keshan,

Thanks for your help.

I do like your approach, however it doesn't work in all cases (I probably should have given different/extra examples). On few occasions I have a partial refund (in this case I'd like the refund to be deleted, the amount of the refund taken away from the closest positive amount within the number), refund which is a sum of previous payments and sum of two refunds is an amount. Oh and where I say refund I mean negative amount. Easier to explain on the below example. Is there a way of amending your code so it considers all below cases? (thanks again!!!)

I don't need to keep obs, it is here only for explanation purposes.

OBS      Number      amount         count

1           1                45                  1        

2           1                -45                 2

3           1                45                  3

4           2                 30                1

5           2                  25               2   

6           2               -30                 3

7           2                25                 4

8          3               50                    1

9          3               60                    2

10         3               -17                  3

11          4               100               1

12          4               100               2

13          4               100               3

14          4               -200               4

15          5               80               1

16          5               40               2

17          5               -20               3

18              5          -20               4

And I'd like the results to be:

OBS      Number      amount       

1           1                45                          

5           2                  25                 

7           2                25     

8          3               50                    1

9          3               43                    2      /************************************ partial refund 60-17=43************************/

11          4               100               1       /****************************** obs 12,13 cancel out obs 14***************************/

15          5               80               1        /******************************obs 16 cancel out obs 17 and 18***************************/

;

Thanks for you help!

Ksharp
Super User

Code: Program

data have;
input Number   amount count ;
cards;  
  1 45 1 
  1 -45 2
  1 45 3
  2 30 1
  2 25 2 
  2 -30 3
  2 25 4
  3 50 1
  3 60 2
  3 -17 3
  4 100 1
  4 100 2
  4 100 3
  4 -200 4
  5 80 1
  5 40 2
  5 -20 3
  5 -20 4
;
run;
data temp1 temp2;
set have;
if amount lt 0 then do;
amount=abs(amount);
output temp2;
end;
else output temp1;
run;
proc sort data=temp1;by Number amount;run;
proc sort data=temp2;by Number amount;run;
data x;
ina=0;inb=0;
merge temp1(in=ina) temp2(in=inb);
by Number amount;
if not ina then amount=-amount;
if not ina or not inb;
run;
proc sort data=x;by Number descending count;run;
data want;
set x;
by number;
retain found;
if first.number then do;sum=0;found=0;end;
sum+amount;
if not found and sum gt 0 then do;amount=sum;found=1; end;
if sum gt 0;
drop sum found;
run;
proc sort data=want;by number count;run;
Alex_mcc
Calcite | Level 5

Hi Xia Keshan,

Thanks for your help! Nearly there Smiley Happy Although this code works for majority of cases, it still doesn't work for all of them. Especially when there are other transactions within 'number' following the refund. I was testing my data on your code (below). Also, sometimes when it does work and the 'sum' should be zero, there appears to be observation with 1.421085E-14. Is that due to formatting?

Also for number 7:

  7 840.42 1

  7 14.54 2

  7 753.65 3

  7 31.6 4

  7 -768.19 5

Is there a way of it checking (within a number) for sum = 0, so I'd want

7 840.42 1

7 31.6 4

left (as 14.54+753.65-768.19=0). But if it's not going to work for every single case, then don't worry.

Thanks again!

data have;

input Number   amount count ;

cards;  

  1 45 1 

  1 -45 2

  1 45 3

  2 30 1

  2 25 2 

  2 -30 3

  2 25 4

  3 50 1

  3 60 2

  3 -17 3

  4 100 1

  4 100 2

  4 100 3

  4 -200 4

  5 80 1

  5 35.9 2

  5 188.66 3

  5 -224.56 4

  5 20 5

  6 57.57 1

  6 28.38 2

  6 57.57 3

  6 -85.95 4

  6 28.38 5

  7 840.42 1

  7 14.54 2

  7 753.65 3

  7 31.6 4

  7 -768.19 5

  8 116.39 1

  8 82.09 2

  8 116.39 3

  8 -198.48 4

  8 82.09 5

  8 116.39 6

  8 -198.48 7

;

run;

Ksharp
Super User

Why it is

14.54+753.65-768.19=0

NOT

31.6+753.65-768.19=0

?

Alex_mcc
Calcite | Level 5

Because it doesn't sum up to 0.


31.6+753.65-768.19=17.06 (not 0)

There must have been another payment before the refund

Ksharp
Super User

So what is your output ?

Output can't be ?

7 840.42 1

  7 14.54 2

  7 17.06 3

 

Do you want pick up the most likely value for that one ?

Ksharp
Super User

NO, You can't do that, unless you can enumerate all the combination of positive value .

Alex_mcc
Calcite | Level 5

what I mean is, where there is a negative value (refund) I want that obs to cancel out with others. In case of obs 7

  7 840.42 1

  7 14.54 2

  7 753.65 3

  7 31.6 4

  7 -768.19 5

I'd want to have this left:

7 840.42 1

7 31.6 4

But that's not as important. The code you gave me previously works perfect, apart from when there are other obs withing that number. Any ideas?

Ksharp
Super User

Idea is enumerate all the combination of position value to see if the sum of combination could refund the negative.

But That is too complicated .

First , check the one value to see if it could refund negative value.

Second, check the all the combination of two to see if its sum could refund

Third, check the all the combination of three to see if its sum could refund

...............

Do you really want this ?

Ksharp
Super User

And if all these combination could not refund to zero , what are you going to do ? Negative value minus WHAT ?

Alex_mcc
Calcite | Level 5

If it wouldn't refund to zero then there'd be partial refund. That's why I though to do a loop within a number.

Ksharp
Super User

OBS      Number      amount         count

1           1                95                  1        

2           1                -45                 2

3           1                45                  3

You want this , right ? You don't need to consider the order ?

OBS      Number      amount         count

1           1                95                  1  

Ksharp
Super User

You want this ? You really want this ??

Code: Program.sas

data have;
input Number   amount count ;
cards;  
  1 45 1
  1 -45 2
  1 45 3
  2 30 1
  2 25 2
  2 -30 3
  2 25 4
  3 50 1
  3 60 2
  3 -17 3
  4 100 1
  4 100 2
  4 100 3
  4 -200 4
  5 80 1
  5 35.9 2
  5 188.66 3
  5 -224.56 4
  5 20 5
  6 57.57 1
  6 28.38 2
  6 57.57 3
  6 -85.95 4
  6 28.38 5
  7 840.42 1
  7 14.54 2
  7 753.65 3
  7 31.6 4
  7 -768.19 5
  8 116.39 1
  8 82.09 2
  8 116.39 3
  8 -198.48 4
  8 82.09 5
  8 116.39 6
  8 -198.48 7
;
run;
proc sort data=have;by Number descending amount;run;
proc sql;
select max(n) into : n
from (select count(*) as n from have where amount ge 0 group by Number );
quit;

data temp;
if _n_ eq 1 then do;
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('k');
ha.definedata('Number','amount','k');
ha.definedone();
end;
set have;
by Number;
array v{&n};
array value{&n} _temporary_;
array key{&n} _temporary_;

if first.Number then do;
  n=0;
  ha.clear();
  call missing(of value{*},of key{*});
end;

if amount ge 0 then do;
  n+1;k=n;
  ha.add();
  value{n}=amount;
  key{n}=n;
end;
else do;
   _k=-1;
   do i=1 to 2**dim(v);
   rc=graycode(_k, of v{*});
   sum=0;
   do j=1 to dim(v); 
   sum+v{j}*value{j};
   end;
   dif=round(sum+amount,0.01);
   if Number=8 then put sum= amount= dif= v{*} value{1} value{2} value{3} value{4};
   if dif=0 then do;  
   do k=1 to dim(v);
   if v{k}=1 then ha.remove(key:key{k});
   end;
   found=1;
   leave;
   end;  
   end;
   if not found then output;
   n=0;
   do while(hi.next()=0);
   n+1;
   value{n}=amount;
   key{n}=k;
   end;
end;

if last.Number then do;
do while(hi.next()=0);
  output;
end;
end;
keep Number amount;
run;

data want;
set temp;
by number;
retain found;
if first.number then do;sum=0;found=0;end;
sum+amount;
if not found and sum gt 0 then do;amount=sum;found=1; end;
if sum gt 0;
drop sum found;
run;

Xia Keshan

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to choose a machine learning algorithm

Use this tutorial as a handy guide to weigh the pros and cons of these commonly used machine learning algorithms.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 22 replies
  • 1664 views
  • 2 likes
  • 5 in conversation