Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Programming
- /
- Programming
- /
- Re: Net Groups of Lines in a Table to $0.00

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

🔒 This topic is **solved** and **locked**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 06-12-2018 12:40 AM
(1609 views)

Hello - thanks in advance for any help. I have a data table with 300+ records. I'm trying to identify combinations of lines that net to $0.00 automatically. Is there a way to do this within SAS? Thank you again for any input you may have. A simplified example is below...

1/ -5

2/ 2

3/ 3

4/ -10

5/ 4

6/ 6

In the above example, rows 1-3 and 4-6 would get grouped together since they net to $0.00 each. (-5 + 2 + 3) and (-10 + 4 + 6). Is there a way to identify these combinations automatically within SAS?

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I'm not sure yet whether it is optimal, but the following greedy heuristic (which tries to partition the observations into a maximum number of groups) yields 86 groups:

```
proc optmodel;
set OBS_ALL;
num x {OBS_ALL};
read data have into OBS_ALL=[_N_] x=BASE_AMT;
set OBS init OBS_ALL;
/* Select[i] = 1 if observation i is selected to be in the current group; 0 otherwise */
var Select {OBS} binary;
/* selected observations sum to 0 */
con SumToZero:
sum {i in OBS} x[i] * Select[i] = 0;
/* select at least one observation */
con Nonempty:
sum {i in OBS} Select[i] >= 1;
/* minimize number selected */
min NumSelected = sum {i in OBS} Select[i];
num group {OBS_ALL};
set SELECTED = {i in OBS: Select[i].sol > 0.5};
num g init 0;
/* greedy heuristic */
do until (_solution_status_ = 'INFEASIBLE' or card(OBS) = 0);
/* call MILP solver */
solve;
put SELECTED=;
g = g + 1;
for {i in SELECTED} group[i] = g;
OBS = OBS diff SELECTED;
end;
create data want(drop=i) from [i] x group;
quit;
```

The resulting group size frequencies are:

group_size | group_size_freq |
---|---|

1 | 6 |

2 | 50 |

3 | 11 |

4 | 12 |

5 | 5 |

8 | 1 |

115 | 1 |

That is, 6 singletons (the ones with BASE_AMT = 0), 50 pairs, 11 triples, ..., and one big group of 115.

17 REPLIES 17

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

1. Please change the title of your question to something meaningful such as "Combination of numbers that sum to a target value" to increase the chances of getting an answer.

2. I am not aware of any such feature in SAS, though SAS/IML can probably find such combinations.

3. I found these solutions using Excel and C++ in case it helps.

http://www.k2e.com/tech-update/tips/147-using-excel-to-identify-entries-that-add-to-a-specific-value

https://leetcode.com/problems/combination-sum/description/

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Why doesn't all 6 get grouped together, all in they all sum up to 0? Need to define your logic a bit better. Anyways, a retained value should be enough - not not tested, post test data in the form of a datastep and what the output should look like!

data want; set have; retain grp sm; if _n_=1 then do; grp=1; sm=val; end; else do; sm=sum(sm,val); if sm=0 then grp=grp+1; end; run;

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Interesting...

I assume that you only want a series of subsequent lines that add to 0?

But how to group? there can still be many possibilities. I did an example program like this:

```
data have;
do i=1 to 1000;
x=ceil(rand('UNIFORM')*21)-11;
output;
end;
run;
data groups;
set have nobs=nobs;
sum=0;
do _P_=_N_ to nobs;
set have(keep=x rename=(x=x2)) point=_P_;
sum+x2;
if sum=0 then do;
shortest_group=_P_-_N_+1;
leave;
end;
end;
keep i x shortest_group;
run;
```

So that for each line, you get the length of the shortest group that can start here. It is easy to see that a lot of them overlap; which ones would you choose for creating your group variable? Do you want the largest number of (non-overlapping) groups possible, or do you want to have as many lines as possible inside groups, or something else?

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I fully agree with **RW9** and **s_lassen** that the specifications should be more precise. Restricting the groups to *consecutive* records would indeed avoid substantial complexity which otherwise might require SAS/OR capabilities.

The possibility of overlapping groups has already been mentioned (like if in your own example 4 was replaced by 7).

If this is real-world data, aren't there other variables indicating likely or impossible groupings?

What if a group of values happen to sum to zero by chance?

In any case, make sure you avoid numeric representation issues if you are dealing with non-integer values.

Example:

```
data caveat;
input amt;
s+amt;
if s=0 then put 'This might NOT occur!';
if round(s,.001)=0 then put 'Rounding helps!';
cards;
4.90
1.90
-6.80
;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hello All - First off, __ thank you__ for the responses. I've updated to subject line to better reflect my question. I was working on two items late last night and apparently grouped both thoughts into this thread by mistake. A little more information on my question...

-Yes, the file nets down to $0.00 naturally as a whole. That said, we are trying to get it down into smaller subsets.

-Unfortunately, there are no additional character fields to utilize in order to group down the data smaller. We are left only with the amount field.

-Yes, it is plausible that some lines may group together to form a bigger cluster (like my example, 6 lines, versus splitting into 2 groups of 3). This is somewhat unlikely based on how the amounts are created though.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I am afraid you have to resort to SAS/OR .

Post it at OR forum and calling @RobPratt

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Do the values have to be grouped in sequence?

How do you expect the result to look for your given example?

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

No, the data does not (and often will not be) grouped in sequence.

Honestly, the results will vary. I've manually gone through a few iterations (I have several files to do this on with 300+ records). I had one table that nearly all rows paired off and an equal net value (ex: +100 / -100) and another where I was able to find 5/300 rows that netted and the remainder had to be made into one big group.

I am working on Enterprise Guide, should have mentioned that earlier.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Can you please share a representative data set, or maybe the most difficult one?

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

All -as requested, here's a dataset. It's just a single column of amounts.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@adornodj wrote:

All -as requested, here's a dataset. It's just a single column of amounts.

Your data just threw another curve: What do you want with 0 values? Since 0 doesn't actually affect sums very much do they require special treatment? Such as all zero values in one "group" or a maximum of one zero per "group" or any other rules?

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I'm not sure yet whether it is optimal, but the following greedy heuristic (which tries to partition the observations into a maximum number of groups) yields 86 groups:

```
proc optmodel;
set OBS_ALL;
num x {OBS_ALL};
read data have into OBS_ALL=[_N_] x=BASE_AMT;
set OBS init OBS_ALL;
/* Select[i] = 1 if observation i is selected to be in the current group; 0 otherwise */
var Select {OBS} binary;
/* selected observations sum to 0 */
con SumToZero:
sum {i in OBS} x[i] * Select[i] = 0;
/* select at least one observation */
con Nonempty:
sum {i in OBS} Select[i] >= 1;
/* minimize number selected */
min NumSelected = sum {i in OBS} Select[i];
num group {OBS_ALL};
set SELECTED = {i in OBS: Select[i].sol > 0.5};
num g init 0;
/* greedy heuristic */
do until (_solution_status_ = 'INFEASIBLE' or card(OBS) = 0);
/* call MILP solver */
solve;
put SELECTED=;
g = g + 1;
for {i in SELECTED} group[i] = g;
OBS = OBS diff SELECTED;
end;
create data want(drop=i) from [i] x group;
quit;
```

The resulting group size frequencies are:

group_size | group_size_freq |
---|---|

1 | 6 |

2 | 50 |

3 | 11 |

4 | 12 |

5 | 5 |

8 | 1 |

115 | 1 |

That is, 6 singletons (the ones with BASE_AMT = 0), 50 pairs, 11 triples, ..., and one big group of 115.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Your problem is to partition the observations into zero-sum groups. If the objective is to maximize the number of groups, it is clear that you want to treat each singleton (BASE_AMT = 0) as its own group, because any group that contains more than one observation and includes a 0 can be split into two groups, each with zero sum. Less obvious is the fact that you can treat each pair x and -x as their own group, without loss of optimality. There are two cases to consider, given any partition of the observations into zero-sum groups:

1. If a group contains more than two observations and includes a pair x and -x, you can split off the pair into its own group {x,-x}, preserving zero sums, and thereby increase the number of groups by one.

2. If x and -x appear in two different groups, say G1 and G2, you can reorganize as two groups {x,-x} and (G1 diff {x}) union (G2 diff {-x}), each with zero sum, and thereby preserve the number of groups. Explicitly, G1 diff {x} sums to -x (because G1 sums to 0), and G2 diff {-x} sums to x, so their union sums to -x + x = 0.

This means that the first two rounds of the greedy algorithm are optimal in the sense that removing singletons and pairs does not ruin your chances of maximizing the number of groups.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Here is an example of a brute force not terribly elegant using only data step code approach and has the potential of not actually working depending on your actual data. This is an example of find the pairs that total 0, then find the groups of 3 that total 0. The pattern could be continued. This captures both the index (original row number) as well as the values on those rows with one record per group. I did two levels of this.

data have; input BASE_AMT; datalines; -758086.38 -715706.92 -308866.51 -65557.44 -39680 -22150.98 -20967.45 -18659 -13091.9 -8400 -7800 -5902.99 -4900 -3800 -3567.64 -3472 -3201.42 -3108.97 -2462 -2400 -1975 -1975 -1964 -1425.95 -1256.03 -1200 -7492.68 7492.68 -7301.35 7301.35 -1200 -1200 -3346.65 3346.65 -1026.9 -912.5 -912.5 -2862.83 2862.83 -849.57 -2499 2499 -822.87 -781.76 -2357.55 2357.55 -781.76 -2272.4 2272.4 -2060.21 2060.21 -750 -750 -750 -750 -1893.63 1893.63 -1739.47 1739.47 -690.63 -1509.29 1509.29 -1492.8 1492.8 -673.44 -673.44 -1399 1399 -673.44 -673.44 -1226.26 1226.26 -663.34 -663.33 -663.33 -658.53 -971.51 971.51 -945.45 945.45 -600.4 -600 -884.25 884.25 -593.74 -576.89 -566.91 -799.59 799.59 -554.47 -544.28 -526.75 -519.78 -515.62 -510 -510 -500 -692.2 692.2 -500 -469.08 -468.45 -394 -372.26 -369.08 -361.42 -361.42 -347.89 -347.89 -325.4 -318.6 -300 -581.25 581.25 -300 -296.86 -560 560 -288.43 -279.55 -540.45 540.45 -261.84 -524.6 524.6 -255.09 -251.25 -250 -247.08 -226.67 -217 -189.1 -497.2 497.2 -181.82 -485.35 485.35 -181.53 -180.08 -455.1 455.1 -154 -446.77 446.77 -418.2 418.2 -147.16 -384.7 384.7 -146 -137.84 -132.43 -130.62 -129.96 -129 -116.1 -313.01 313.01 -302.95 302.95 -113.75 -104.34 -101.32 -99 -94.73 -280.75 280.75 -90 -87.84 -83.28 -81 -79.77 -79.24 -78.84 -77.71 -73.26 -219.96 219.96 -68.11 -68.09 -216.86 216.86 -64.28 -64.12 -185.6 185.6 -62.91 -62.64 -62.37 174.14 -174.14 -173.85 173.85 -62.35 -59.76 -149.85 149.85 -59.04 -59.01 -56.43 -55.68 -54.5 -54.04 -52.8 -126 126 -120.55 120.55 -52.57 -52.52 -50 -49.36 -47.25 -47.22 -44.56 -43.21 -42 -40.46 -36.3 -88.3 88.3 -35.4 -34.61 -33.42 -31.66 -30.96 -30.67 -29.12 -73.85 73.85 -28.9 -68.72 68.72 -28.24 -28.09 -27.56 -26.23 -25.65 -25.1 -24.44 -24.22 -59.55 59.55 -23.42 -20.74 -57.9 57.9 -19.41 -56.24 56.24 -18.65 -18.41 -17.12 -16.65 -15.72 -14.75 -13.54 -13.38 -12.16 -11.8 -11.7 -11.66 -11.53 -42.28 42.28 -11.45 -11.22 -10.62 -10 -8.9 -34.35 34.35 -8.77 -8.64 -8.44 -7.89 -6.24 -6 -5.64 -5.35 -4.76 -4.52 -4.48 -3.51 -3.41 -2.99 -2.85 -2.65 -2.53 -18.9 18.9 -2.46 -2.37 -1.96 -1.76 -1.63 46.23 61.35 88.78 89.77 92 107.49 167.52 212.34 218.12 229.44 267.52 296.38 451.3 485.73 510 629.13 790.82 799.37 835.03 1329.06 1443.56 1581.69 1928.62 2274.62 2726.63 3000.4 4093.26 5426.75 9014.1 25414.75 43152 44807.36 1906992.59 0 0 0 0 0 0 ; run; proc transpose data=have out=work.trans prefix=ba ; var base_amt; run; data work.pairs ( keep=iout jout ival jval) work.leftover (keep=ba:) ; set work.trans end=last; array b ba:; do i= 1 to (dim(b)-1); do j= 2 to dim(b); if sum(b[i],b[j])= 0 then do; if not missing(b[i]) then iout=i; if not missing(b[j]) then jout=j; ival = b[i]; jval = b[j]; output work.pairs; call missing(b[i],b[j],iout,jout); leave; end; end; end; if last then output work.leftover; run; data work.trios ( keep=iout jout kout ival jval kval) work.leftover2 (keep=ba:) ; set work.leftover end=last; array b ba:; do i= 1 to (dim(b)-2); do j= 2 to (dim(b)-1); do k= 3 to dim(b); if sum(b[i],b[j],b[k])= 0 then do; if not missing(b[i]) then iout=i; if not missing(b[j]) then jout=j; if not missing(b[k]) then kout=k; ival = b[i]; jval = b[j]; kval = b[k]; output work.trios; call missing(b[i],b[j],b[k],iout,jout,kout); leave; end; end; end; end; if last then output work.leftover2; run;

Actually one data step with "large" number of nested do loops should work but the actual number of nested levels will depend on the number of rows of data and the actual values.

The potential of not getting everything assigned to a group comes about when a value that is needed to complete a "complex" group of maybe 6 or 7 values gets used in a smaller set.

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. **Registration is now open through August 30th**. Visit the SAS Hackathon homepage.

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.