Hello - This is a continuation from an earlier post I had. Again, thank you in advance for any help you can provide me. I am trying to group smaller subsets of a larger population that net (or sum if you will) down to $0.00. My data has limited available fields, so I can only use the numeric field to complete this. In my earlier post, it looked like a "brute force" option was the best choice for me. I am hoping the community can help me improve upon the logic that was provided to me. Big thanks to all the earlier contributors, my knowledge of arrays, while very limited, has grown over the past few days. A few notes... 1) Yes, the overall population does net to $0.00, but I need to try to group it into smaller subsets for the purpose of my analysis (new data attached) 2.) I DO NOT have SAS/OR, so I cannot use the OPTMODEL procedure. I am running on SAS EG. I can provide a list of the licenses I do have if that helps. 3.) One thing I noticed when testing the script below on a new data set is that if there's a repeating value sequentially, the code may add the original value, plus itself when looping. If possible, I'd like to avoid scenarios like this. For example, with {-3,-3,6} this code is adding the first element twice and then summing to zero with the 6, instead of taking elements 1 and 2. 4.) For some additional background, the populations I'm dealing with can go up to 6000 lines. I tested one of these larger populations for sets of 3. It worked but took 5 hours to process. I'm ok with the processing time, but am curious if there efficiencies which can applied, especially since I could have multiple larger datasets. 5.) Example code I've been playing with is below. Special thanks to @ballardw for writing this originally. My questions to the community... a.) can the logic below be updated so that the array containing the values "shrinks" after subsets have been made? I think this may reduce the number of iterations. b.) along the same idea as question a., is it possible for the array to not sum the same element against itself? this may reduce the number of iterations as well (though not by much). simply put, each element can only be used once when summing c.) does this code need to be sequentially run the way it is currently set up? say I want to search for subsets of 50 lines, do I need 50 steps going sequentially from 2-50, or can it be set up to run just the 50 lines matches first, and within the results are the subsets of 2, 3,4...n. Thank you all again for you help. This has been very insightful so far and I've been learning a lot. If there are any other questions I can answer, or background I can provide to add additional clarity, please let me know. proc transpose data=have out=work.trans
prefix=ba
;
var base_amt;
run;
data work.pairs ( keep=iout jout ival jval)
work.leftover (keep=ba:)
;
set work.trans end=last;
array b ba:;
do i= 1 to (dim(b)-1);
do j= 2 to dim(b);
if sum(b[i],b[j])= 0 then do;
if not missing(b[i]) then iout=i;
if not missing(b[j]) then jout=j;
ival = b[i];
jval = b[j];
output work.pairs;
call missing(b[i],b[j],iout,jout);
leave;
end;
end;
end;
if last then output work.leftover;
run;
data work.trios ( keep=iout jout kout ival jval kval)
work.leftover2 (keep=ba:)
;
set work.leftover end=last;
array b ba:;
do i= 1 to (dim(b)-2);
do j= 2 to (dim(b)-1);
do k= 3 to dim(b);
if sum(b[i],b[j],b[k])= 0 then do;
if not missing(b[i]) then iout=i;
if not missing(b[j]) then jout=j;
if not missing(b[k]) then kout=k;
ival = b[i];
jval = b[j];
kval = b[k];
output work.trios;
call missing(b[i],b[j],b[k],iout,jout,kout);
leave;
end;
end;
end;
end;
if last then output work.leftover2;
run;
... View more