Re: All possible Combination

Tj_chua · Posted 05-13-2016 12:30 AM

Hi,

Im trying to Create all possible combinations for 66C8.

%let n=66;
%let k=8;

data Comb1 (keep=c1-c&k);

array c[&k] (&k.*0); /* initialize array to 0 */
ncomb = comb(&n, &k); /* number of combinations */
do j = 1 to ncomb;
rc = lexcombi(&n, &k, of c[*]);

end;
run;

Unfortunately my hardward is unable to do this much. i also dont have proc IML.

can any one suggets a quick work around?

Reeza · Posted 05-13-2016 12:43 AM

That's 7.4 billion (7,392,009,768) records...

If you want a dataset with all combinations no matter what you do that's a frickin big data set. The optimization would probably be on what you plan to do with it next, as is creating a big dataset is creating a big dataset.

You can use the CALL ALLCOMB routine, if you really want to do it.

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a003112302.htm

The example is pretty straightforward. You could write it to a text file for use later, if that helps with the space issues?

Tj_chua · Posted 05-13-2016 06:09 AM

abit less than that but yes the problem is its too big,

the previous code i used did use lexicomp which i think is a bit similar but i got I/O Issues.

each number from 1 to 66 is assigned a value and we want to see sum of the 8 values modulo 8 then see frequency of the remainder

FreelanceReinh · Posted 05-13-2016 07:13 AM

Hi @Tj_chua,

With your data step you'll end up with only one observation: c1=59, c2=60, ..., c8=66. This is manageable in terms of size. On my workstation the step took 90 seconds. So, even run time should not be an issue. What kind of I/O issues did you experience?

Just as an example, I computed the frequencies of the remainders modulo 8 of sum(of c[*]), which took less than five minutes. (Indeed, comb(66,8) is only about 5.7E9.) Also, it should be possible to exploit, e.g., the fact that

mod(a+b, 8) = mod(mod(a, 8)+mod(b, 8), 8)

for all integers a, b.

With this kind of mathematical reasoning you might need less brute force computing.

Astounding · Posted 05-13-2016 08:51 AM

Even in the worst case scenario where nothing else seems to work, you can program this yourself:

do i1=1 to 59;

do i2=i1+1 to 60;

do i3=i2+1 to 61;

do i4=i3+1 to 62;

do i5=i4+1 to 63;

do i6=i5+1 to 64;

do i7=i6+1 to 65;

do i8=i7+1 to 66;

* sum up c{i1} + c{i2} + ... + c{i8};

end; end; end; end; end; end; end; end;

It's clumsier, but if you're careful there won't be any accidents.

FreelanceReinh · Posted 05-15-2016 07:42 PM

Hi @Tj_chua,

I've just written a draft program which, according to preliminary checks, can solve your problem in 0.1 seconds (as opposed to about 4:45 minutes for the brute force approach).

You mentioned 66 "values", which I assumed to be integers. The input for the below program is just the frequency distribution of the remainders modulo 8 of these 66 values. (Obviously, program code could be added to calculate this quickly.) The frequencies of remainders 0, 1, ..., 7 are the initial values of temporary array m.

As an example, I took the values 1, 2, ..., 66. Eight of them have remainder 0, nine have remainder 1 etc.

data comb0;
array m[0:7] _temporary_ (8 9 9 8 8 8 8 8);
array n[0:7] _temporary_;
array k[8];
do k1=0 to 7;
  do k2=k1 to 7;
    do k3=k2 to 7;
      do k4=k3 to 7;
        do k5=k4 to 7;
          do k6=k5 to 7;
            do k7=k6 to 7;
              do k8=k7 to 7;
                r=mod(sum(of k[*]),8);
                f=1;
                do i=0 to 7;
                  n[i]=0;
                  do j=1 to 8;
                    n[i]+(k[j]=i);
                  end;
                  if m[i]>=n[i] then f=f*comb(m[i],n[i]);
                  else do;
                    f=0;
                    leave;
                  end;
                end;
                if f then output;
              end;
            end;
          end;
        end;
      end;
    end;
  end;
end;
drop i j k:;
run;

proc summary data=comb0 nway;
class r;
var f;
output out=want(drop=_:) sum=;
run;

proc print data=want noobs;
sum f;
run;

The trick is to partition the set of comb(66, 8)=5.7 billion combinations into comb(8+8-1, 8)=6435 subsets in which the target function r is constant and whose cardinalities f can be calculated by means of elementary combinatorics.

More explanations will follow tomorrow (CEST).

Tj_chua · Posted 05-15-2016 10:34 PM

hi @FreelanceReinh

this looks good but im not that familiar to arrays,

would you be able to help me to do the following.

so there are 66 possible items

1-8 is assiged with value of 1

13-16 is assiged with value of 2

21-24 is assiged with value of 3

25-32 is assiged with value of 4

33-40 is assiged with value of 5

41-48 is assiged with value of 6

49-56 is assiged with value of 7

57-64 is assiged with value of 8

65&66 is assiged with value of 0.3

we want to take 8 a at a time and see the frequency of remainder.

The catch is, no combination of items can be repeated. (combination not permutation, meaning each item can be drawn only once in each combination.)

would it be easy to tweak the codes below for this purpose?

Thank you so much!

FreelanceReinh · Posted 05-16-2016 06:22 AM

The 0.3 will require special consideration. What values are assigned to items 9-12 and 17-20?

We are talking about 8-element subsets of a 66-element set. So, your requirement that no item should be repeated within a subset is satisfied.

Tj_chua · Posted 05-16-2016 10:42 AM

@FreelanceReinh ops i missed a few there

1-8 is assiged with value of 1

9-16 is assiged with value of 2

17-24 is assiged with value of 3

25-32 is assiged with value of 4

33-40 is assiged with value of 5

41-48 is assiged with value of 6

49-56 is assiged with value of 7

57-64 is assiged with value of 8

65&66 is assiged with value of 0.3

Thanks!

FreelanceReinh · Posted 05-16-2016 10:57 AM

No problem. I already guessed the missing values.

Meanwhile I have adapted my program to the new information:

data comb0;
array m[0:8] _temporary_ (8 8 8 8 8 8 8 8 2);
array n[0:8] _temporary_;
array l[8] _temporary_;
array k[8];
do k1=0 to 8;
  do k2=k1 to 8;
    do k3=k2 to 8;
      do k4=k3 to 8;
        do k5=k4 to 8;
          do k6=k5 to 8;
            do k7=k6 to 8;
              do k8=k7 to 8;
                do _n_=1 to 8;
                  l[_n_]=ifn(k[_n_]=8, 0.3, k[_n_]);
                end;
                r=mod(sum(of l[*]),8);
                f=1;
                do i=0 to 8;
                  n[i]=0;
                  do j=1 to 8;
                    n[i]+(k[j]=i);
                  end;
                  if m[i]>=n[i] then f=f*comb(m[i],n[i]);
                  else do;
                    f=0;
                    leave;
                  end;
                end;
                if f then output;
              end;
            end;
          end;
        end;
      end;
    end;
  end;
end;
drop i j k:;
run;

proc summary data=comb0 nway;
class r;
var f;
output out=want(drop=_:) sum=;
run;

proc print data=want noobs;
sum f;
run;

Total run time is still <0.1 s, although the number of observations in dataset COMB0 has increased from 6435 to 11583. But this is still much better than 5.7 billion.

I am going to add explanations to this post within the next few hours. Just wanted to deliver the code first. In parallel I run the program using the brute force approach, which I have also amended to match the new specifications. Estimated run time: 30 min.

Edit: The explanations have become a bit lengthy. I'll put them into a separate post to avoid technical issues.

FreelanceReinh · Posted 05-16-2016 01:42 PM

Explanations

Array m holds the frequencies of remainders (always modulo 😎 0, 1, 2, ..., 7, 0.3. So, the first "8" in the list of initial values (8 8 8 8 8 8 8 8 2) is the number of items which have been assigned a value whose remainder is 0. (This refers to items 57-64.) Another eight of the 66 items (namely items 1-8) have been assigned a value whose remainder is 1, hence the second "8" in the list, and so on. Finally, two items (65 and 66) have been assigned a value whose remainder is 0.3. This is reflected by the "2" at the end of the list.
The equation I mentioned in my first response in this thread holds also for all real numbers a, b (algebra). This is why only the remainders of the original values need to be considered. Moreover, since addition is commutative, the remainder of the sum of any eight values taken from the set of 66 depends only on the frequency distribution of the remainders in the 8-element subset. During program execution array n is populated with such frequency distributions, where n[0] is the frequency of remainder 0, n[1] the frequency of remainder 1, ..., n[7] the frequency of remainder 7 and n[8] is the frequency of remainder 0.3, exactly corresponding to array m. In particular, 0<= n[i] <= 8.
Array l is used to store the remainders of the 8-element subsets we consider. These are derived from array k (see below).
Array k consists of eight index variables (k1-k8), which can take values 0, 1, ..., 8, representing remainders 0, 1, ..., 7, 0.3. Thanks to the algebraic considerations above, we can restrict our attention to 8-element sets with distinct frequency distributions of remainders. These frequency distributions correspond 1:1 to selections of 8 elements from the 9 available remainders (0, 1, ..., 7, 0.3) where repetitions are allowed, but order is not significant. The total number of such selections is comb(8+9-1, 8)=12870, according to a formula from elementary combinatorics. The eight nested DO loops with index variables k1, ..., k8 run through these 12870 combinations, from (0,0,0,0,0,0,0,0) to (8,8,8,8,8,8,8,8) with the important restriction k1<=k2<=...<=k8.
The values of k1, ..., k8 are mapped to the remainders l[1], ..., l[8], where 8 is mapped to 0.3, while 0, ..., 7 are just copied from k[q] to l[q] (q=1, ..., 8).
The remainder of the sum l[1]+...+l[8] is stored in variable r.
In the final dataset WANT, variable f contains the frequencies of the remainders. In the data step we compute these frequencies separately for each of the 12870 (theoretically) possible frequency distributions of remainders in an 8-element subset. Since f is computed as a product, we initialize it as 1.
In each iteration of the DO loop from i=0 to 8 we consider one of the remainders 0, 1, 2, ..., 7, 0.3. First, we determine the frequency of the respective remainder in the combination (l[1], ..., l[8]). It's convenient to use array k for the incrementation of array elements n[i].
Now it's time for one more combinatorial consideration: Let's say, the frequency distribution of remainders, stored in array n, for the current iteration of the eight nested "k-loops" is (2, 1, 0, 0, 3, 0, 0, 0, 2). How many of the comb(66, 8)=5,743,572,120 subsets would show this pattern? Well, the 2 items with remainder 0 must have been taken from the m[0]=8 items with this remainder. There are comb(8, 2)=28 significantly different ways to select 2 out of 8. Similarly, there are comb(8, 1)=8 ways to select 1 out of the m[1]=8 items with remainder 1. As these selections can be combined with each of the previous 28 selections, the two binomial coefficients need to be multiplied. In the end, the total number of subsets matching a given frequency distribution of remainders is the product of the nine binomial coefficients comb(m[i], n[i]), i=0, ..., 8. For the abovementioned example this product is 28*8*1*1*56*1*1*1*1=12544.
A single i with n[i]>m[i] would create a factor zero in the product. (Actually, the SAS function COMB would return a missing value in this case, which is a bit inconvenient.) Therefore, we set f=0 and leave the DO loop (i=0 to 😎 as soon as this happens.
Combinations of remainders with non-zero frequencies f are written to dataset COMB0 (if f then output;). As it turns out, not all of the 12870 hypothetical combinations of remainders can occur with our data: Remainder 0.3 can in fact appear at most twice in an 8-element subset of our 66 items. As a consequence, the number of observations in COMB0 is only 11583 with our specific input data.
The PROC SUMMARY step adds the frequencies for each remainder r to form the aggregated dataset WANT with one observation per remainder. Here, we have 24 different remainders: 0, 0.3, 0.6, 1, 1.3, 1.6, ..., 7, 7.3, 7.6.
The final result, obtained in <0.1 seconds (well, not counting the time for program development and documentation :-)), is identical to the result of a brute-force approach (involving a DO loop with 5.7 billion iterations), which took 31:46 minutes on my workstation.

Tj_chua · Posted 05-16-2016 10:30 PM

Hi @FreelanceReinh,

I think i explain something wrong, we want to form groups of 8s with no remainders, so( 6 6 6 6 6 6 6 6), there are no eights that can be formed. while if we get (2 6 6 6 6 6 6 6) 1 eight can be formed.

This is greate! is it also possible to get below

1)let x - how many 8 was formed,

2)let y - how many items formed the remainder/ Was not used? istead of the remainder it self, preffer lest amount of item possible.

3)let z - how many 0.3 in the combination.

Example:

Combination x Y Z

4 4 4 2 2 1 3 .3 => 2 eights formed, 3 Items remained. remainder = 1. count for .3 items is 1

.3 .3 3 5 1 2 5 8=> 3 eights formed, 2 Items remained. remainder = .6. count for .3 items is 2

4 4 3 5 1 2 5 8=> 4 eights formed, 0 Items remained. remainder = 0. count for .3 items is 0

4 4 4 4 1 2 1 1 => 4 eights formed, 2 Items remained. remainder = 5. count for .3 items is 0
X Y ZFreq
1 0 0 ???

1 1 1 ???

1 2 2 ???

.

Is this possible with the array?

Thank you so much, appriciate the help.

ill try to learn the explanation code.

FreelanceReinh · Posted 05-17-2016 12:27 PM

Hi @Tj_chua,

It seems that your specifications change faster than I can implement them :-).

Here is the implementation of the specs of 2016-05-17 08:54 AM UTC, which were:

@Tj_chua wrote:

Hi @FreelanceReinh,

This is greate! is it also possible to get distribution of bellow
1)let x - how many 8 was formed, i think round(sum/8) will do.
2)let y - how many items formed the remainder? istead of the remainder it self, preffer lest amount of item possible.
3)let z - how many 0.3 in remainder.

Example:
Combination
4 4 4 2 2 1 3 .3 => 2 eights formed, 3 Items remained. remainder = 1. count for .3 items is 1
.3 .3 3 5 1 2 5 8=> 3 eights formed, 2 Items remained. remainder = .6. count for .3 items is 2
4 4 3 5 1 2 5 8=>   4 eights formed, 0 Items remained. remainder = 0. count for .3 items is 0
4 4 4 4 1 2 1 1 =>   4 eights formed, 2 Items remained. remainder = 5. count for .3 items is 0
X Y ZFreq
1 0 0 ???
1 1 1 ???
1 2 2 ???
.
.

Thank you so much, appriciate the help.

ill try to learn the explanation code.

%macro ymac;
%local i y;
%do y=2 %to 7;
  t[1]=0;
  call missing(of s[*]);
  ncomb=comb(8, &y);
  do j=1 to ncomb;
    call allcombi(8, &y, of %do i=1 %to &y; t[&i] %end;);
    do h=1 to &y;
       s[h]=v[t[h]];
    end;
    if round(sum(of s[*]),.1)=r then do;
      lflag=1;
      leave;
    end;
  end;
  if lflag then return;
  y=y+1;
%end;
%mend ymac;

data comb0;
array m[0:8] _temporary_ (8 8 8 8 8 8 8 8 2);
array n[0:8] _temporary_;
array l[8] _temporary_;
array v[8] _temporary_;
array k[8];
array s[8] _temporary_;
array t[8] _temporary_;
do k1=0 to 8;
  do k2=k1 to 8;
    do k3=k2 to 8;
      do k4=k3 to 8;
        do k5=k4 to 8;
          do k6=k5 to 8;
            do k7=k6 to 8;
              do k8=k7 to 8;
                do _n_=1 to 8;
                  l[_n_]=ifn(k[_n_]=8, 0.3, k[_n_]);
                  v[_n_]=ifn(k[_n_]=0, 8, l[_n_]);
                end;
                r=round(mod(sum(of l[*]),8),.1);
                f=1;
                do i=0 to 8;
                  n[i]=0;
                  do j=1 to 8;
                    n[i]+(k[j]=i);
                  end;
                  if m[i]>=n[i] then f=f*comb(m[i],n[i]);
                  else do;
                    f=0;
                    leave;
                  end;
                end;
                if f then do;
                  x=floor(sum(of v[*])/8);
                  lflag=0;
                  if r=0 then y=0;
                  else if r in v then y=1;
                  else do;
                    y=2;
                    link callymac; 
                  end;
                  if y=8 & sum(of v[*]) ne r then y=.N;
                  z=n[8];
                  output;
                end;
              end;
            end;
          end;
        end;
      end;
    end;
  end;
end;
return;
callymac:
  %ymac
return;
drop h i j k: l: ncomb;
run;

/* Use PROC FREQ for aggregation, thus get percentages without additional effort */

proc freq data=comb0 noprint;
weight f;
tables r / out=want;
tables x*y*z / missing out=want2;
run;

proc print data=want;
sum count percent;
run;

proc print data=want2;
sum count percent;
run;

Explanations

Macro YMAC supports the calculation of y.
Array v holds the eight values corresponding to an 8-element subset of the 66 items.
Array s is used to store values of 2-, 3-, ..., 7-element subsets of one of the 8-element subsets. These "second-order subsets" are created if y not in (1, 2) until one of them yields the remainder as the sum of its elements. The cardinality of the first such subset found is then y.
Array t holds the indices specifying the subsets stored in array s.
Although redundant on my workstation, I inserted the ROUND function into the definition of r in order to avoid potential (platform-dependent) numeric representation issues.
New variables x, y and z are calculated only if frequency f>0.
Variable lflag ("leave flag") helps to leave the nested loops in which y is calculated.
y is calculated as follows:
1. If the remainder is 0, then y=0.
2. If one of the 8 values (v[1], ..., v[8]) equals the remainder, y=1.
3. Otherwise, the 2-, 3-, ..., 7-element second-order subsets are searched (see item 3 above).
4. If the sum of none of these second-order subsets equals the remainder, y is preliminarily set to 8, but then set to special missing value .N ("none") if also the sum of all 8 values does not equal the remainder. (This includes many cases where some of the values are equal to 8, though.)

FreelanceReinh · Posted 05-17-2016 12:47 PM

@Tj_chua wrote:

Hi @FreelanceReinh,

I think i explain something wrong, we want to form groups of 8s with no remainders, so( 6 6 6 6 6 6 6 6), there are no eights that can be formed. while if we get (2 6 6 6 6 6 6 6) 1 eight can be formed.

This is greate! is it also possible to get below

1)let x - how many 8 was formed,

2)let y - how many items formed the remainder/ Was not used? istead of the remainder it self, preffer lest amount of item possible.

3)let z - how many 0.3 in the combination.

Example:

Combination x Y Z

4 4 4 2 2 1 3 .3 => 2 eights formed, 3 Items remained. remainder = 1. count for .3 items is 1

.3 .3 3 5 1 2 5 8=> 3 eights formed, 2 Items remained. remainder = .6. count for .3 items is 2

4 4 3 5 1 2 5 8=>   4 eights formed, 0 Items remained. remainder = 0. count for .3 items is 0

4 4 4 4 1 2 1 1 =>   4 eights formed, 2 Items remained. remainder = 5. count for .3 items is 0
X Y ZFreq
1 0 0 ???

1 1 1 ???

1 2 2 ???

.

.

Is this possible with the array?

Thank you so much, appriciate the help.

ill try to learn the explanation code.

Your latest specifications abandon the concept of calculating sums and remainders modulo 8, which was fundamental to the development done so far. Nevertheless, there is still hope that much of the existing program could continue to be used. This is based on two facts:

the one-to-one correspondence between the values (v) and their remainders modulo 8
the invariance of the variables (to be) derived under variations of the 8-element subsets with fixed frequency distribution of values.

Unfortunately, the newly requested variable x is not well-defined.

Example:
values
(1 1 2 3 4 5 6 7) --> x=2?
(1 1 2 3 4 5 6 7) --> x=3?

Each of our 11583 eight-element subsets can be partitioned in 4140 different ways (Bell number B8). So, even if one had to search through all 11583*4140=47,953,620 partitions to determine the maximum (or minimum) number of eights formed (i.e., a well-defined version of x), this should be feasible.

I reckon that SAS/OR provides better tools for performing such searches (in an optimized way), but I don't have a license, nor experience with this SAS module.

Therefore, I recommend that you open a new thread with updated specifications (all derived variables must be well-defined) and an adapted title. Several "SAS/OR challenges" have been answered very quickly in the past couple of months, for example this one.

Good luck! And thank you for posting such interesting questions for mathematically inclined programmers.