Solved: Re: An interesting OR problem

Ksharp · Posted 03-12-2016 12:51 AM

Here is an very interesting operation research problem .

http://stackoverflow.com/questions/35863766/find-the-minimal-list-from-a-many-to-many-mapping

I want know how to use SAS/OR to solve this.

I make it a little bit more complicated.

all = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20} ;

A 1,3,5,7,9,19,12
B 2,4,6,8,10,16
C 3,6,8,12,17,1,10
D 5,13,20,15,16
E 4,9,15,19,5,14
F 11,2,4,15,1,14,10
G 20,13,15,4,5,6,9
H 19,18,17,11,15,11
I 2,14,6,16,3,11,7
J 1,12,3,13,4,14,18
K 7,17,8,18,9,19,20,4

Here is my GA solution.Don't cheat .

data x;
input name $ v  $40.;
cards;
A 1,3,5,7,9,19,12
B 2,4,6,8,10,16
C 3,6,8,12,17,1,10
D 5,13,20,15,16
E 4,9,15,19,5,14
F 11,2,4,15,1,14,10
G 20,13,15,4,5,6,9
H 19,18,17,11,15,11
I 2,14,6,16,3,11,7
J 1,12,3,13,4,14,18
K 7,17,8,18,9,19,20,4
;
run;
data have;
 set x;
 do i=1 to countw(v,',');
  value=input(scan(v,i,','),best32.);
  output;
 end;
 drop i v;
run;
data key;
 set x(keep=name rename=(name=l));
run;

proc iml;

all = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20} ;


use have nobs nobs;
read all var {name value};
close have;
use key;
read all var {l};
close key;

group_n=t(loc(name^=t(remove(name,1)||{' '})))-
        t(loc(name^=t({' '}||remove(name,nobs)))) +1;
all_n=countunique(all);
n=countunique(name);
encoding=repeat({0,1},1,n);

start func(x) global(value,group_n,all_n); 
 temp=setdif(t(repeat(x,group_n))#value,{0});
 if countunique(temp)=all_n then sum=sum(x);
  else sum=99999;
 return (sum);
finish;

id=gasetup(2,n,1234);
call gasetobj(id,0,"func");
call gasetcro(id,0.95,2);
call gasetmut(id,0.95,3);
call gasetsel(id,100,1,1);
call gainit(id,1000,encoding); 
niter = 100000;
summary = j(niter,2);
mattrib summary [c = {"MinCount", "Avg Count"} l=""];
do i = 1 to niter;
 call garegen(id);
 call gagetval(v, id);
 summary[i,1] = v[1];
 summary[i,2] = v[:];
end;
call gagetmem(mem, v, id, 1);

Memebers=l[loc(mem)];
print "Best Members:" Memebers[l=""],
      "Min Count:   " v[l = ""] ;
call gaend(id);
quit;

OUTPUT:

Best Members: C G I K
Min Count: 4

RobPratt · Posted 03-12-2016 12:11 PM

proc optmodel;
   set <str,num> NAMES_VALUES;
   read data have into NAMES_VALUES=[name value];
   set NAMES  = setof {<i,v> in NAMES_VALUES} i;
   set VALUES = setof {<i,v> in NAMES_VALUES} v;

   var Select {NAMES} binary;
   min NumSelected = sum {i in NAMES} Select[i];
   con Cover {v in VALUES}:
      sum {<i,(v)> in NAMES_VALUES} Select[i] >= 1;

   solve;
   print {i in NAMES: Select[i].sol > 0.5} Select;
quit;

[1] Select
C 1
D 1
F 1
K 1

This is known as a "set covering problem" in the literature.

View solution in original post

RobPratt · Posted 03-12-2016 12:11 PM

proc optmodel;
   set <str,num> NAMES_VALUES;
   read data have into NAMES_VALUES=[name value];
   set NAMES  = setof {<i,v> in NAMES_VALUES} i;
   set VALUES = setof {<i,v> in NAMES_VALUES} v;

   var Select {NAMES} binary;
   min NumSelected = sum {i in NAMES} Select[i];
   con Cover {v in VALUES}:
      sum {<i,(v)> in NAMES_VALUES} Select[i] >= 1;

   solve;
   print {i in NAMES: Select[i].sol > 0.5} Select;
quit;

[1] Select
C 1
D 1
F 1
K 1

This is known as a "set covering problem" in the literature.

PGStats · Posted 03-12-2016 02:52 PM

Did you modify @Ksharp's input (to correct the anomaly in set H) ? I get C-G-H-I with SAS/STAT 13.1 .

PG

RobPratt · Posted 03-12-2016 03:41 PM

No, I just used the "have" data set as is. The READ DATA statement automatically removed the duplicate:

WARNING: Duplicate key <H,11> was read at observation 51.

By the way, you can use the CLP solver to find all optimal solutions:

   solve with CLP / findallsolns;
   for {s in 1.._NSOL_} put ({i in NAMES: Select[i].sol[s] > 0.5});

It turns out that there are four in this case:

{'C','G','H','I'}
{'C','D','I','K'}
{'C','G','I','K'}
{'C','D','F','K'}

How did you solve it with SAS/STAT?

PGStats · Posted 03-12-2016 03:53 PM

I used SAS/OR 9.1, sorry. Yet, it didn't pick the same answer as yours. Is there something non-deterninistic in the algorithm?

PG

rogerjdeangelis · Posted 03-12-2016 07:06 PM

Hi Team

I will be posting more detail on the SAS-L list, But it has 15 lines in a single data step(after removing checking arrays)?

There appear to be 4 solutions, kind of a lark so be suspecious. I learn more from my errors then my solutions.

see below

Here is one of them

3=c
4=d
6=f
11=k

Others are below.

The solution may not scale over 54 codes without using extended precision, no limit of number of vectors.

The only change from a previous post on SAS-L was to substitute 'bor' for 'sum'

Obs TOT E1 E2 E3 E4 D1 D2 D3 D4

1 2097150 136522 1155104 52246 1966992 3 4 6 11
2 2097150 136522 1155104 84172 1966992 3 4 9 11
3 2097150 136522 1090160 952320 84172 3 7 8 9
4 2097150 136522 1090160 84172 1966992 3 7 9 11

data x;
cost= 2**1+2**3+2**5+2**7+2**9+2**19+2**12 ; output;
cost= 2**2+2**4+2**6+2**8+2**10+2**16 ; output;
cost= 2**3+2**6+2**8+2**12+2**17+2**1+2**10 ; output;
cost= 2**5+2**13+2**20+2**15+2**16 ; output;
cost= 2**4+2**9+2**15+2**19+2**5+2**14 ; output;
cost= 2**11+2**2+2**4+2**15+2**1+2**14+2**10 ; output;
cost= 2**20+2**13+2**15+2**4+2**5+2**6+2**9 ; output;
cost= 2**19+2**18+2**17+2**11+2**15 ; output;
cost= 2**2+2**14+2**6+2**16+2**3+2**11+2**7 ; output;
cost= 2**1+2**12+2**3+2**13+2**4+2**14+2**18 ; output;
cost= 2**7+2**17+2**8+2**18+2**9+2**19+2**20+2**4; output;
run;

all = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20} ;

%macro ngetk(n=11);
/* generate combinations of n items taken k at a time */
data Comb(keep=tot res d: e: where=(tot=2097150));
retain tot 0 res e1-e11 d1-d11;
array chksum[11] _temporary_ (529066,66900,136522,1155104,574000,52246,1090160,952320,84172,290842,1966992);
array d[11] d1-d11 (11*0);
array e[11] e1-e11 (11*0);
%do k=1 %to 11;
array c&k.f[&k] (&k.*0); /* initialize array to 0 */
ncomb = comb(&n, &k); /* number of combinations */
do j = 1 to ncomb;
rc = lexcombi(&n, &k, of c&k.f[*]);
do l=1 to &k;
tot=bor(tot, chksum[c&k.f[l]]);
d[l]=c&k.f[l];
e[l]=chksum[c&k.f[l]];
end;
res=put(tot,hex8.);
output;
tot=0;
end;
%end;
run;
%mend ngetk;

%ngetk;

Ksharp · Posted 03-12-2016 08:26 PM

Roger,

I noticed you used lexcombi(). That means you can't handle large scale problem(i.e. have lots and lots of student) .

RobPratt · Posted 03-12-2016 08:22 PM

The algorithm is deterministic in the sense that if you run it twice with the same settings on the same machine, you will get the same solution. But, in each new release of SAS/OR, changes in the underlying simplex algorithm, presolver, heuristics, cutting planes, branching strategies, etc. can alter which solution gets returned.

PGStats · Posted 03-12-2016 10:53 PM

I guess this means that the solution returned might depend on the number of threads involved in the search.

PG

RobPratt · Posted 03-12-2016 11:56 PM

Yes, the solution can depend on the number of threads used.

Ksharp · Posted 03-12-2016 08:23 PM

Impressed. I want know how many students OR can handle ?

PG,

Yes. H is a typo . Sorry.

RobPratt · Posted 03-12-2016 08:53 PM

The number of students it can handle depends on the number of courses, how much memory you have, how long you're willing to wait, and how close to optimal you want to get. Set covering is NP-hard, but in practice the MILP approach should definitely scale much better than brute-force.

acordes · Posted 10-18-2017 09:20 AM

Fantastic kskarp. Thanks a lot.

You make me dive deeply into the GA subject as I am having a hard time to understand your code.
But finally I've come behind it and I would like to share my insights for others who might be struggling to understand this code.

As the link to the problem description is broken, I start with my auto-explanation of what is the task here.
Find a minimum set of persons (variable:names) so that their attributes (here numbers from 1 to 20, vector:value) represent the range of the population (vector:all).

your code in black

group_n=t(loc(name^=t(remove(name,1)||{' '})))-
t(loc(name^=t({' '}||remove(name,nobs)))) +1; /*Create a count of values for each person. The "data have" step earlier ensures that it gets sorted by person. It cost me a hell to figure out what it does. It's a nice trick but for me it's easier to understand to get this via this loop:

u = unique(name); /* 2. Unique values (levels) of categorical variable. */
s = j(ncol(u),1); /* 3. Allocate vector to hold results */
do i = 1 to ncol(u); /* 4. For each level... */
idx = (name=u[i]); /* 5. indicator matrix */
s[i] = t(idx)*idx;/* 6. count */
end;

encoding=repeat({0,1},1,n); /*The encoding has ones and zeros and its size is equal to number of unique persons. One selects that person meanwhile 0 leaves it out from the selection.

For the sake of understanding I created an random selection vector and played around:

call randseed(123); /* set random number seed */

u = j(1,nrow(group_n),1); /* allocate */

call randgen(u, "BERNOULLI",0.1);

repeat(u,group_n); /* Create ones and zeros as of the size of the vector of attributes. First element A=1 second element A=3...
This puts zeros (or ones) to all person's elements.

t(repeat(x,group_n))#value /* The 1/0 vector gets multiplied elementwise with the attributes values vector prior transpose.

temp=setdif(t(repeat(x,group_n))#value,{0}); /*The setdif function cancels out zeros and leaves only unique entries. If 10 as value appears several times, then the setdif resulting vector has only 1 times 10.

if countunique(temp)=all_n then sum=sum(x); /*If and only if all values are present in the current selection then count the sum of persons chosen.
else sum=99999; /*As it's a minimization problem, else set sum to a very high value to penalize.

Ksharp · Posted 10-18-2017 09:49 AM

Congratulations!

Glad you could understand my GA code.

Enjoy Genetic Algorithm, you can concur any of optimal problem via it.

I know @RobPratt might don't think so . But I love GA .

Registration is open