BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.

Here is an very interesting operation research problem .

http://stackoverflow.com/questions/35863766/find-the-minimal-list-from-a-many-to-many-mapping

 

I want know how to use SAS/OR to solve this.

I make it a little bit more complicated.

 

all = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20} ;

A 1,3,5,7,9,19,12
B 2,4,6,8,10,16
C 3,6,8,12,17,1,10
D 5,13,20,15,16
E 4,9,15,19,5,14
F 11,2,4,15,1,14,10
G 20,13,15,4,5,6,9
H 19,18,17,11,15,11
I 2,14,6,16,3,11,7
J 1,12,3,13,4,14,18
K 7,17,8,18,9,19,20,4

 

Here is my GA solution.Don't cheat . 

 

 

 

data x;
input name $ v  $40.;
cards;
A 1,3,5,7,9,19,12
B 2,4,6,8,10,16
C 3,6,8,12,17,1,10
D 5,13,20,15,16
E 4,9,15,19,5,14
F 11,2,4,15,1,14,10
G 20,13,15,4,5,6,9
H 19,18,17,11,15,11
I 2,14,6,16,3,11,7
J 1,12,3,13,4,14,18
K 7,17,8,18,9,19,20,4
;
run;
data have;
 set x;
 do i=1 to countw(v,',');
  value=input(scan(v,i,','),best32.);
  output;
 end;
 drop i v;
run;
data key;
 set x(keep=name rename=(name=l));
run;

proc iml;

all = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20} ;


use have nobs nobs;
read all var {name value};
close have;
use key;
read all var {l};
close key;

group_n=t(loc(name^=t(remove(name,1)||{' '})))-
        t(loc(name^=t({' '}||remove(name,nobs)))) +1;
all_n=countunique(all);
n=countunique(name);
encoding=repeat({0,1},1,n);

start func(x) global(value,group_n,all_n); 
 temp=setdif(t(repeat(x,group_n))#value,{0});
 if countunique(temp)=all_n then sum=sum(x);
  else sum=99999;
 return (sum);
finish;

id=gasetup(2,n,1234);
call gasetobj(id,0,"func");
call gasetcro(id,0.95,2);
call gasetmut(id,0.95,3);
call gasetsel(id,100,1,1);
call gainit(id,1000,encoding); 
niter = 100000;
summary = j(niter,2);
mattrib summary [c = {"MinCount", "Avg Count"} l=""];
do i = 1 to niter;
 call garegen(id);
 call gagetval(v, id);
 summary[i,1] = v[1];
 summary[i,2] = v[:];
end;
call gagetmem(mem, v, id, 1);

Memebers=l[loc(mem)];
print "Best Members:" Memebers[l=""],
      "Min Count:   " v[l = ""] ;
call gaend(id);
quit;

 

 

OUTPUT:


Best Members: C G I K
Min Count: 4

1 ACCEPTED SOLUTION

Accepted Solutions
RobPratt
SAS Super FREQ
proc optmodel;
   set <str,num> NAMES_VALUES;
   read data have into NAMES_VALUES=[name value];
   set NAMES  = setof {<i,v> in NAMES_VALUES} i;
   set VALUES = setof {<i,v> in NAMES_VALUES} v;

   var Select {NAMES} binary;
   min NumSelected = sum {i in NAMES} Select[i];
   con Cover {v in VALUES}:
      sum {<i,(v)> in NAMES_VALUES} Select[i] >= 1;

   solve;
   print {i in NAMES: Select[i].sol > 0.5} Select;
quit;

[1] Select
C 1
D 1
F 1
K 1

 

This is known as a "set covering problem" in the literature.

View solution in original post

13 REPLIES 13
RobPratt
SAS Super FREQ
proc optmodel;
   set <str,num> NAMES_VALUES;
   read data have into NAMES_VALUES=[name value];
   set NAMES  = setof {<i,v> in NAMES_VALUES} i;
   set VALUES = setof {<i,v> in NAMES_VALUES} v;

   var Select {NAMES} binary;
   min NumSelected = sum {i in NAMES} Select[i];
   con Cover {v in VALUES}:
      sum {<i,(v)> in NAMES_VALUES} Select[i] >= 1;

   solve;
   print {i in NAMES: Select[i].sol > 0.5} Select;
quit;

[1] Select
C 1
D 1
F 1
K 1

 

This is known as a "set covering problem" in the literature.

PGStats
Opal | Level 21

Did you modify @Ksharp's input (to correct the anomaly in set H) ? I get C-G-H-I with SAS/STAT 13.1 .

PG
RobPratt
SAS Super FREQ

No, I just used the "have" data set as is.  The READ DATA statement automatically removed the duplicate:

 WARNING: Duplicate key <H,11> was read at observation 51.

 

By the way, you can use the CLP solver to find all optimal solutions:

 

   solve with CLP / findallsolns;
   for {s in 1.._NSOL_} put ({i in NAMES: Select[i].sol[s] > 0.5});

It turns out that there are four in this case:

{'C','G','H','I'}
{'C','D','I','K'}
{'C','G','I','K'}
{'C','D','F','K'}

 

How did you solve it with SAS/STAT?

PGStats
Opal | Level 21

I used SAS/OR 9.1, sorry. Yet, it didn't pick the same answer as yours. Is there something non-deterninistic in the algorithm?

PG
rogerjdeangelis
Barite | Level 11

Hi Team

 

 I will be posting more detail on the SAS-L list, But it has 15 lines in a single data step(after removing checking arrays)? 

 

There appear to be 4 solutions, kind of a lark so be suspecious. I learn more from my errors then my solutions.

 

see below

 

Here is one of them

 

3=c
4=d
6=f
11=k

 

Others are below.

 

The solution may not scale  over 54 codes without using extended precision, no limit of number of vectors.

 

The only change from a previous post on SAS-L was to substitute 'bor' for 'sum'


Obs TOT E1 E2 E3 E4 D1 D2 D3 D4

1 2097150 136522 1155104 52246 1966992             3 4 6 11
2 2097150 136522 1155104 84172 1966992             3 4 9 11
3 2097150 136522 1090160 952320 84172                 3 7 8 9
4 2097150 136522 1090160 84172 1966992              3 7 9 11

 

data x;
cost= 2**1+2**3+2**5+2**7+2**9+2**19+2**12 ; output;
cost= 2**2+2**4+2**6+2**8+2**10+2**16 ; output;
cost= 2**3+2**6+2**8+2**12+2**17+2**1+2**10 ; output;
cost= 2**5+2**13+2**20+2**15+2**16 ; output;
cost= 2**4+2**9+2**15+2**19+2**5+2**14 ; output;
cost= 2**11+2**2+2**4+2**15+2**1+2**14+2**10 ; output;
cost= 2**20+2**13+2**15+2**4+2**5+2**6+2**9 ; output;
cost= 2**19+2**18+2**17+2**11+2**15 ; output;
cost= 2**2+2**14+2**6+2**16+2**3+2**11+2**7 ; output;
cost= 2**1+2**12+2**3+2**13+2**4+2**14+2**18 ; output;
cost= 2**7+2**17+2**8+2**18+2**9+2**19+2**20+2**4; output;
run;

all = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20} ;

%macro ngetk(n=11);
/* generate combinations of n items taken k at a time */
data Comb(keep=tot res d: e: where=(tot=2097150));
retain tot 0 res e1-e11 d1-d11;
array chksum[11] _temporary_ (529066,66900,136522,1155104,574000,52246,1090160,952320,84172,290842,1966992);
array d[11] d1-d11 (11*0);
array e[11] e1-e11 (11*0);
%do k=1 %to 11;
array c&k.f[&k] (&k.*0); /* initialize array to 0 */
ncomb = comb(&n, &k); /* number of combinations */
do j = 1 to ncomb;
rc = lexcombi(&n, &k, of c&k.f[*]);
do l=1 to &k;
tot=bor(tot, chksum[c&k.f[l]]);
d[l]=c&k.f[l];
e[l]=chksum[c&k.f[l]];
end;
res=put(tot,hex8.);
output;
tot=0;
end;
%end;
run;
%mend ngetk;

%ngetk;

 

 

Ksharp
Super User

Roger,

I noticed you used lexcombi(). That means you can't handle large scale problem(i.e. have lots and lots of student) .

RobPratt
SAS Super FREQ

The algorithm is deterministic in the sense that if you run it twice with the same settings on the same machine, you will get the same solution.  But, in each new release of SAS/OR, changes in the underlying simplex algorithm, presolver, heuristics, cutting planes, branching strategies, etc. can alter which solution gets returned.

PGStats
Opal | Level 21

I guess this means that the solution returned might depend on the number of threads involved in the search.

PG
RobPratt
SAS Super FREQ

Yes, the solution can depend on the number of threads used.

Ksharp
Super User

Impressed. I want know how many students OR can handle ?

 

 

PG,

Yes. H is a typo . Sorry.

RobPratt
SAS Super FREQ

The number of students it can handle depends on the number of courses, how much memory you have, how long you're willing to wait, and how close to optimal you want to get.  Set covering is NP-hard, but in practice the MILP approach should definitely scale much better than brute-force.

acordes
Rhodochrosite | Level 12

Fantastic kskarp. Thanks a lot.


You make me dive deeply into the GA subject as I am having a hard time to understand your code.
But finally I've come behind it and I would like to share my insights for others who might be struggling to understand this code.

 

As the link to the problem description is broken, I start with my auto-explanation of what is the task here.
Find a minimum set of persons (variable:names) so that their attributes (here numbers from 1 to 20, vector:value) represent the range of the population (vector:all).

 

your code in black

group_n=t(loc(name^=t(remove(name,1)||{' '})))-
t(loc(name^=t({' '}||remove(name,nobs)))) +1;  /*Create a count of values for each person. The "data have" step earlier ensures that it gets sorted by person. It cost me a hell to figure out what it does. It's a nice trick but for me it's easier to understand to get this via this loop:

u = unique(name); /* 2. Unique values (levels) of categorical variable. */
s = j(ncol(u),1); /* 3. Allocate vector to hold results */
do i = 1 to ncol(u); /* 4. For each level... */
idx = (name=u[i]); /* 5. indicator matrix */
s[i] = t(idx)*idx;/* 6. count */
end;

 

encoding=repeat({0,1},1,n); /*The encoding has ones and zeros and its size is equal to number of unique persons. One selects that person meanwhile 0 leaves it out from the selection.

 

For the sake of understanding I created an random selection vector and played around:

call randseed(123);         /* set random number seed */

u = j(1,nrow(group_n),1);                /* allocate */

call randgen(u, "BERNOULLI",0.1); 

 

repeat(u,group_n);  /* Create ones and zeros as of the size of the vector of attributes. First element A=1 second element A=3...
This puts zeros (or ones) to all person's elements.

 

t(repeat(x,group_n))#value /* The 1/0 vector gets multiplied elementwise with the attributes values vector prior transpose.

 

temp=setdif(t(repeat(x,group_n))#value,{0}); /*The setdif function cancels out zeros and leaves only unique entries. If 10 as value appears several times, then the setdif resulting vector has only 1 times 10.

 

if countunique(temp)=all_n then sum=sum(x); /*If and only if all values are present in the current selection then count the sum of persons chosen.
else sum=99999;  /*As it's a minimization problem, else set sum to a very high value to penalize.

Ksharp
Super User

Congratulations!

Glad you could understand my GA code.

Enjoy Genetic Algorithm, you can concur any of optimal problem via it.

I know @RobPratt might don't think so . But I love GA .

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 13 replies
  • 2571 views
  • 9 likes
  • 5 in conversation