BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Top_Katz
Quartz | Level 8

Hi!

 

I have a simple problem I'm trying to code in PROC OPTMODEL (SAS 9.4, SAS/OR 14.1 on Linux x64 grid server), instead of my usual method of directly creating the MPS data set for PROC OPTMILP with a SAS data step.  But I'm striking out pretty badly, and thought I'd ask for assistance.

 

I have a data set with three variables, i j c, sorted on i and j (i < j), the i and j pairs are unique, c is the cost for each pair i and j.  I'm choosing a set of least cost pairs, and each i or j value can occur in only one of the chosen pairs (it can't occur on the left for one pair and on the right for another).  So, I want to create an array c[i,j] and binary decision variables v[i,j].  I want to minimize sum over i and j of c[i,j]*v[i,j], I want sum over i and j of v[i,j] to add up to a specified constant number, and I have constraints that sum over i of v[i,j] + sum over k of v[j,k] <= 1 for each j.  I think this should be pretty simple to to do in OPTMODEL, but I haven't come close to getting anything to run.  Any assistance would be appreciated.  Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
RobPratt
SAS Super FREQ

If I understand correctly, the following should do what you want (where I have used 3 as the "specified constant number"):

 

 

proc optmodel;
   set <num,num> EDGES;
   num c {EDGES};
   read data indata into EDGES=[i j] c;
   set NODES = union {<i,j> in EDGES} {i,j};

   var V {EDGES} binary;
   min TotalCost = sum {<i,j> in EDGES} c[i,j] * V[i,j];
   con Cardinality:
      sum {<i,j> in EDGES} V[i,j] = 3;
   con Balance {j in NODES}:
      sum {<i,(j)> in EDGES} V[i,j] + sum {<(j),k> in EDGES} V[j,k] <= 1; 

   solve;
   print V;
   create data outdata from [i j] c V;
quit;

For learning PROC OPTMODEL, you might find this examples book useful.  In particular, this example illustrates a side-constrained network flow problem.

 

View solution in original post

13 REPLIES 13
RobPratt
SAS Super FREQ

If I understand correctly, the following should do what you want (where I have used 3 as the "specified constant number"):

 

 

proc optmodel;
   set <num,num> EDGES;
   num c {EDGES};
   read data indata into EDGES=[i j] c;
   set NODES = union {<i,j> in EDGES} {i,j};

   var V {EDGES} binary;
   min TotalCost = sum {<i,j> in EDGES} c[i,j] * V[i,j];
   con Cardinality:
      sum {<i,j> in EDGES} V[i,j] = 3;
   con Balance {j in NODES}:
      sum {<i,(j)> in EDGES} V[i,j] + sum {<(j),k> in EDGES} V[j,k] <= 1; 

   solve;
   print V;
   create data outdata from [i j] c V;
quit;

For learning PROC OPTMODEL, you might find this examples book useful.  In particular, this example illustrates a side-constrained network flow problem.

 

Top_Katz
Quartz | Level 8
You nailed it, RobPratt! Thank you so much!!
whatkins
Fluorite | Level 6

hey @RobPratt, I am struggling with a simple modification to the above code.  I'm really really new to SAS so please do forgive me if this is something very simple -- also new to the forum;  I've looked all over the web and this example is the closest I've found where the data is in the format which I've got and the problem statement is similar.

 

In my case I have different constraints:  i have destinations(n=15Kish) and origins(100ish)  (i=destinations and j=origins) and must have every i assigned to only one j (that is each customer is assigned to only one distribution site).  The costs are drive times from the distribution hub to the customer and the objective is to minimize the drive times -- in short it's basically assigning a customer to it's associated site with the shortest drive time.  heres the mod i attempted which isn't working for me; sas tells me that the solution is "infeasible" which seems a bit odd to me.  in the simple case of 10 destinations and 3 distribution sites i also get the "infeasible" issue so I'm thinking it's my constraint line which is off.

 

specifically I used the following constraint code:

/*device assigned to only one site*/
  con Balance {i in NODES}:
      sum {<(i),j> in SITEStoDEVICES} V[i,j] = 1;

 

which is a modification of the constraints from the OP. 

 

Without the constraint I get all V's as zero -- I think thats meaning no assignemnts are made which of course minimized drive time to be zero overall.  However, I set this up to ensure C1-C5 are assigned to S1 while C6-C10 are assigned to S2 with non assigned to S3.

 

Please help!  I'm just not understanding something simple here.

 

in short the overall project objective that I'm trying to achieve is

1) assign customers to the associated sites with lowest drive times

2) get the all customers outside of a specified drive time

3) take the group from (2) and determine  where would be the best new sites to have as distribution hubs to ensure all customers are within the specified drive time.

 

next phase would be to optimize on lowest number of distro hubs such that all customers are within a certain drive time but thats outsie the scope of my imediate need.

 

below is the example data/code.

 

data DTMS;
input destination  : $11. origin : $11. DriveTime;
datalines;
C1 S1 .15
C2 S1 .10
C3 S1 .11
C4 S1 .12
C5 S1 .13
C1 S2 5.15
C2 S2 5.10
C3 S2 5.11
C4 S2 5.12
C5 S2 5.13
C1 S2 7.15
C2 S3 7.10
C3 S3 7.11
C4 S3 7.12
C5 S3 7.13
C6  S1 5.15
C7  S1 5.10
C8  S1 5.11
C9  S1 5.12
C10 S1 5.13
C6  S2 .15
C7  S2 .10
C8  S2 .11
C9  S2 .12
C10 S2 .13
C6  S3 6.15
C7  S3 6.10
C8  S3 6.11
C9  S3 6.12
C10 S3 6.13
;


proc optmodel;
set <str, str> SITEStoDEVICES;
num DriveTime {SITEStoDEVICES};
read data DTMS into SITEStoDEVICES = [destination origin] DriveTime;

set NODES = union {<i,j> in SITEStoDEVICES} {i,j};

/*assignment variable*/
var V {SITEStoDEVICES} binary;

/*objective function*/
   min TotalCost = sum {<i,j> in SITEStoDEVICES} DriveTime[i,j] * V[i,j];
 
/*device assigned to only one site*/
  con Balance {i in NODES}:
      sum {<(i),j> in SITEStoDEVICES} V[i,j] = 1;

    solve;

/*   print V;*/
   create data outdata from [i j] DriveTime V;
run;

 

 

 

 

RobPratt
SAS Super FREQ

This doc example is closer to what you want.  A big difference between your problem and the one from @Top_Katz is that you have a bipartite network, and you need to treat the two sets of nodes differently.

 

The following two changes will accomplish your first step:

/*set NODES = union {<i,j> in SITEStoDEVICES} {i,j};*/
set CUSTOMERS = setof {<i,j> in SITEStoDEVICES} i;
set SITES     = setof {<i,j> in SITEStoDEVICES} j;

/*  con Balance {i in NODES}:*/
  con Balance {i in CUSTOMERS}:
      sum {<(i),j> in SITEStoDEVICES} V[i,j] = 1;

But you do not actually need to solve an optimization problem to do that step.  You can just explicitly compute the minimum for each customer separately:

   str argminDriveTime {CUSTOMERS};
   num minDriveTime;
   for {i in CUSTOMERS} do;
      minDriveTime = constant('BIG');
      for {<(i),j> in SITEStoDEVICES} do;
         if minDriveTime > DriveTime[i,j] then do;
            minDriveTime = DriveTime[i,j];
            argminDriveTime[i] = j;
         end;
      end;
      V[i,argminDriveTime[i]] = 1;
   end;
whatkins
Fluorite | Level 6

wow that was quick.  Thank you @RobPratt; much appreciated.  The solution worked for my simple case and I'm figuring it'll do for the full scope case.

 

i see where you're going with the straight up minimum selection for this simple situation.  however, i needed to know how to make the optimization portion work for the second phase of the project which is coming up.

 

the issue i ran into with the referenced example is this portion of the code

   /* distance from customer i to site j */
   num dist {i in CUSTOMERS, j in SITES}
       = sqrt((x[i] - x[j])^2 + (y[i] - y[j])^2);

   read data cdata into CUSTOMERS=[name] x y demand;
   read data sdata into SITES=[name] x y fixed_charge;

specifically I ran into was the fact that my data is vertically strutured with all the customer/site combos with their associated drive times.  The example assumed two data sets, one for customers and one for sites with both having x/y coordinants and the "cost" is distance which is calculable using the euclidian metric; I was able to easily enough chage that to use lat/longs and geo distances but distances doen't equate to drive times.

 

I wasn't sure how to modify the code to do a "group by" sort of thing with a single data frame.

 

I could easily enough get the data into three frames where I had customers, sites, and then drive times in a third frame but wasn't sure how to get the dist variable from the third frame by looking up the customer / site combo; and unfortunantly the drive times is not a calculable metric given the lat/longs of my customers/sites.

 

I would love to see how you'd modify the referenced code to do either

1) a look up for the dist variable from a third data frame

and/or

2) use a single data frame with customer/site combos and a seperate third column with the "dist" or cost variable.

 

being so new SAS for these types of problems has really been fun but challenging.  Your help has been infinately valuable and I thank you again!

 

 

RobPratt
SAS Super FREQ

Glad to help.

 

Your original READ DATA statement is already perfectly suited for the case that the drive times come from a sparse table instead of from a formula.

whatkins
Fluorite | Level 6
good to know. I'll play around with it now that I've seen a working example and should be able to self serve for the needed finale!
whatkins
Fluorite | Level 6

Good Day Rob,

 

As a follow up to this question I am now attempting to place a constraint on the model in which certain sites are known to be built for this facility location problem.  In short, we know from a business persepective specific locations must be built and want the remaining locations to be greefield solutioned.

 

Here's what I've got.

 

Variable in my data frame "REQ_LOC_BY_MGMT_NUM" has 1's where the site is a known location to build and 0's otherwise.

 

I've tried the following:

added to the preamble of the code

num REQ_LOC_BY_MGMT_NUM {SITEStoDEVICES};

 

added to the objective function statements

min TotalDriveTime = sum {<customerName, siteName> in SITEStoDEVICES} DriveTimeMin[customerName, siteName] * V[customerName, siteName];

min totalbuilds = sum {<deviceSiteId, techSiteId> in SITEStoDEVICES} Build[techSiteId];

 

and new constraint

/* must build specific sites */

con mustbuild {deviceSiteId in CUSTOMERS, techSiteId in SITES}:

REQ_LOC_BY_MGMT_NUM[deviceSiteId,techSiteId] <= Build[techSiteId];

 

 

it's throwing the error:

ERROR: The array subscript 'V['xxxxxxxxxxxxx','xxxxxxxxxxxxxxxxx']' is invalid at line 53 column 7.

 

Thoughts on how to make this work?  It's probably me just not understainding the example from here: 

http://go.documentation.sas.com/?docsetId=ormpug&docsetTarget=ormpug_optmodel_examples07.htm&docsetV...

 

Thanks again!

RobPratt
SAS Super FREQ

It sounds to me like you want the following:

num REQ_LOC_BY_MGMT_NUM {SITES};
 
con mustbuild {techSiteId in SITES: REQ_LOC_BY_MGMT_NUM[techSiteId] = 1}:
   Build[techSiteId] = 1;

Or you can use the FIX statement instead of declaring a single-variable constraint:

for {techSiteId in SITES: REQ_LOC_BY_MGMT_NUM[techSiteId] = 1}
   fix Build[techSiteId] = 1;

If these changes don't resolve your errors, please share the full code and data.

whatkins
Fluorite | Level 6

Thanks for the quick reply Rob.  I'm running into errors when trying to read the data "REQ_LOC_BY_MGMT_NUM" into the model initially. 

 

I've placed the statement "num REQ_LOC_BY_MGMT_NUM {SITES};" both before and after the SITES node creation.  in both cases I get an error.  The first case I get the error

ERROR 525-782: The symbol 'SITES' is unknown.

 

and in the second case (placement after the SITES node create)

ERROR 525-782: The symbol 'REQ_LOC_BY_MGMT_NUM' is unknown.

 

Specifically an example of what I've got is here:

 

/* Data Example */

data DTMS;

input destination : $11.

origin : $11.

REQ_LOC_BY_MGMT_NUM : best8.

DriveTime ;

datalines;

C1 S1 0 .15

C2 S1 0 .10

C3 S1 0 .11

C4 S1 0 .12

C5 S1 0 .13

C1 S2 0 5.15

C2 S2 0 5.10

C3 S2 0 5.11

C4 S2 0 5.12

C5 S2 0 5.13

C1 S3 0 7.15

C2 S3 1 7.10

C3 S3 1 7.11

C4 S3 1 7.12

C5 S3 1 7.13

C6 S1 0 5.15

C7 S1 0 5.10

C8 S1 0 5.11

C9 S1 0 5.12

C10 S1 0 5.13

C6 S2 0 .15

C7 S2 0 .10

C8 S2 0 .11

C9 S2 0 .12

C10 S2 0 .13

C6 S3 1 6.15

C7 S3 1 6.10

C8 S3 1 6.11

C9 S3 1 6.12

C10 S3 1 6.13

;

 

proc optmodel;

set <str, str> SITEStoDEVICES;

num DriveTime {SITEStoDEVICES};

read data DTMS into SITEStoDEVICES = [destination origin] DriveTime REQ_LOC_BY_MGMT_NUM;

set CUSTOMERS = setof {<i,j> in SITEStoDEVICES} i;

set SITES = setof {<i,j> in SITEStoDEVICES} j;

num REQ_LOC_BY_MGMT_NUM {SITES};

/*assignment variable*/

var V {SITEStoDEVICES} binary;

var Build {SITEStoDEVICES} binary;

/* objective functions */

min TotalCost = sum {<i,j> in SITEStoDEVICES} DriveTime[i,j] * V[i,j];

min totalbuilds = sum {<deviceSiteId, techSiteId> in SITEStoDEVICES} Build[techSiteId];

 

/*device assigned to only one site*/

con Balance {i in CUSTOMERS}:

sum {<(i),j> in SITEStoDEVICES} V[i,j] = 1;

/* must build sites with assigned locations */

con link {deviceSiteId in CUSTOMERS, techSiteId in SITES}:

V[deviceSiteId,techSiteId] <= Build[techSiteId];

/*must build certain sites*/

con mustbuild {techSiteId in SITES: REQ_LOC_BY_MGMT_NUM[techSiteId] = 1}:

Build[techSiteId] = 1;

/*for {techSiteId in SITES: REQ_LOC_BY_MGMT_NUM[techSiteId] = 1}*/

/* fix Build[techSiteId] = 1;*/

solve;

/* print V;*/

create data outdata from [i j] DriveTime V Build REQ_LOC_BY_MGMT_NUM;

run;

 

whatkins
Fluorite | Level 6

the line

 

var Build {SITEStoDEVICES} binary;

 

should read

var Build {SITES} binary;

 

in the example code.  My apologies for being unclear there.

 

 

 

RobPratt
SAS Super FREQ

You should use a separate data set for REQ_LOC_BY_MGMT_NUM:

data SiteData;
   input site $ REQ_LOC_BY_MGMT_NUM;
   datalines;
S1 0
S2 0
S3 1
;

And then you can read it as follows:

read data SiteData into [site] REQ_LOC_BY_MGMT_NUM;

The Build variable needs to be indexed over SITES:

var Build {SITES} binary;

You have declared two objectives.  By default, only the second one is used.  Because Build depends only on site, I think the following makes more sense:

min totalbuilds = sum {techSiteId in SITES} Build[techSiteId];

You should also create two separate output data sets:

create data outdata from [i j] DriveTime V;
create data outdata2 from [j] Build REQ_LOC_BY_MGMT_NUM;

Or use an explicit [j] index as follows:

create data outdata from [i j] DriveTime V Build[j] REQ_LOC_BY_MGMT_NUM[j];

Finally, replace the RUN with QUIT.

whatkins
Fluorite | Level 6

Thanks for the help Rob.  With the attempt to minimize number of builds based on both minimizing drivetimes and the known locations to build I came up with this.  If this looks off to you let me know but it appears to work with the following data as expected (sites 1, 2, and 3 are built with the only customer assiged to site 3 being C11).

 

proc optmodel;
set <str, str> SITEStoDEVICES;
num DriveTime {SITEStoDEVICES};
read data DTMS into SITEStoDEVICES = [destination origin] DriveTime;

set CUSTOMERS = setof {<i,j> in SITEStoDEVICES} i;
/*set SITES     = setof {<i,j> in SITEStoDEVICES} j;*/

/*assignment variable*/
var V {SITEStoDEVICES} binary;

set <str> SITES;
num REQ_LOC_BY_MGMT_NUM {SITES};
read data SiteData into SITES =  [site] REQ_LOC_BY_MGMT_NUM;

/* build variable*/
var Build {SITES} binary;

/* objective functions */
/*   min TotalCost   = sum {<i,j> in SITEStoDEVICES} DriveTime[i,j] * V[i,j];*/
/*   min totalbuilds = sum {<deviceSiteId, techSiteId> in SITEStoDEVICES} Build[techSiteId];*/

min cost = sum {i in CUSTOMERS, j in SITES} V[i,j] * DriveTime[i,j];
min totalcost = cost + sum{j in SITES} Build[j];
/*min totalbuilds = sum {<deviceSiteId, techSiteId> in SITEStoDEVICES} Build[techSiteId]*DriveTime[deviceSiteId, techSiteId];*/
 
/*device assigned to only one site and every device is assigned*/
con Balance {i in CUSTOMERS}:
      sum {<(i),j> in SITEStoDEVICES} V[i,j] = 1;

/* must build sites */
con mustbuild {techSiteId in SITES: REQ_LOC_BY_MGMT_NUM[techSiteId] = 1}:
   Build[techSiteId] = 1;

/* must build sites with assigned locations */
con link {deviceSiteId in CUSTOMERS, techSiteId in SITES}:
      V[deviceSiteId,techSiteId] <= Build[techSiteId];

/* do not assign customer to site more than x units away */
con distance_at_most {i in CUSTOMERS, j in SITES: DriveTime[i,j] > 1.99}:
      V[i,j] = 0;

/*    solve;*/
solve with milp/timetype=real;

create data outdata from [i j] DriveTime V Build[j] REQ_LOC_BY_MGMT_NUM[j];

run;

 

 

 

 

data SiteData;
   input site $ REQ_LOC_BY_MGMT_NUM;
   datalines;
S1 0
S2 0
S3 1
S4 0
;


 

 

data DTMS;
input destination  : $11. 
      origin : $11. 
	  REQ_LOC_BY_MGMT_NUM : best8.
      DriveTime ;
datalines;
C1 S1 0 .15
C2 S1 0 .10
C3 S1 0 .11
C4 S1 0 .12
C5 S1 0 .13
C6  S1 0 2.15
C7  S1 0 2.10
C8  S1 0 2.11
C9  S1 0 2.12
C10 S1 0 2.13
C11 S1 0 1.99
C1 S2 0 2.75
C2 S2 0 2.70
C3 S2 0 2.71
C4 S2 0 2.72
C5 S2 0 2.73
C6  S2 0 .15
C7  S2 0 .10
C8  S2 0 .11
C9  S2 0 .12
C10 S2 0 .13
C11 S2 0 2.99
C1 S3 1 7.15
C2 S3 1 7.10
C3 S3 1 7.11
C4 S3 1 7.12
C5 S3 1 7.13
C6  S3 1 6.15
C7  S3 1 6.10
C8  S3 1 6.11
C9  S3 1 6.12
C10 S3 1 6.13
C11 S3 1 1.05
C1 S4 0 0.15
C2 S4 0 0.10
C3 S4 0 0.11
C4 S4 0 0.12
C5 S4 0 7.13
C6  S4 0 0.15
C7  S4 0 0.10
C8  S4 0 0.11
C9  S4 0 7.12
C10 S4 0 7.13
C11 S4 0 8.79
;

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 13 replies
  • 3975 views
  • 4 likes
  • 3 in conversation