BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
PatrickYang
Obsidian | Level 7

To Whom It May Concern:

I am wondering if an objective function in the form of an inequality can be defined for the genetic algorithm (GA) sub-routines in PROC IML?

My objective functions that needs to be minimized looks like this:

minimize v given that (0.5-v) <= summation_i_from_1_to_20 (f_i * x_i) <= (0.5+v), where x_i is the decision variable in the form of 0's and 1's. As you can see, it is an integer problem.

I have seen examples from the SAS documentation on GA in PROC IML where the objective function is an equality. However, my objective function is an inequality. Most likely, most of my constraints are also inequalities. I am wondering if there are ways of handling these inequality objective functions and inequality constraints in the GA subroutines of PROC IML? In fact, it does not have to be the GA. I could also use other optimizers provided in PROC IML as long as the inequality objective function and the inequality constraints can be programmed in PROC IML.

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

I think I might understand now.  f is a vector of parameters. You are trying to find binary x such that  the quantity  f` * x  is as close to 0.5 as possible.

Choose your objective function to be

abs( f` * x - 0.5 )

The formulation is

min abs( f` * x - 0.5 ), where the minimization is over all binary 20-element vectors x.

This following SAS/IML code generates random 0/1 vectors and then finds the one for which f`*x is closest to 0.5.  It doesn't solve your problem because I am sampling from a bernoulli(0.5) distribution, which means that most vectors have about half 0s and half 1s, whereas your GA approach will vary the distribution of x according to the values in f. Still, this code should get you started:

proc iml;

call randseed(1234);

f = {0.1 0.05 0.05 0.33 0.123 0.04 0.008 0.0065}`;

numSamples = 1e4;

x = j(nrow(f), numSamples, 0);

call randgen(x, "Bernoulli", 0.5);

sum = (f` * x)[+, ]; /* objective function */

g = abs(sum - 0.5);  /* closest to target */

minCol = g[>:<];

minX = x[ ,minCol];

minVal = f` * minX;

print minVal, minX;

View solution in original post

11 REPLIES 11
Rick_SAS
SAS Super FREQ

An objective function is a mathematical function. For each input there is one output.  Functions are not expessed by inequalities, although constraints can be.

In your problem, the sum f`*x is a particular number, so isn't the closest integer to it either floor(f`*x) or ceil(f`*x)?

If v has to be an integer, would it make sense to consider an objective function by using the following logic?

sum = f` * x;

diff1 = sum - floor(sum);

diff2 = ceil(sum) - sum;

if diff1 < diff2 then return( floor(sum) );

else return( ceil(sum) );

PatrickYang
Obsidian | Level 7

Dear Dr. Wicklin:

Thank you for your kindly response. I appreciate it very much.

Both v and f are real and they are both positive. Only x is an integer variable (in fact, binary). In my problem above, the targeted value we are trying to reach is 0.5. That is why we want our v to be minimized so that the two limits, (i.e., both 0.5-v and 0.5+v) are as close to 0.5 as possible. I have gone through the documentation on GA subroutines in PROC IML. I find it very difficult to set up an objective function like the one above which has two inequalities in it.

Thank you again.

Rick_SAS
SAS Super FREQ

I think I might understand now.  f is a vector of parameters. You are trying to find binary x such that  the quantity  f` * x  is as close to 0.5 as possible.

Choose your objective function to be

abs( f` * x - 0.5 )

The formulation is

min abs( f` * x - 0.5 ), where the minimization is over all binary 20-element vectors x.

This following SAS/IML code generates random 0/1 vectors and then finds the one for which f`*x is closest to 0.5.  It doesn't solve your problem because I am sampling from a bernoulli(0.5) distribution, which means that most vectors have about half 0s and half 1s, whereas your GA approach will vary the distribution of x according to the values in f. Still, this code should get you started:

proc iml;

call randseed(1234);

f = {0.1 0.05 0.05 0.33 0.123 0.04 0.008 0.0065}`;

numSamples = 1e4;

x = j(nrow(f), numSamples, 0);

call randgen(x, "Bernoulli", 0.5);

sum = (f` * x)[+, ]; /* objective function */

g = abs(sum - 0.5);  /* closest to target */

minCol = g[>:<];

minX = x[ ,minCol];

minVal = f` * minX;

print minVal, minX;

PatrickYang
Obsidian | Level 7

Dear Dr. Wicklin:

Thank you for the second reply which is very detailed. The most important thing is the use of the absolute value when formulating the objective function, which I never thought of previously. PROC IML is very new to me. I have just started reading your Statistical Programming with SAS/IML Software book. The book is still very new. So that means I have not got very far on it. 🙂 However, your response does help me successfully come up with an objective function that is to be minimized.

start objectivetest(x) global(attribute1, attribute2); /*Beginning of objective function*/

summation=sum(attribute1 # x);

target=abs(summation-0.5); /*Formulate the objective function that is to be minimized next*/

return(target);

finish; /*End of objective function*/

The above function is to be minimized below through GA function calls:

id=gasetup(2,20,1234); /*encoding=2,size=20,seed=1234*/

call gasetobj(id,0,"objectivetest"); /*id=id,objtype=0*/

In the above code, attribute1 is used in the objective function. For attribute2, it is supposed to be used as part of a constraint. If you do not mind, can I ask how I can specify a constraint, preferably in an explicit way, which is in the form of inequalities for the GA functions and subroutines in SAS? Say, if I need two constraints: 1) Summation of x (0's and 1's in the solution) is between 5 and 10, 2) summation of the product of (attribute2*x) is between 20 and 40? I have used LINGO to specify these constraints. They can be done in an explicit way. I am wondering if the way such constraints are specified in SAS PROC IML is similar?

Thank you so much!!

Rick_SAS
SAS Super FREQ

Because you want x to be a binary sequence, I would suggest encoding=0.  Currently you are asking for solution vectors that are permutations of 1:20.

It has been a long time since I have programmed a GA algorithm. The documentation for the GA algorithm mentions several ways of handling constraints. I think that I would handle constraints in the crossover/mutation stages as discussed in the second paragraph of the doc page about "Handling Constraints."

This means that when you generate the next generation from the current, make sure that every element of the new generation satisfies the constraint by "repairing" any violations. For example, if sum(x)>10, then locate all the 1s and randomly delete enough to restore the constraint.  Assuming that the elements of attribute2 are all positive, you could perform a similar operation when(sum(attribute#x) is too big or too small.

The last paragraph of the doc mentions the objective penalty method, which was suggested by Xia.

Ksharp
Super User

You can add a penalty factor in your object function ,when it is out of that range , return a very big number like 9999.

if 0.5-v >= summation or summation >= (0.5+v) then return (99999);

  else return ( summation );

PatrickYang
Obsidian | Level 7

Dear Dr. Xia:

Thank you very much for your reply.

So, here is my understanding: 1) If it is a minimization problem, the penalty should be a very large positive number given any infeasibility/infeasibilities and 2) if it is maximization problem, the penalty should be a very small positive number given any infeasibility/infeasibilities. In other words, as long as the fitness value as measured by the objective function is markedly reduced given any infeasibility/infeasibilities, the goal of applying a penalty function to the objective function is reached.

Thank you.

Ksharp
Super User

Yes. You got it . Here is an example:

start function(x) global(Office,Customer,n);

count=ncol(unique(Customer[loc(x)]));

if ncol(unique(Office[loc(x)]))^=n then count=0;

return (count);

finish;

The function is set to be maximize . if the count is not what I want , I will set it to very small ( zero ) ,that means this sample is not I want .

P.S. I am Master degree, not Doctor. while @Rick has Cornell Doctor degree. Smiley Wink

Xia Keshan

Rick_SAS
SAS Super FREQ

After thinking about this some more, I think your problem is a variation on the so-called "knapsack problem." See the example in the GA documentation. The 'f' vector represents the weights of objects that you want to put in the backpack.  The 'v' value says that the backpack should weigh about v kg.  Your first constraint says you need between 5 and 10 items in the backpack. The 'attribute2' constraint says that the cost of the items is between $20 and $40, where the item costs are given by 'attribute2'.

In the classical knapsack problem, the weight is bounded above by some maximum weight. In your variation, you want to get as close to a target weight as possible.

So I withdraw my previous objection to using encoding=2.  Try to follow the doc example, but modify the objective function.

PatrickYang
Obsidian | Level 7

Dear Dr. Wicklin:

Thank you for the two responses above. I think I have got an idea regarding what you are suggesting regarding specifying the constraints. Currently, I am using system-provided mutation and crossover operators as provided in GASETMUT (uniform mutation) and GASETCRO (simple crossover). Based on your recommendations and the information on Page 515 of the documentation, I will create my own mutation and crossover modules where I can specify as many as constraints as I need. There are quite a few that I need to satisfy in my work.

For the encoding issue in the GASETUP function, I originally chose 2 for encoding because the decision variable is a binary variable. Also, the solution has to be fixed in length: What is selected AND what is not. So, I chose the fixed-length integer vector option (i.e., encoding=2). I agree with you that it is a variation of Example 22.3 on Knapsack problem. One major distinction is that in Example 22.3 the maximum weight is a constant, which is easy to handle, whereas, in my problem, the "maximum weight" itself needs to be minimized (i.e., no longer a constant). It is where I find the specification of the objective function confusing. However, based on your reply, I think I have got an idea and will proceed with what I have got from your messages.

Thank you very much for your help!

PatrickYang
Obsidian | Level 7

Dear Dr. Wicklin:

After thinking it over again, I think I made a mistake in my previous message regarding the analogy. In my problem, the limit is also a constant: It is nothing but 0.5. The target is for f*x to approach the limit of 0.5 as closely as possible from TWO directions. Here, the limit of 0.5 corresponds to the maximum weight the backpack can hold in Example 22.3. It is just that, in Example 22.3, it is only in ONE direction. Still, thank you for reminding me of the use of the absolute value operation.

Thanks a lot!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

From The DO Loop
Want more? Visit our blog for more articles like these.
Discussion stats
  • 11 replies
  • 1509 views
  • 9 likes
  • 3 in conversation