Automatic Linearization Using the OPTMODEL Procedure: Least Absolute Deviation

In my previous blog, the focus was on building an Ordinary Least Squares (OLS) regression model using SAS's algebraic modeling language, the OPTMODEL procedure, which is designed for building and solving optimization models. In this blog, we'll focus on another type of linear regression model called Least Absolute Deviation (LAD).

Unlike OLS regression, which minimizes the sum of the squared residuals, LAD regression minimizes the sum of the absolute value of the residuals. LAD is considered a more robust regression model since residuals are penalized linearly (as opposed to quadratically in OLS).

The tradeoff for increased robustness is a larger residual variance, assuming the error distribution is normal. Depending on the circumstances, LAD is often a preferred alternative to OLS when known outliers are present in the data.
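The robustness claim has a familiar one-dimensional special case: for an intercept-only model, the OLS minimizer is the sample mean and the LAD minimizer is the sample median. A quick sketch (Python here, just for illustration, with made-up numbers) shows how one outlier drags the mean far more than the median:

```python
import numpy as np

# Intercept-only regression: OLS minimizes the sum of squared residuals
# (solution: the mean); LAD minimizes the sum of absolute residuals
# (solution: the median). One outlier moves the mean far more.
y = np.array([10.0, 11.0, 12.0, 13.0, 14.0])
y_outlier = np.append(y, 100.0)       # inject a single outlier

ols_fit = y_outlier.mean()            # dragged toward the outlier (~26.7)
lad_fit = np.median(y_outlier)        # barely moves (12.5)
print(ols_fit, lad_fit)
```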

Within the model formulation, the only difference between LAD and OLS is in the objective function.

The objective function for LAD minimizes the sum of the absolute values of the residuals: minimize Σ_i | y_i − (Intercept + Σ_k Beta_k · x_ik) |.


In a perfect world, we could simply modify the objective function from the OLS formulation, re-run the model, and output the new results. However, LAD regression is more computationally difficult to solve than OLS because the absolute value function introduces non-smoothness into the optimization problem. The objective is not differentiable wherever a residual equals zero, so there is no closed-form solution, and derivative-based solvers can run into convergence issues, iterating indefinitely without ever finding the optimal solution.

But don't take my word for it. Let's try it out.

Below are the set, parameter, and decision variable declarations copied over from the OLS formulation in my previous blog, along with the **read data** statement that loads the sashelp.baseball data:

```
proc optmodel;
set <str> PLAYERS;
set IVARS = /'nHits' 'nBB'/;
num y{PLAYERS};
num x{PLAYERS,IVARS};
read data sashelp.baseball into PLAYERS=[Name] y=nRuns {k in IVARS} <x[Name,k]=col(k)>;
var Intercept;
var Beta{IVARS};
```

The new LAD objective function below is the only modification made to the original OLS model from the previous blog. SAS programmers will recognize that the **abs()** function represents absolute value.

`min Obj = sum{i in PLAYERS} abs(y[i] - (sum{k in IVARS} Beta[k]*x[i,k] + Intercept));`

The **solve** statement is specified below. The **create data** statement creates an output data set of the observed and predicted values for each player. Recall that when you specify the default, standalone **solve** statement, OPTMODEL will determine and apply the appropriate solver from a set of optimization solvers including LP, MILP, NLP, and QP.

```
solve;
create data work.lad_output from [player] =
{i in PLAYERS} y pred=(Intercept + sum{k in IVARS} Beta[k]*x[i,k]);
quit;
```

The results below show the NLP solver was called, and due to the reasons mentioned above, the iteration limit was reached without arriving at an optimal solution.

Well, that didn't work! What do we do now? Well... we increase the iteration limit, of course! The Solution Summary table shows the default maximum iteration limit is 5,000. Let's boost that to 100,000! To do this, I'll modify the **solve** statement to invoke the NLP solver with the **maxiter** option:

`solve with nlp/maxiter=100000;`

Both the log and Solution Summary table indicate the model failed to converge. Interestingly, the process suspended after 53 iterations, suggesting that simply increasing the number of iterations isn't going to help solve this problem.

At this point, you may be tempted to throw up your hands, say "to heck with it", and just use OLS instead. But stay with me. There's an easy workaround.

In fact, there's another option within the OPTMODEL procedure that will quickly and easily solve LAD to optimality: the **solve linearize** statement.

**Linearization** in optimization is a powerful technique for reformulating nonlinear functions as simpler linear ones, allowing us to solve previously intractable problems. By replacing complex nonlinear relationships (e.g., absolute value) with their linear equivalent formulations, efficient linear optimization algorithms can be applied to the problem.

The **solve linearize** statement in the OPTMODEL procedure contains numerous linearization techniques to automatically reformulate nonlinear models into their linear equivalent form. This means that we, as optimization practitioners, **do not need to be experts in or memorize all of the linearization techniques to take advantage of them!**

In other words, linearization simplifies the optimization process by transforming nonlinear objective functions (or constraints) into their linear equivalents, making them amenable to the analytical and numerical techniques required for finding optimal solutions.
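For the absolute value specifically, the linearization rests on a simple identity: |r| is the smallest z satisfying the two linear inequalities z >= r and z >= -r, i.e., z = max(r, -r). A few made-up numbers (Python, purely for illustration) confirm that the linearized objective value matches the nonlinear one:

```python
import numpy as np

# For any residuals, sum_i |r_i| equals sum_i z_i when each z_i takes its
# smallest feasible value max(r_i, -r_i) under z_i >= r_i and z_i >= -r_i.
resid = np.array([4.0, -2.5, 0.0, -7.0])          # made-up residuals
obj_nonlinear = np.abs(resid).sum()               # original abs() objective
obj_linearized = np.maximum(resid, -resid).sum()  # linear-equivalent value
print(obj_nonlinear, obj_linearized)
```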


Performing automatic linearization of our LAD model is as simple as modifying the solve statement and re-running the program:

`solve linearize;`

The Solution Summary table now shows the linear programming (**LP**) solver was called using the default **Dual Simplex** algorithm. For the first time, we see the solution status is **Optimal**, with a solution time of 0.02 seconds.

Printing the decision variables (i.e., parameter estimate values) to the Results Viewer shows the new model estimates.

Below is a comparison to the OLS estimates from the previous blog:

| | OLS | LAD |
| --- | --- | --- |
| Intercept | -4.233 | -4.861 |
| nBB | 0.3008 | 0.3077 |
| nHits | 0.4299 | 0.4336 |

Problem solved. The **solve linearize** statement automatically reformulated the nonlinear LAD model into its linear equivalent and quickly solved it to optimality. The output data set provides the observed and predicted values for each player in the sashelp.baseball data set, allowing us to construct whatever plots and model diagnostics we choose.

I had originally intended to conclude the blog here. However, I'd argue that half the fun is learning what the **solve linearize** statement did behind the scenes that allowed our LAD model to solve to optimality.

Quick side story: I used to love visiting magic shops as a kid. I was always enamored with all of the seemingly impossible magic tricks performed by the owner to entice us to purchase the trick. They would perform the magic trick for you, but if you wanted to learn how it worked, you had to purchase it. Only then would they reveal the secret(s) and teach you to perform it yourself. Even today, I love visiting magic shops, and The Prestige remains one of my all-time favorite movies.

However, unlike magic tricks, the **solve linearize** statement gives up its secrets for free.

In order to reverse engineer what this one instance of **solve linearize** did, we can add the **expand / linearize** statement immediately after the solve:

```
solve linearize;
expand / linearize;
```

This will write the reformulated linear model to the Results Viewer.

To keep the expanded output small and readable, I'll shrink the input data to a handful of players by adding a **where** clause to the **read data** statement. The **put** statement prints the resulting set of players to the log:

```
read data sashelp.baseball(where=(Team='Atlanta' and substr(Name,1,1) in ('H','M')))
into PLAYERS=[Name] y=nRuns {k in IVARS} <x[Name,k]=col(k)>;
put PLAYERS=;
```

Running the modified program writes the reduced player set to the log and the reformulated linear model to the Results Viewer.

The **expand/linearize** output shows the OPTMODEL procedure added a new decision variable for each player: **_ADDED_VAR_[1], [2], ... , [5]**. At optimality, each of these equals the absolute value of that player's residual.

Two sets of constraints are also added to the model, one pair per player; together they force each added variable to be at least as large as both the player's residual and its negation.

Let's now replicate this new model formulation using the OPTMODEL procedure.

First, I'll create a new set of decision variables, **Z**, one per player, to play the role of the added variables:

`var Z{PLAYERS};`

According to the **expand/linearize** output, the new objective function minimizes the sum of these added variables:

`min Obj = sum{i in PLAYERS} Z[i];`

For the two sets of constraints in the **expand/linearize** output, I'll declare the following **con** statements:

```
con y_larger{i in PLAYERS}:
Z[i] >= y[i] - (sum{k in IVARS} Beta[k]*x[i,k] + Intercept);
con yhat_larger{i in PLAYERS}:
Z[i] >= - y[i] + (sum{k in IVARS} Beta[k]*x[i,k] + Intercept);
```
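The same reformulation can be sketched outside SAS as an ordinary linear program. Below is a minimal illustration (Python with scipy assumed available; the data, exactly on the line y = 1 + 2x, is made up) of minimizing Σ Z_i subject to the two constraint families above:

```python
import numpy as np
from scipy.optimize import linprog

# Toy LAD regression via the Z-variable linearization:
# minimize sum(Z) subject to Z_i >= r_i and Z_i >= -r_i,
# where r_i = y_i - (b0 + b1*x_i).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])   # exactly y = 1 + 2x
n = len(x)

# Variable order: [b0, b1, Z_1..Z_n]; b0, b1 free, Z >= 0.
c = np.concatenate([np.zeros(2), np.ones(n)])   # minimize sum of Z
# Z_i >= y_i - (b0 + b1*x_i)   ->  -b0 - b1*x_i - Z_i <= -y_i
A1 = np.column_stack([-np.ones(n), -x, -np.eye(n)])
# Z_i >= -(y_i - (b0 + b1*x_i)) ->  b0 + b1*x_i - Z_i <= y_i
A2 = np.column_stack([np.ones(n), x, -np.eye(n)])
A_ub = np.vstack([A1, A2])
b_ub = np.concatenate([-y, y])
bounds = [(None, None), (None, None)] + [(0, None)] * n

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(res.x[:2])   # recovers intercept ~1 and slope ~2
```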

Take the first player in the reduced set, Bob Horner, as an example. Bob's observed value (**nRuns**) is 70, denoted as **y[i]** in the OPTMODEL code. Suppose the model predicts a value for Bob *less than* 70. Let's say it predicts 65. The residual would be +5 (i.e., 70 - 65).

In the first constraint, Z[i] >= 5, and in the second constraint, Z[i] >= -5. The smallest value Z[i] is allowed to take while satisfying both constraints is 5, which is the absolute value of Bob's residual.

If the model instead predicts a value for Bob *greater than* 70, say 75, the residual is -5. The first constraint then requires Z[i] >= -5 and the second requires Z[i] >= 5, so the smallest feasible value of Z[i] is again 5, the absolute value of Bob's residual.
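As a quick sanity check, Bob's under- and over-prediction cases can be verified in a few lines (Python, just mirroring the arithmetic in the text):

```python
# Bob Horner's observed nRuns is 70. In both prediction cases, the smallest
# Z satisfying Z >= resid and Z >= -resid is max(resid, -resid) = |resid|.
y_obs = 70
for pred in (65, 75):               # under- and over-prediction
    resid = y_obs - pred            # +5, then -5
    z_min = max(resid, -resid)      # tightest feasible value of Z
    assert z_min == abs(resid) == 5
print("Z equals |residual| in both cases")
```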

This new model formulation is the linear equivalent of the original formulation containing the nonlinear objective function, and with the help of the LP solver it solves quickly to optimality.

I hope you found this blog useful and intuitive, and I encourage you to explore all of the automatic linearization techniques packed into the **solve linearize** statement.

See below for the complete list of programs discussed above and try them out for yourself!

--------------------------------------------------------------------------------------------------------------------------

Nonlinear (original) model formulation, with and without the **solve linearize;** and **expand / linearize;** statements:

```
proc optmodel;
set <str> PLAYERS;
set IVARS = /'nHits' 'nBB'/;
num y{PLAYERS};
num x{PLAYERS,IVARS};
read data sashelp.baseball into PLAYERS=[Name]
y=nRuns {k in IVARS} <x[Name,k]=col(k)>;
*read data sashelp.baseball(where=(Team='Atlanta' and substr(Name,1,1) in ('H','M')))
into PLAYERS=[Name] y=nRuns {k in IVARS} <x[Name,k]=col(k)>;
var Intercept;
var Beta{IVARS};
min Obj = sum{i in PLAYERS} abs(y[i] - (sum{k in IVARS} Beta[k]*x[i,k] + Intercept));
solve;
*solve with nlp/maxiter=100000;
*solve linearize;
*expand / linearize;
create data work.lad_output from [player] =
{i in PLAYERS} y pred=(Intercept + sum{k in IVARS} Beta[k]*x[i,k]);
quit;
```

Linear formulation:

```
proc optmodel;
set <str> PLAYERS;
set IVARS = /'nHits' 'nBB'/;
num y{PLAYERS};
num x{PLAYERS,IVARS};
read data sashelp.baseball into PLAYERS=[Name]
y=nRuns {k in IVARS} <x[Name,k]=col(k)>;
var Intercept;
var Beta{IVARS};
var Z{PLAYERS};
min Obj = sum{i in PLAYERS} Z[i];
con y_larger{i in PLAYERS}:
Z[i] >= y[i] - (sum{k in IVARS} Beta[k]*x[i,k] + Intercept);
con yhat_larger{i in PLAYERS}:
Z[i] >= - y[i] + (sum{k in IVARS} Beta[k]*x[i,k] + Intercept);
solve;
create data work.output from [player] =
{i in PLAYERS} y z pred=(Intercept + sum{k in IVARS} Beta[k]*x[i,k]);
quit;
```
