Multivariate adaptive regression splines (Friedman, 1991) is a nonparametric technique that combines regression splines with model selection methods. It is a powerful predictive modeling tool for two reasons: 1) it extends linear models to capture nonlinear dependencies, and 2) it produces parsimonious models that resist overfitting and therefore tend to predict well. The method constructs spline basis functions adaptively by automatically selecting appropriate knot values for different variables. This helps E-miners identify which variables have linear or nonlinear effects, as well as the interactions among them. When higher-order terms are excluded, multivariate adaptive regression splines are particularly good at isolating the effects of single variables in a multivariate setting, which makes the method highly usable in process control and in identifying experimental designs. It also has applications in forecasting as a variable screening tool.
Multivariate adaptive regression splines have long been a desirable tool for our E-miners, and now you can add the technique to Enterprise Miner as an extension node by following a few simple steps.
Once deployed, you can find the Multivariate Adaptive Regression Splines node under the Applications tab.
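To make the idea of adaptive knot selection concrete, here is a minimal Python sketch. It is not the node's implementation: it assumes a single input, a single right-sided hinge function max(0, x - knot), and an exhaustive search over the observed values, and the names hinge, fit_line, and best_knot are made up for illustration. The full algorithm also adds basis functions in reflected pairs, considers interactions, and prunes terms afterward.

```python
def hinge(x, knot):
    """Right-sided hinge: max(0, x - knot)."""
    return max(0.0, x - knot)


def fit_line(h, y):
    """Ordinary least squares for y = b0 + b1 * h; returns (b0, b1, rss)."""
    n = len(h)
    mean_h = sum(h) / n
    mean_y = sum(y) / n
    sxx = sum((hi - mean_h) ** 2 for hi in h)
    if sxx == 0.0:
        # The hinge is constant for this knot, so only an intercept can be fit.
        b0, b1 = mean_y, 0.0
    else:
        sxy = sum((hi - mean_h) * (yi - mean_y) for hi, yi in zip(h, y))
        b1 = sxy / sxx
        b0 = mean_y - b1 * mean_h
    rss = sum((yi - (b0 + b1 * hi)) ** 2 for hi, yi in zip(h, y))
    return b0, b1, rss


def best_knot(x, y):
    """Try every observed x value as a knot; keep the one with the lowest RSS."""
    best = None
    for knot in sorted(set(x)):
        h = [hinge(xi, knot) for xi in x]
        b0, b1, rss = fit_line(h, y)
        if best is None or rss < best[3]:
            best = (knot, b0, b1, rss)
    return best


# Toy data (not HMEQ): flat at 2 until x = 5, then rising with slope 3.
x = [float(i) for i in range(11)]
y = [2.0 + 3.0 * max(0.0, xi - 5.0) for xi in x]
knot, b0, b1, rss = best_knot(x, y)
```

On this toy data the search recovers the knot at 5, because that is the only candidate for which the hinge reproduces the bend in the response.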
Multivariate Adaptive Regression Splines Node Requirements
One or more input variables are required for the Multivariate Adaptive Regression Splines node. The data set can contain at most one target variable, either interval or categorical.
If the input data set contains a frequency variable, it must be an interval variable, and all of its values must be positive integers.
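If you prepare the data outside Enterprise Miner, a quick check like the following Python sketch can flag frequency values that would violate the requirement above. The function name invalid_frequencies is hypothetical, and the sketch assumes the frequency column has already been loaded into a plain list.

```python
def invalid_frequencies(values):
    """Return (index, value) pairs that are not positive integers."""
    bad = []
    for i, v in enumerate(values):
        # Accept ints and floats with no fractional part (for example 2.0).
        is_whole = isinstance(v, int) or (isinstance(v, float) and v.is_integer())
        # Booleans, missing values, fractions, zero, and negatives are rejected.
        if isinstance(v, bool) or not is_whole or v <= 0:
            bad.append((i, v))
    return bad


# Example: positions 2 through 5 break the rule.
problems = invalid_frequencies([1, 2.0, 0, -3, 1.5, None])
```

An empty result means every frequency value is a positive integer.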
Multivariate Adaptive Regression Splines Node Properties
Drag a Multivariate Adaptive Regression Splines node onto an open diagram, and you will see the property panel as shown in Figure 2.
Figure 2: Multivariate Adaptive Regression Splines node properties panel
Here are descriptions of the main properties.
Multivariate Adaptive Regression Splines Node Example
This example uses the sample SAS data set SAMPSIO.HMEQ. You must use the data set to create a SAS Enterprise Miner Data Source. Right-click the Data Sources folder in the Project Navigator and select Create Data Source to launch the Data Source wizard.
Drag the HMEQ data set and the Multivariate Adaptive Regression Splines node to your diagram workspace. Connect them as shown in the diagram below.
Select the button next to the Keep Effects property to open a term editor. Specify that the variable Job is to be included in the final model, as shown in the diagram below, and then click OK.
Leave all other settings at their defaults and run the Multivariate Adaptive Regression Splines node by right-clicking it and selecting Run. In the Confirmation window, select Yes. After the node runs successfully, select Results in the Run Status window.
Notice the following information:
Bases Transformation Information is a table of the transformations that are used to generate the basis matrix. The first basis function, Basis0, is the intercept. The second basis function, Basis1, is 1 when the variable Job has level ‘Sales’ and 0 otherwise. The twelfth basis function, Basis11, is Loan - 40800 when Loan > 40800 and 0 otherwise; here 40800 is a knot value. Other basis functions are constructed in a similar manner by using other knot values. The knots are chosen automatically.
Parameter Estimates is a table of parameter estimates and the selected variables.
Backward Selection Iteration is a plot that displays the progression of the backward elimination phase. The GCV criterion provides an estimate of how well the model will perform with new data, so the final model should have good predictive power. The figure below shows that the backward elimination step eliminates basis functions 13, 10, and 11.
ANOVA is an Analysis of Variance (ANOVA) table for the target variable.
Classification Variables is a table of level information for the classification variables.
Fit Control Parameters is a table of the parameters that control the spline fitting.
Fit Statistics is a table of the fit statistics from the model.
Model Information is a table of Multivariate Adaptive Regression Splines model settings.
Variable Importance is a table of input variables, scaled by their relative importance as predictors for the target variable.
Dependent Variable vs. Fitted Values is a plot that displays the raw dependent variable overlaid with the fitted values. This plot is not produced for a dependent variable with a nonnormal distribution.
Residuals vs. Fitted Values is a plot that displays the residuals overlaid with the fitted values. This plot is not produced for a dependent variable with a nonnormal distribution.
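As a rough illustration of two of the items above, the following Python sketch evaluates basis functions like Basis1 and Basis11 and computes a GCV value in the form given by Friedman (1991): the mean squared residual divided by (1 - C/n)^2, where C is the effective number of parameters. The coefficients, residual sums of squares, and values of C are hypothetical numbers chosen for the example, not output from the node, and this article does not cover how the node computes C.

```python
def basis1(job):
    """Indicator basis: 1 when Job has level 'Sales', 0 otherwise."""
    return 1.0 if job == "Sales" else 0.0


def basis11(loan, knot=40800.0):
    """Hinge basis: Loan - knot when Loan > knot, 0 otherwise."""
    return max(0.0, loan - knot)


def predict(row, coefs):
    """Linear predictor: intercept (Basis0) plus weighted basis functions."""
    return (coefs["Basis0"]
            + coefs["Basis1"] * basis1(row["Job"])
            + coefs["Basis11"] * basis11(row["Loan"]))


def gcv(rss, n, effective_params):
    """Generalized cross validation: (RSS / n) / (1 - C / n)^2."""
    if not 0 <= effective_params < n:
        raise ValueError("effective_params must be in [0, n)")
    return (rss / n) / (1.0 - effective_params / n) ** 2


# Hypothetical coefficients, not estimates from the HMEQ run.
coefs = {"Basis0": 1.0, "Basis1": 2.0, "Basis11": 0.5}
score = predict({"Job": "Sales", "Loan": 40802.0}, coefs)  # 1 + 2 + 0.5 * 2

# Hypothetical candidates: (label, RSS, effective parameters), with n = 100.
candidates = [("14 bases", 80.0, 40), ("11 bases", 82.0, 30)]
best = min(candidates, key=lambda c: gcv(c[1], 100, c[2]))
```

In this made-up comparison the smaller model has the lower GCV even though its residual sum of squares is slightly higher, which is the trade-off that backward elimination relies on when it drops basis functions.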
**Note: Special thanks to Paal Navestad, Senior Data Scientist @ ConocoPhillips, for providing valuable feedback on this article.
Hi,
I am trying to add this extension to my EM; however, I am having some difficulty because I don't know where I should save the files. The steps provided in the “SAS® Enterprise Miner™ 14.1 Extension Nodes: Developer’s Guide” are not clear to me. I would appreciate it if someone has some clear steps that I can follow.
Thanks for your time in advance.