**Tip: Fit Multivariate Adaptive Regression Splines in SAS® Enterprise M...**


Multivariate adaptive regression splines (MARS; Friedman, 1991) is a nonparametric technique that combines regression splines with model selection methods. It is a powerful predictive modeling tool because (1) it extends linear models to capture nonlinear dependencies, and (2) it produces parsimonious models that are less prone to overfitting and therefore tend to predict well on new data. MARS constructs spline basis functions adaptively by automatically selecting appropriate knot values for different variables. This helps E-miners identify linear and nonlinear variables as well as the interactions among them. When higher-order terms are excluded, MARS is particularly good at identifying the effects of individual variables in a multivariate setting, which makes it useful in process control and in identifying experimental designs. MARS is also used in forecasting as a variable screening tool.
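The hinge (truncated linear) basis functions at the heart of MARS can be illustrated with a short Python sketch. The knots and coefficients below are invented for illustration, and the names `hinge` and `mars_predict` are hypothetical; in a real fit, MARS chooses the knots and coefficients from the data.

```python
def hinge(x: float, knot: float, direction: int = 1) -> float:
    """One half of a reflected hinge pair:
    max(0, x - knot) when direction = +1, max(0, knot - x) when direction = -1."""
    return max(0.0, direction * (x - knot))


def mars_predict(x1: float, x2: float) -> float:
    """Evaluate a toy MARS-style model with hand-picked (hypothetical) knots
    and coefficients. A fitted model would choose these automatically."""
    intercept = 2.0
    above = 1.5 * hinge(x1, 3.0, +1)    # slope for x1 above the knot 3.0
    below = -0.5 * hinge(x1, 3.0, -1)   # a different slope below the knot
    # A two-way interaction is the product of two hinge functions.
    interaction = 0.25 * hinge(x1, 3.0, +1) * hinge(x2, 5.0, +1)
    return intercept + above + below + interaction
```

For example, `mars_predict(4.0, 6.0)` returns 3.75 and `mars_predict(1.0, 6.0)` returns 1.0: each hinge is zero on one side of its knot, so the fitted function is piecewise linear with a change of slope at each knot.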

Multivariate adaptive regression splines have long been a sought-after tool among E-miners, and you can now add them to Enterprise Miner as an extension node by following a few simple steps.

- Download all the files from the GitHub repository (https://github.com/sassoftware/dm-flow/tree/master/MARS), including an XML file (MARS.xml) that defines the node properties, a SAS catalog (emextn.sas7bcat), and two GIF files (MARS_16.gif and MARS_32.gif) for the node icon.
- To deploy the extension node, follow the steps in Chapter 5, “Deploying an Extension Node,” of “SAS® Enterprise Miner™ 14.1 Extension Nodes: Developer’s Guide”.
- After storing the files in the proper directories, restart the Enterprise Miner server if necessary.
- The **Multivariate Adaptive Regression Splines** extension node runs with SAS Enterprise Miner 13.1 or later.
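If you want to confirm the download before deploying, a small Python script such as the following can check that the four files listed above are present and that `MARS.xml` parses as well-formed XML. The local folder name `MARS` and the helper `check_download` are assumptions for illustration. The script does not deploy anything; the actual target directories are the ones given in the Developer's Guide.

```python
# Hypothetical pre-deployment check (not part of the SAS tooling): confirm the
# extension files were downloaded and that MARS.xml is well-formed XML.
import sys
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List

REQUIRED_FILES = ["MARS.xml", "emextn.sas7bcat", "MARS_16.gif", "MARS_32.gif"]


def check_download(folder: Path) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = [f"missing: {name}" for name in REQUIRED_FILES
                if not (folder / name).is_file()]
    xml_path = folder / "MARS.xml"
    if xml_path.is_file():
        try:
            ET.parse(str(xml_path))  # raises ParseError if the XML is malformed
        except ET.ParseError as exc:
            problems.append(f"MARS.xml is not well-formed: {exc}")
    return problems


if __name__ == "__main__":
    # Assumed default: a local folder named "MARS" holding the downloaded files.
    folder = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("MARS")
    issues = check_download(folder)
    print("\n".join(issues) if issues else "All extension files are present.")
```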

Once deployed, you can find the **Multivariate Adaptive Regression Splines** node under the Applications tab.

**Multivariate Adaptive Regression Splines Node Requirements**

One or more input variables are required for the Multivariate Adaptive Regression Splines node. The data set can contain at most one target variable, either interval or categorical.

If the input data set contains a frequency variable, the frequency variable must be an interval variable and all of its values must be positive integers.
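The requirements above can be expressed as a simple pre-flight check. This Python sketch assumes variable roles are available as a plain dictionary (for example, transcribed from the Column Metadata window); the role strings and the function name `check_mars_requirements` are hypothetical and are not part of the Enterprise Miner API.

```python
from typing import Dict, Iterable, List, Optional


def check_mars_requirements(
    roles: Dict[str, str], freq_values: Optional[Iterable[float]] = None
) -> List[str]:
    """Return a list of requirement violations (empty if the data qualifies)."""
    errors = []
    inputs = [name for name, role in roles.items() if role == "INPUT"]
    targets = [name for name, role in roles.items() if role == "TARGET"]
    if not inputs:
        errors.append("at least one input variable is required")
    if len(targets) > 1:
        errors.append("at most one target variable is allowed")
    if freq_values is not None:
        # Positive and integer-valued; NaN and infinity both fail this test.
        bad = [f for f in freq_values if not (f > 0 and float(f).is_integer())]
        if bad:
            errors.append(f"frequency values must be positive integers, got {bad}")
    return errors
```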


**Multivariate Adaptive Regression Splines Node Properties**

When you drag a Multivariate Adaptive Regression Splines node onto an open diagram, its properties panel appears, as shown in Figure 2.

Figure 2: Multivariate Adaptive Regression Splines node properties panel

The main properties are described below.

**Main Effects Only** – Specifies whether to include main effects only. If No is selected, two-way or higher-order interactions between spline basis functions are also included.

**Interaction Orders** – Specifies the order of the higher-order interactions to include when **Main Effects Only** is set to “No”.

**Keep Effects** – Specifies a list of variables to be included in the final model.

**Effects Without Transformation** – Specifies a list of variables to be considered without nonparametric transformation. If these variables are selected, they enter the model in linear form.

**Exclude Missing** – Specifies whether to exclude observations with missing values from the training data.

**Spline Options**

**Maximum Number of Basis** – Specifies whether to use the default maximum number of basis functions in the final model or the value given in the **Maximum Basis Number** property. The default is the larger of 21 and one plus two times the number of non-intercept effects specified in the MODEL statement.

**Maximum Basis Number** – Specifies the maximum number of basis functions that can be used in the final model when **Maximum Number of Basis** is set to “User Specify”.

**Degree of Freedom** – Specifies the degrees of freedom. Larger values lead to fewer spline knots and thus smoother function estimates.

**Alpha** – Specifies a parameter that controls the number of knots considered for each variable. The value must be between 0 and 1.
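The default for **Maximum Number of Basis** is simple arithmetic, sketched below in Python. The function names are hypothetical; only the formula (the larger of 21 and one plus two times the number of non-intercept effects) comes from the property description.

```python
from typing import Optional


def default_max_basis(num_effects: int) -> int:
    """Default Maximum Number of Basis: max(21, 1 + 2 * non-intercept effects)."""
    if num_effects < 0:
        raise ValueError("num_effects must be non-negative")
    return max(21, 1 + 2 * num_effects)


def resolve_max_basis(num_effects: int, user_value: Optional[int] = None) -> int:
    """Hypothetical helper: a user value (the "User Specify" case) overrides the default."""
    return user_value if user_value is not None else default_max_basis(num_effects)
```

With 12 non-intercept effects, for instance, the default is max(21, 25) = 25; with 10 or fewer effects, the floor of 21 applies.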

**Penalty** – Specifies the penalty for increasing the number of variables in the multivariate adaptive regression splines model.

**Probability Distribution** – Specifies the probability distribution of the generalized linear model. By default, Normal is used for an interval target and Binary for a classification or character target. The available values are:

- **Default**: the Normal distribution for continuous response variables and the Binary distribution for classification or character variables
- **Poisson**
- **Negative Binomial**
- **Gamma**
- **Binary**
- **Normal**

**Link Function** – Specifies the link function of the generalized linear model. The available values are:

- **Default**: the link function corresponding to the selected probability distribution
- **Log**
- **Reciprocal**
- **Identity**
- **Logit**
- **Probit**
- **Power with exponent -2**
- **Complementary log-log**
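For reference, the link functions listed above correspond to the following standard inverse link functions, which map the linear predictor η to the mean μ. This is a generic GLM sketch in Python; the dictionary keys are illustrative labels, not the option names used by the node or the underlying procedure.

```python
import math

# Inverse link functions mu = g^{-1}(eta) for the links listed above
# (standard GLM definitions; keys are illustrative labels only).
INVERSE_LINKS = {
    "identity": lambda eta: eta,                                   # g(mu) = mu
    "log": lambda eta: math.exp(eta),                              # g(mu) = log(mu)
    "reciprocal": lambda eta: 1.0 / eta,                           # g(mu) = 1/mu
    "power(-2)": lambda eta: eta ** -0.5,                          # g(mu) = mu^-2
    "logit": lambda eta: 1.0 / (1.0 + math.exp(-eta)),             # g(mu) = log(mu/(1-mu))
    "probit": lambda eta: 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0))),  # normal CDF
    "cloglog": lambda eta: 1.0 - math.exp(-math.exp(eta)),         # g(mu) = log(-log(1-mu))
}
```

For example, `INVERSE_LINKS["logit"](0.0)` and `INVERSE_LINKS["probit"](0.0)` both return 0.5, since a linear predictor of zero corresponds to an even-odds probability under either link.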

**Selection Method** – Specifies the model selection process. By default, the multivariate adaptive regression splines algorithm has two stages: forward selection and backward selection. During forward selection, candidate bases are created from interactions between existing parent bases and nonparametric transformations of continuous or classification variables. After the model grows to a certain size, backward selection begins by deleting selected bases. Deletion continues until the null model is reached, and then the overall best model is chosen according to a goodness-of-fit criterion. The Forward Only option skips the backward selection stage after forward selection finishes.

**Use Fast Algorithm** – Specifies whether to use the fast algorithm, which speeds up forward selection by tuning several parameters.

**Cross Validation** – Specifies whether to perform cross validation.

**Number of Folds** – Specifies the number of cross validation folds when **Cross Validation** is set to “Yes”.

**Random Seed** – Specifies the seed for the pseudorandom number generator used for random cross validation when **Cross Validation** is set to “Yes”. If 0 is specified, the seed is generated from the time of day, which is read from the computer's clock.

**Output Design Matrix** – Specifies whether to create a data set that contains the design matrix of the constructed basis functions.

**Selected Model** – Specifies which model is used to produce the design matrix when **Output Design Matrix** is set to “Yes”:

- **After Backward Selection**
- **After Forward Selection**
- **Initial Model**
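The two-stage forward/backward process can be illustrated with a deliberately simplified Python sketch using NumPy. It is restricted to main effects (new hinge pairs are never multiplied by existing parent bases, unlike full MARS), it tries every interior observed value as a knot, and it uses an assumed GCV complexity charge of `d` per knot. None of these choices is claimed to match the node's implementation, and the function names are hypothetical.

```python
import numpy as np


def hinge_pair(x, t):
    """Reflected pair of hinge functions with knot t."""
    return np.maximum(0.0, x - t), np.maximum(0.0, t - x)


def rss(B, y):
    """Residual sum of squares of the least-squares fit of y on the columns of B."""
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    resid = y - B @ coef
    return float(resid @ resid)


def gcv(B, y, d=2.0):
    """GCV with an ASSUMED complexity charge of d per knot (one knot per hinge pair)."""
    n, m = B.shape
    complexity = m + d * (m - 1) / 2.0
    return rss(B, y) / n / (1.0 - complexity / n) ** 2


def forward(X, y, max_terms=11):
    """Greedy forward pass: repeatedly add the hinge pair that most reduces RSS."""
    n, p = X.shape
    B = np.ones((n, 1))  # start from the intercept-only model
    while B.shape[1] + 2 <= max_terms:
        best = None
        for j in range(p):
            for t in np.unique(X[:, j])[1:-1]:  # interior observed values as knots
                cand = np.column_stack([B, *hinge_pair(X[:, j], t)])
                score = rss(cand, y)
                if best is None or score < best[0]:
                    best = (score, cand)
        if best is None:  # no interior knots available
            break
        B = best[1]
    return B


def backward(B, y, d=2.0):
    """Drop one basis at a time down to the intercept; return the best-GCV subset."""
    cols = list(range(B.shape[1]))
    best_cols, best_score = cols[:], gcv(B, y, d)
    while len(cols) > 1:
        # Column 0 (the intercept) is never a deletion candidate.
        score, drop = min(
            (gcv(B[:, [c for c in cols if c != k]], y, d), k) for k in cols[1:]
        )
        cols.remove(drop)
        if score < best_score:
            best_score, best_cols = score, cols[:]
    return best_cols, best_score
```

For example, with `x = np.linspace(0.0, 1.0, 101)` and `y = 2 + 3 * np.maximum(0, x - 0.5)`, calling `forward(x.reshape(-1, 1), y, max_terms=3)` picks the knot at 0.5 and fits the data essentially exactly, and `backward` then keeps the hinge column that carries the slope. As in the description above, backward deletion runs all the way to the null model before the best subset is chosen.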

**Exclude Rejected Variable** – Specifies what action should be taken for variables excluded from the final model. This option is in effect only when a variable selection method is used. When set to “None”, the roles of these variables remain unchanged. When set to “Hide”, these variables are dropped from the metadata exported by the node. When set to “Reject”, the roles of these variables are set to REJECTED.

**Multivariate Adaptive Regression Splines Node Example**


This example uses the sample SAS data set SAMPSIO.HMEQ. You must use the data set to create a SAS Enterprise Miner Data Source. Right-click the **Data Sources** folder in the Project Navigator and select **Create Data Source** to launch the Data Source wizard.

- Select **SAS Table** as your metadata source and click **Next**.
- Enter SAMPSIO.HMEQ in the Table field and click **Next**.
- Continue to the Metadata Advisor step and select the **Basic Metadata Advisor**.
- In the Column Metadata window, set the role of the variable Value to **Target** and set the level of the variable Value to **Interval**. Click **Next**.
- There is no decision processing. Click **Next**.
- In the Create Sample window, you are asked whether you want to create a sample data set. Select **No**. Click **Next**.
- Set the role of the HMEQ data set to **Train**, and then click **Finish**.

Drag the HMEQ data set and the Multivariate Adaptive Regression Splines node to your diagram workspace. Connect them as shown in the diagram below.

Select the button next to the **Keep Effects** property to open a term editor. Specify the variable **Job** to be included in the final model, as shown in the diagram below, and then click **OK**.

Run the Multivariate Adaptive Regression Splines node with all other properties at their default settings: right-click the node and select **Run**. In the Confirmation window, select **Yes**. After the node runs successfully, select **Results** in the Run Status window.

Notice the following information:

**Bases Transformation Information** is a table of the transformations that are used to generate the basis matrix. The first basis function, Basis0, is the intercept. The second basis function, Basis1, is 1 when the variable Job has the level ‘Sales’ and 0 otherwise. The basis function Basis11 is Loan − 40800 when Loan > 40800 and 0 otherwise; here, 40800 is a knot value. The other basis functions are constructed in a similar manner by using other knot values. The knots are chosen automatically.
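The basis functions described above are easy to evaluate by hand. This Python sketch computes Basis0, Basis1, and Basis11 for a single record, using the level and knot reported in the table; the function name `basis_row` is hypothetical, and the remaining basis functions would follow the same pattern with their own levels and knots.

```python
def basis_row(job: str, loan: float, knot: float = 40800.0) -> dict:
    """Evaluate three basis functions from the Bases Transformation
    Information table for one record (hypothetical helper)."""
    return {
        "Basis0": 1.0,                              # intercept
        "Basis1": 1.0 if job == "Sales" else 0.0,   # indicator for Job = 'Sales'
        "Basis11": max(0.0, loan - knot),           # hinge at the knot 40800
    }
```

For example, `basis_row("Sales", 50000.0)` returns `{"Basis0": 1.0, "Basis1": 1.0, "Basis11": 9200.0}`, while any record with Loan at or below 40800 gets Basis11 = 0.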

**Parameter Estimates** is a table of parameter estimates and the selected variables.

**Backward Selection Iteration** is a plot that displays the progression of the backward elimination phase. The generalized cross-validation (GCV) criterion estimates how well the model will perform on new data, so a model selected with it should have good predictive power. The figure below shows that the backward elimination step eliminates basis functions 13, 10, and 11.
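For reference, Friedman (1991) defines the GCV criterion as the average squared residual inflated by a complexity penalty. In his formulation, shown below, M is the number of non-constant basis functions, C(M) is the effective number of parameters of the least-squares fit (essentially the trace of its hat matrix), and d is a cost charged per basis function. Treating the node's **Degree of Freedom** property as d is an assumption, and the node's exact definition may differ.

```latex
\mathrm{GCV}(M) =
  \frac{\frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat f_M(\mathbf{x}_i)\bigr)^2}
       {\bigl(1 - \tilde C(M)/n\bigr)^2},
\qquad
\tilde C(M) = C(M) + d\,M
```

A larger d makes each additional basis function more expensive, which is consistent with the property description: larger values lead to fewer knots and smoother fits.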

**ANOVA** is an Analysis of Variance (ANOVA) table for the target variable.

**Classification Variables** is a table of information about the levels of the classification variables.

**Fit Control Parameters** is a table of the parameters that control spline fitting.

**Fit Statistics** is a table of the fit statistics from the model.

**Model Information** is a table of Multivariate Adaptive Regression Splines model settings.

**Variable Importance** is a table of the input variables and their scaled relative importance as predictors of the target variable.

**Dependent Variable vs. Fitted Values** is a plot that displays the raw dependent variable overlaid with the fitted values. This plot is not produced for a dependent variable with a non-normal distribution.

**Residuals vs. Fitted Values** is a plot that displays the residuals plotted against the fitted values. This plot is not produced for a dependent variable with a non-normal distribution.

**Note:** Special thanks to Paal Navestad, Senior Data Scientist @ ConocoPhillips, for providing valuable feedback on this article.

Comments

07-13-2017
11:33 PM

Hi,

I am trying to add this extension to my EM; however, I am having some difficulties because I don't know where I should save the files. The steps provided in the “SAS® Enterprise Miner™ 14.1 Extension Nodes: Developer’s Guide” are not clear to me. I would appreciate it if someone could share clear steps that I can follow.

Thanks for your time in advance.
