
How to Use Generalized Additive Models in SAS Viya

Started ‎09-20-2023 by
Modified ‎09-20-2023 by

PROC GAMMOD and the Generalized Additive Model (GAM) node in SAS Model Studio both build generalized additive models. GAMs can model responses that are not normally distributed and relationships that are not linear, which makes them quite useful!  This post shows how easy they are to use in SAS Model Studio.

 

Example Use Case:

 

  • Number of births per area
  • Predict number of births to determine need for hospitals, pediatricians, teachers, etc.
  • Inputs can include population totals, proportion female, proportion child-bearing age, income levels, education levels, access to birth control, etc.

be_1_image001-300x291.png

GAMs versus GLMs versus Linear Regression

 

Recall the assumptions of ordinary least squares simple linear regression models:

  • Linearity
  • Normality
  • IID
  • Homoscedasticity

Linearity: The relationship between the outcome and the features is linear in the parameters.

 

be_1_image003.png

From Brian Gaines https://blogs.sas.com/content/subconsciousmusings/2022/03/24/accuracy-versus-interpretability-with-g...


 

Normality: Errors follow a normal distribution

 

be_2_image005.png

IID errors: Error terms are independent and identically distributed.

Homoscedasticity: Equal variance. 

 

be_3_image007.png

 

Real data often violate the normality assumption. The response may instead follow a Poisson, negative binomial, Gamma, Tweedie, or other distribution. But not to worry: in these cases we can use generalized linear models (GLMs) or generalized additive models (GAMs). GLMs still assume that the link-transformed mean is a linear function of the inputs; GAMs do not.  GAMs relax this assumption by replacing linear terms with spline terms, which are smooth functions of the inputs estimated from the data.  Thus GAMs are even more generalized than GLMs!
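
In symbols, the three model families can be contrasted as follows. The notation is an assumption made for illustration and does not come from the SAS documentation: g is the link function, the β's are regression coefficients, and each f_j is a smooth (spline) function of one input.

```latex
\begin{align*}
\text{Linear regression:}\quad & E[Y] = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p \\
\text{GLM:}\quad & g\bigl(E[Y]\bigr) = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p \\
\text{GAM:}\quad & g\bigl(E[Y]\bigr) = \beta_0 + f_1(x_1) + \dots + f_p(x_p)
\end{align*}
```

The GLM adds the link function and a non-normal response distribution to linear regression; the GAM additionally lets each input enter through its own smooth function instead of a straight line.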

 

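Another perk of GAMs is that, because they do not force a linear form, they tend to have less bias than GLMs when the true relationship is nonlinear. The trade-off is that GAM estimates tend to converge more slowly than GLM estimates as the number of observations increases.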
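
The bias point can be illustrated with a toy example. The sketch below is written in Python purely for illustration (it is not SAS code and not how PROC GAMMOD works internally): it fits a straight line and a crude binned-mean smoother to noise-free data from a curved relationship, then compares the in-sample mean squared error. The binned-mean smoother is a simple stand-in for a spline term, and all function names are hypothetical.

```python
from statistics import mean


def line_fit_mse(xs, ys):
    """Fit y = a + b*x by ordinary least squares and return the in-sample MSE."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return mean((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def binned_smoother_mse(xs, ys, n_bins=5):
    """Predict each y by the mean of its x-bin (a crude smoother) and return the MSE."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins

    def bin_of(x):
        # Clamp so the largest x falls into the last bin.
        return min(int((x - lo) / width), n_bins - 1)

    bins = {}
    for x, y in zip(xs, ys):
        bins.setdefault(bin_of(x), []).append(y)
    bin_means = {b: mean(values) for b, values in bins.items()}
    return mean((y - bin_means[bin_of(x)]) ** 2 for x, y in zip(xs, ys))


# Noise-free data from a curved relationship: y = x^2 on [-2, 2].
xs = [i / 10 for i in range(-20, 21)]
ys = [x * x for x in xs]

mse_line = line_fit_mse(xs, ys)
mse_smooth = binned_smoother_mse(xs, ys)
print(f"straight line MSE: {mse_line:.3f}")
print(f"binned smoother MSE: {mse_smooth:.3f}")
```

Because the data contain no noise, the error here is purely approximation error (bias), and the flexible fit shows less of it than the straight line. The sketch does not show the other side of the trade-off, the slower convergence of the flexible estimate.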

 

PROC GAMMOD versus PROC GAMPL

 

PROC GAMMOD (SAS Viya) is very similar to PROC GAMPL (SAS/STAT SAS 9 high performance) in features, options and results. PROC GAMMOD does provide additional functionality by supporting:

  • BY processing
  • Creation of an analytic store for scoring

The link function also differs between the procedures. PROC GAMMOD uses a log link for both the Gamma and inverse Gaussian distributions. PROC GAMPL uses the reciprocal link for the Gamma distribution and the reciprocal squared link for the inverse Gaussian distribution, as shown in the table below.

 

be_4_image009.png
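
To make the three link functions named above concrete, here is a small Python sketch (for illustration only; it is not SAS code). Each link g maps the mean μ to the scale of the additive predictor η, and its inverse maps η back to the mean. The dictionary and function names are hypothetical.

```python
import math

# Each entry holds (link g, inverse link g^{-1}) for one of the links named in the text.
LINKS = {
    "log": (math.log, math.exp),
    "reciprocal": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
    "reciprocal_squared": (lambda mu: 1.0 / mu ** 2, lambda eta: 1.0 / math.sqrt(eta)),
}


def to_predictor(mu, link):
    """Map a mean mu > 0 to the predictor scale: eta = g(mu)."""
    return LINKS[link][0](mu)


def to_mean(eta, link):
    """Map a predictor value eta back to the mean scale: mu = g^{-1}(eta)."""
    return LINKS[link][1](eta)


mu = 4.0
for name in LINKS:
    eta = to_predictor(mu, name)
    print(f"{name}: eta = {eta:.4f}, back-transformed mean = {to_mean(eta, name):.4f}")
```

One practical difference is visible in the inverses: the log link returns a positive mean for any value of η, whereas the reciprocal and reciprocal squared links only return a valid positive mean when η is positive.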

 

PROC GAMMOD versus PROC GAM

 

PROC GAM (SAS/STAT SAS 9) supports Gaussian (normal), binomial, Poisson, Gamma and inverse Gaussian distributions. PROC GAMMOD (SAS Viya) supports all of these plus the negative binomial and Tweedie distributions. Beyond the supported distributions, PROC GAMMOD and PROC GAM work very differently, so do not expect their results to be similar. Some of the differences are detailed in the table below.

 

be_5_image011.png

Source: https://go.documentation.sas.com/doc/en/pgmsascdc/default/casstat/casstat_gammod_overview02.htm

 

The Generalized Additive Model node became available in SAS Model Studio in release 2020.1.4. GAM is now included in the Advanced Template for an Interval Target in SAS Model Studio.

 

be_6_image013-1024x522.png

 

Defaults for the GAM node are:

  • Selection Method: Boosting
  • Interval target probability distribution: Normal
  • Interval target link function: Identity
  • Binary target link function: Logit
  • Degrees of freedom: 4
  • Class input order: Formatted
  • Class input coding: GLM

as shown in the screen capture below.

 

be_7_image015.png

 

Other distributions available are Gamma, inverse Gaussian, negative binomial, Poisson and Tweedie.

 

be_8_image017-300x203.png

 

The default boosting options are:

  • Maximum number of iterations: 500
  • Learning rate: 0.1

    be_9_image019.png
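
As a rough intuition for how these two boosting options interact, the Python sketch below (illustration only, not SAS code) uses a deliberately idealized setting: squared-error boosting in which each iteration's base learner fits the current residual perfectly and the update is shrunk by the learning rate. This is an assumption made for illustration and is not a description of the algorithm inside the GAM node. The function name is hypothetical.

```python
def remaining_residual_fraction(learning_rate, iterations):
    """Fraction of the initial residual left after idealized shrunken boosting steps."""
    residual = 1.0
    for _ in range(iterations):
        # Each step removes only learning_rate times the current residual.
        residual -= learning_rate * residual
    return residual


for n_iter in (1, 10, 50, 500):
    frac = remaining_residual_fraction(0.1, n_iter)
    print(f"learning rate 0.1, {n_iter} iterations: {frac:.3e} of the residual remains")
```

In this idealized setting the remaining fraction after m iterations is (1 - learning rate)^m, so a smaller learning rate takes smaller, more cautious steps and needs more iterations to reach the same fit.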

 

By default, smoothing plots are displayed as individual plots (up to 10 plots) and grouped as one report.

 

be_10_image021.png

 

If you automatically generate a pipeline in SAS Model Studio, it will consider a GAM model.

 

be_11_image023.png

 

You can, in fact, force it to include a GAM model in the pipeline, if you so desire.

 

be_12_image025.png

 

Here is the resulting pipeline for my data (HEART data) when I forced it to include GAM.

 

be_13_image027.png

 

With my data the GAM model did better than Forest or Linear Regression. However, Gradient Boosting and the Ensemble model did better than GAM.

 

be_13_image029.png

 

Summary

 

Generalized additive models are useful when you have neither linearity nor normality.  SAS Model Studio makes it very easy to add a GAM model using the GAM node. SAS Model Studio now includes GAM in its Advanced Template with an Interval Target.  SAS Model Studio also considers GAM in its automatically generated pipeline if you have an interval target.

 

Because it is so easy to add a GAM model in SAS Model Studio, you can always include a GAM to see if it outperforms your other models.

 

For More Information

 

Find more articles from SAS Global Enablement and Learning here.

