SAS Mixed models for predictive analytics on big data (Using SAS Viya VDMML Pipeline)


Using mixed models such as PROC MIXED for predictive analytics, alongside machine learning techniques, is becoming more common. Mixed models estimate fixed effects together with the variance and covariance parameters of random effects. This lets them represent complex hierarchies of space, time, and factor levels, and it has made them a statistical mainstay in many fields. Machine learning has recently been criticized for the poor interpretability of its models. In response, practitioners have sought to combine ML practices such as data partitioning, accuracy assessment, and cross-validation with parameterized statistical models. This is particularly useful when one needs to identify the factors responsible for certain outcomes (e.g., higher yield, disease, increased revenue).

Ways to use mixed models for predictive analytics

One way is to store a model fitted with PROC MIXED and score new data with the PLM procedure. However, PROC MIXED does not run in SAS Cloud Analytic Services (CAS) and therefore cannot take advantage of massively parallel processing (MPP). How, then, do we incorporate a mixed model into ML pipelines to predict unknown values?
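For reference, the PROC MIXED plus PROC PLM route can be sketched as follows. This is a minimal illustration, not the authors' code. The data set names (work.train, work.test), the response name CanopyArea, and the item store name work.mixed_model are hypothetical placeholders. The random effects mirror the example used later in this article. As far as we know, PROC PLM scores from the fixed-effects part of a stored mixed model, so the random-effect predictions (BLUPs) are not included in these predictions. Please check the PROC PLM documentation for your release.

```sas
/* Sketch only: work.train, work.test, work.mixed_model, and the response   */
/* CanopyArea are hypothetical placeholders.                                 */
proc mixed data=work.train;
   class Genotype Site Rep Days_after_planting;
   model CanopyArea = Genotype|Site / solution;   /* fixed effects: Genotype, Site, interaction */
   random int / subject=Rep(Site);                /* random intercept per replicate within site */
   random int / subject=Days_after_planting;      /* random intercept per day after planting    */
   store work.mixed_model;                        /* save the fitted model as an item store     */
run;

/* Score new observations from the stored model */
proc plm restore=work.mixed_model;
   score data=work.test out=work.test_scored predicted=Pred;
run;
```

Because this runs in a traditional SAS session rather than in CAS, it works well for moderate data sizes. It does not distribute the work across a CAS server.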

This is where PROC LMIXED comes in. PROC LMIXED is a SAS mixed model procedure that runs in CAS and can be used in scalable frameworks designed for predictive analytics. With minor code changes from PROC MIXED, it can also be incorporated into a SAS Visual Data Mining and Machine Learning (VDMML) pipeline in SAS Viya. This lets us compare predictions from mixed (random effects) models with those from other advanced ML models. In this article we demonstrate how to incorporate PROC LMIXED into VDMML. You can learn more about PROC LMIXED here.

How it works

In this hypothetical example, the dependent (target) variable is the canopy area of a soybean crop. The independent variables are genotype, site, replicate, and days after planting. Genotype, site, and their interaction are treated as fixed effects. Replicate (within site) and days after planting are treated as random effects. We use the following steps to incorporate the mixed model into our ML pipeline.
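Written as an equation, the model described above looks roughly like the following. This is our reading of the PROC LMIXED code in Step 5, and the symbols are introduced here for illustration. Here $y_{ijkl}$ is the canopy area for genotype $i$ at site $j$, replicate $k$ within site $j$, and day after planting $l$. The random effects are shown in their simplest independent variance-components form. The TYPE= options in the code may impose other covariance structures.

```latex
y_{ijkl} = \mu + \gamma_i + s_j + (\gamma s)_{ij} + r_{k(j)} + d_l + \varepsilon_{ijkl},
\qquad
r_{k(j)} \sim N(0, \sigma^2_r), \quad
d_l \sim N(0, \sigma^2_d), \quad
\varepsilon_{ijkl} \sim N(0, \sigma^2)
```

The terms $\gamma_i$, $s_j$, and $(\gamma s)_{ij}$ correspond to `Genotype|Site` in the MODEL statement. The term $r_{k(j)}$ corresponds to `random Rep / subject=Rep(Site)`, and $d_l$ corresponds to `random int / subject=Days_after_planting`.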

Step 1: Open Model Studio. Model Studio is a web-based platform in SAS Viya®.
Step 2: Create a Model Studio project.
Step 3: Import and manage your data in the Model Studio project.
Step 4: Add a SAS Code child node.

Step 5: Open the Code Editor and add the PROC LMIXED code to the Training Code section.

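/* SAS code: PROC LMIXED in the Training Code section of a SAS Code node */
proc lmixed data=&dm_data dmmethod=sparse;           /* &dm_data: training data supplied by Model Studio */
   class Genotype Site Rep Days_after_planting;
   model %dm_dec_target = Genotype|Site / s;          /* fixed effects: Genotype, Site, and their interaction */
   random Rep / subject=Rep(Site) type=chol;          /* random replicate effect within site */
   random int / subject=Days_after_planting type=cs;  /* random intercept for each day after planting */
   savestate rstore=&dm_data_rstore;                  /* save the fitted model as an analytic store */
run;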

Notice the SAVESTATE statement. It saves the fitted model as an analytic store and passes it to the next node for scoring. Names preceded by "&" are macro variables, and names preceded by "%" are macros. Model Studio supplies both, much like those found in SAS Enterprise Miner®.
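Outside Model Studio, the same SAVESTATE mechanism can be used directly. You fit PROC LMIXED in a CAS session and then apply the resulting analytic store to new data with PROC ASTORE. This is a minimal sketch under stated assumptions. The session name (mysess), the libref (mycas), the table names (soy_train, soy_new, soy_scored, lmixed_store), and the response CanopyArea are all hypothetical. The DMMETHOD= option from the pipeline code is omitted here for simplicity. The new-data table must contain the same CLASS variables used in the fit.

```sas
/* Sketch only: session, libref, table names, and CanopyArea are hypothetical. */
cas mysess;                                      /* start a CAS session */
libname mycas cas caslib=casuser;                /* libref pointing at the CASUSER caslib */

proc lmixed data=mycas.soy_train;
   class Genotype Site Rep Days_after_planting;
   model CanopyArea = Genotype|Site / s;
   random Rep / subject=Rep(Site) type=chol;
   random int / subject=Days_after_planting type=cs;
   savestate rstore=mycas.lmixed_store;          /* write the fitted model to an analytic store */
run;

proc astore;
   describe rstore=mycas.lmixed_store;           /* list the store's input and output variables */
   score data=mycas.soy_new out=mycas.soy_scored
         rstore=mycas.lmixed_store;              /* apply the stored model to new data */
run;
```

The DESCRIBE statement reports the variable names the store writes. We suggest checking its output to confirm the name of the prediction column rather than assuming one.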

Step 6 (optional): Rename the SAS Code node to PROC LMIXED or another meaningful name.
Step 7: Select training data only for the PROC LMIXED node.
Step 8: Add a SAS Code child node to the PROC LMIXED node and move it to the Supervised Learning group.
Step 9: Open the Code Editor and add the following code:
Training Code:

%dmcas_metachange(name=%dm_predicted_var, role=predict, level=interval)

Scoring Code:

length %dm_predicted_var 8;  /* change the length based on your data */
%dm_predicted_var = PRED;     /* copy the PROC LMIXED prediction into Model Studio's predicted variable */

You can rename this code node to indicate which model it represents. The name of this node is what appears in the model comparison results.

Step 10 (optional): Add another supervised node for model comparison. I used a forest model. This is how the final pipeline looks.

Step 11: Run the pipeline and explore results.

That's it. We have successfully incorporated PROC LMIXED into the VDMML pipeline.

The outcomes are:

Model comparison of the PROC LMIXED model with any other pipeline node.
Scored data available for visualization or further modeling. To obtain it, right-click the purple node of interest and load the node output. Predicted values have the prefix "P_".
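Model Studio computes fit statistics for the model comparison automatically. If you export the scored node output, you can also compute a simple accuracy check by hand. The sketch below computes the root mean squared error of the PROC LMIXED predictions. It assumes hypothetical names: work.lmixed_scored for the exported table, CanopyArea for the observed target, and P_CanopyArea for its prediction.

```sas
/* Sketch only: work.lmixed_scored, CanopyArea, and P_CanopyArea are hypothetical */
/* names for the exported node output, the observed target, and its prediction.   */
proc sql;
   select sqrt(mean((CanopyArea - P_CanopyArea)**2)) as RMSE   /* root mean squared error */
   from work.lmixed_scored
   where CanopyArea is not missing;                             /* skip rows with no observed target */
quit;
```

Running the same query on the forest model's scored output gives a like-for-like number for comparison. This assumes both tables cover the same observations.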

For additional SAS Code features and functionality, check out this GitHub repository. And for more information about the breadth of features in Model Studio, check out this SAS Global Forum paper. I hope you’ll find prediction with parameters as insightful and useful as we have.

To learn more about how to use and scale mixed models in SAS, watch this tutorial by John Gottula.

Acknowledgment: John Gottula (john.gottula@sas.com), Christian Medins

jozgot · 06-28-2021

Wow! This is a great way for breeders to use sparse matrix functionality to derive predictors from big data field trial data sets.
