
SAS Mixed models for predictive analytics on big data (Using SAS Viya VDMML Pipeline)

Started ‎06-25-2021 by
Modified ‎07-12-2021 by

Using mixed models, such as those fit by PROC MIXED, for predictive analytics alongside predictive machine learning techniques is becoming more common. The mixed model's ability to incorporate covariance structures, including complex hierarchies of space, time, and factor levels, has made it a statistical mainstay in many different fields. Recently, as predictive analytics and machine learning have faced an identity crisis over interpretability, people are seeking to meld the machine learning practices of partitioning, accuracy assessment, and cross-validation with parameterized statistical models. This is particularly true when one needs to identify the factors that are responsible for certain outcomes (e.g., higher yield, disease, increased revenue).

 

Ways to use mixed models for predictive analytics

 

One way is to output scored data from PROC MIXED using the PLM procedure. However, PROC MIXED does not run in SAS Cloud Analytic Services (CAS) and cannot leverage the power of Massively Parallel Processing (MPP). How do we incorporate a mixed model in ML pipelines to predict unknown values?

This is where PROC LMIXED comes in. PROC LMIXED is the mixed model procedure from SAS that runs in CAS and can be used in scalable frameworks designed for predictive analytics. With minor code changes from PROC MIXED, it can also be incorporated into a SAS Viya VDMML pipeline, allowing us to compare predictions from mixed or random effects models with other advanced ML models. In this article we will demonstrate how to incorporate PROC LMIXED into VDMML. You can learn more about PROC LMIXED here.
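For comparison, the non-CAS route mentioned earlier (fit in PROC MIXED, score with PROC PLM) might look like the sketch below. The table names, the simulated data, and the variable Canopy_area are hypothetical and only there to make the example self-contained.

```sas
/* Hypothetical training data: 3 genotypes x 2 sites x 4 reps */
data work.soy_train;
  call streaminit(2021);
  do Genotype = 1 to 3;
    do Site = 1 to 2;
      do Rep = 1 to 4;
        Canopy_area = 10 + 2*Genotype + Site + rand('normal');
        output;
      end;
    end;
  end;
run;

/* Fit the model in PROC MIXED and keep it in an item store */
proc mixed data=work.soy_train;
  class Genotype Site Rep;
  model Canopy_area = Genotype|Site / solution;
  random Rep(Site);
  store out=work.mixed_store;
run;

/* Score a table with PROC PLM; the training table is reused here for brevity */
proc plm restore=work.mixed_store;
  score data=work.soy_train out=work.scored predicted=Pred;
run;
```

Everything in this sketch runs in a single SAS session rather than in CAS, which is the limitation that PROC LMIXED addresses.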

 

How it works

In this hypothetical example, the dependent (target) variable is the canopy area of a soybean crop. The independent variables are genotype, site, replicates, and days after planting, of which we consider all factors random except genotype, site, and their interaction. We will now use the following steps to incorporate the mixed model in our ML pipeline.
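As a point of reference before the steps, the general linear mixed model can be written in standard textbook notation (the symbols below are not taken from the article):

```latex
y = X\beta + Zu + \varepsilon, \qquad u \sim N(0, G), \qquad \varepsilon \sim N(0, R)
```

In this example, y would be canopy area, X would hold the fixed effects (genotype, site, and their interaction), and Z would hold the random effects (replicates and days after planting).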

 

             samiulhaque_0-1624625927747.png
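  • Step 5: Open the code editor and add the PROC LMIXED code in the Training Code section.
    /* SAS code */
    /* &dm_data and %dm_dec_target are supplied by the pipeline */
    proc lmixed data=&dm_data dmmethod=sparse;
      class Genotype Site Rep Days_after_planting;
      /* Fixed effects: genotype, site, and their interaction */
      model %dm_dec_target = Genotype|Site / s;
      /* Random effects */
      random Rep / subject=Rep(Site) type=chol;
      random int / subject=Days_after_planting type=cs;
      /* Save the fitted model for scoring in the next node */
      savestate rstore=&dm_data_rstore;
    run;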
    

     

Notice the SAVESTATE statement. It saves the fitted model and promotes it to the next node for future scoring. Names preceded by “&” are macro variables, similar to those found in SAS Enterprise Miner®.
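Inside the pipeline, the saved model is passed along for you. Outside a pipeline, a store written by SAVESTATE could be inspected and used for scoring with PROC ASTORE, roughly as sketched below. This assumes a running SAS Viya environment; the session name, caslib, and table names (lmixed_store, soy_new, soy_scored) are hypothetical.

```sas
/* Start a CAS session and assign a libref to a caslib (names are hypothetical) */
cas mysess;
libname mycas cas caslib=casuser;

/* Inspect the analytic store that SAVESTATE wrote */
proc astore;
  describe rstore=mycas.lmixed_store;
run;

/* Score new rows with the stored model */
proc astore;
  score data=mycas.soy_new rstore=mycas.lmixed_store out=mycas.soy_scored;
run;
```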

 

  • Step 6: Rename the SAS CODE node to PROC LMIXED  or any other meaningful name (Optional)
  • Step 7: Select training data only for the PROC LMIXED node
    samiulhaque_3-1624626028549.png

     

  • Step 8: Add a child code node to the PROC LMIXED node and move it to Supervised Learning.

    samiulhaque_4-1624626091853.png

     

  • Step 9: Open the code editor and add the following code.
    Training Code:

    %dmcas_metachange(name=%dm_predicted_var, role=predict, level=interval)

    Scoring Code:

    length %dm_predicted_var 8; /* change based on your data */
    %dm_predicted_var=PRED;

  

    You can rename this code node to indicate the purpose of your original model. The name of this code node will be visible in the model comparison interface.

  • Step 10 (optional): Add another supervised node for model comparison. I used a forest model. This is how the final pipeline looks.

 

samiulhaque_5-1624626520043.png

 

  • Step 11: Run the pipeline and explore results.

That’s it. We have successfully incorporated PROC LMIXED into the VDMML pipeline.
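To make the Step 9 scoring code concrete: assuming a hypothetical interval target named Canopy_area, and assuming %dm_predicted_var resolves to the target name with the “P_” prefix, the scoring code would amount to something like the following sketch. The table work.scored_demo is a made-up stand-in for the output of the stored model.

```sas
/* Hypothetical stand-in for the table scored by the PROC LMIXED model */
data work.scored_demo;
  input Canopy_area Pred;
  datalines;
12.1 11.8
15.4 15.9
;
run;

/* What the Step 9 scoring code would resolve to under the assumptions above */
data work.mapped;
  set work.scored_demo;
  length P_Canopy_area 8;
  P_Canopy_area = Pred;
run;
```

Copying PRED into the variable that the pipeline expects is what lets the model comparison step treat the mixed model like any other supervised node.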

 

The outcomes are:

 

  1. Model comparison of the PROC LMIXED model with any other pipeline node.
  2. Scored data available for visualization or further modeling. To obtain this, right-click the purple node of interest and load the node output. Predicted values have the prefix “P_”.
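Once the scored data is loaded, a simple accuracy check can be computed directly from the observed and predicted columns. The sketch below calculates root mean squared error with PROC SQL; the table and the variable names Canopy_area and P_Canopy_area are hypothetical.

```sas
/* Hypothetical scored output: observed target and its P_ prediction */
data work.scored_check;
  input Canopy_area P_Canopy_area;
  datalines;
12.1 11.8
15.4 15.9
9.7 10.2
;
run;

/* Root mean squared error of the predictions */
proc sql;
  select sqrt(mean((Canopy_area - P_Canopy_area)**2)) as RMSE
  from work.scored_check;
quit;
```

The same query could be run against the output of each supervised node to compare models outside the model comparison interface.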

 

For additional SAS Code features and functionality, check out this GitHub repository. And for more information about the breadth of features in Model Studio, check out this SAS Global Forum paper. I hope you’ll find prediction with parameters as insightful and useful as we have.

 

To learn more about how to use and scale mixed models in SAS, watch this tutorial by John Gottula.

 

Acknowledgment: John Gottula (john.gottula@sas.com), Christian Medins

Comments

Wow! This is a great way for breeders to use sparse matrix functionality to derive predictors from big data field trial data sets.
