Using mixed models, such as those fit with PROC MIXED, for predictive analytics alongside predictive machine learning techniques is becoming more common. Their ability to integrate parameters across covariance matrices, including complex hierarchies of space, time, and factor levels, has made them a statistical mainstay in many different fields. Recently, as predictive analytics and machine learning have faced an identity crisis over interpretability, people are seeking to meld the machine learning practices of partitioning, accuracy assessment, and cross-validation with parameterized statistical models. This is particularly true when one needs to identify the factors responsible for certain outcomes (e.g., higher yield, disease, increased revenue).
Ways to use mixed models for predictive analytics
One way is to output scored data from PROC MIXED using the PLM procedure. However, PROC MIXED does not run in SAS Cloud Analytic Services (CAS) and cannot leverage the power of massively parallel processing (MPP). How do we incorporate a mixed model into ML pipelines to predict unknown values?
Here comes PROC LMIXED. PROC LMIXED is the mixed model procedure from SAS that runs in CAS and can be used in scalable frameworks designed for predictive analytics. With minor code changes from PROC MIXED, it can also be incorporated into a SAS Visual Data Mining and Machine Learning (VDMML) pipeline in SAS Viya, allowing us to compare predictions from mixed or random effects models with those of other advanced ML models. In this article we will demonstrate how to incorporate PROC LMIXED into VDMML. You can learn more about PROC LMIXED here.
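For comparison, the PROC MIXED route mentioned above (fit, store, then score with PROC PLM) might look like the minimal sketch below. The data set names work.soy_train and work.soy_new, the item store work.mixed_store, and the variable Canopy_area are hypothetical and are not part of the pipeline built in this article.

```sas
/* Sketch only: data set, item store, and target names are hypothetical */
proc mixed data=work.soy_train;
   class Genotype Site Rep;
   /* fixed effects: genotype, site, and their interaction */
   model Canopy_area = Genotype|Site / solution;
   /* replicates nested within site as a random effect */
   random Rep(Site);
   /* save the fitted model as an item store for later scoring */
   store out=work.mixed_store;
run;

/* Score new observations from the stored model */
proc plm restore=work.mixed_store;
   score data=work.soy_new out=work.soy_scored predicted=Pred;
run;
```

This runs outside CAS, which is the limitation PROC LMIXED addresses. Predictions scored this way may reflect only the fixed-effects part of the model, so check the PROC PLM documentation for your release.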
How it works
In this hypothetical example, the dependent variable (the target) is the canopy area of a soybean crop. The independent variables are genotype, site, replicate, and days after planting; we treat genotype, site, and their interaction as fixed effects and all other factors as random. We will now use the following steps to incorporate a mixed model into our ML pipeline.
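/* SAS code */
proc lmixed data=&dm_data dmmethod=sparse;
   class Genotype Site Rep Days_after_planting;
   /* fixed effects: genotype, site, and their interaction; s prints the solution */
   model %dm_dec_target = Genotype|Site / s;
   /* random effects for replicates and for days after planting */
   random Rep / subject=Rep(Site) type=chol;
   random int / subject=Days_after_planting type=cs;
   /* save the fitted model so that later nodes can score with it */
   savestate rstore=&dm_data_rstore;
run;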
Notice the SAVESTATE statement. It saves the fitted model and promotes it to the next node for future scoring. Names preceded by “&” (such as &dm_data) are macro variables, similar to those found in SAS Enterprise Miner®.
%dmcas_metachange(name=%dm_predicted_var, role=predict, level=interval)
Scoring Code:
length %dm_predicted_var 8; /* change based on your data */
%dm_predicted_var=PRED;
You can rename this code node to indicate the purpose of your original model. The name of this code node will be visible in the model comparison interface.
That’s it. We have successfully incorporated PROC LMIXED into the VDMML pipeline.
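Outside Model Studio, a saved state like this could in principle be applied to new data with PROC ASTORE. The sketch below assumes that the SAVESTATE statement wrote an analytic store to a CAS table; the CAS libref mycas and the table names lmixed_store, new_plots, and new_plots_scored are hypothetical.

```sas
/* Assumes an active CAS session and a CAS libref named mycas (hypothetical) */
proc astore;
   /* describe the stored model, including its input and output variables */
   describe rstore=mycas.lmixed_store;
   /* apply the stored model to new observations */
   score data=mycas.new_plots
         rstore=mycas.lmixed_store
         out=mycas.new_plots_scored;
run;
```

Inside a pipeline this step is not needed, because the downstream nodes pick up the saved state and the scoring code automatically.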
For additional SAS Code features and functionality, check out this GitHub repository. And for more information about the breadth of features in Model Studio, check out this SAS Global Forum paper. I hope you’ll find prediction with parameters as insightful and useful as we have.
To learn more about how to use and scale mixed models in SAS, watch this tutorial by John Gottula.
Acknowledgment: John Gottula (john.gottula@sas.com), Christian Medins