Hi, I'm running a PLS model with a BY variable that has two levels (2014 vs. 2017). I've run cross-validation to select the number of latent factors, but each level of the BY variable needs a different number of factors (3 vs. 5). How do I tell PROC PLS to use 3 factors when the BY value is 2014 and 5 factors when it is 2017? Here's the code for the cross-validation and for the model with the chosen factors. Thanks.
/* Running Model By Year to separate out 2014 from 2017 */
/* Global Model with all bands at R1 */
proc pls data=splitplsr2 cv=random(seed=12345) cvtest varss plots=(diagnostics dmod scores ParmProfiles VIP XLoadingProfiles);
by yr;
model meas =
R1Avg_412_d.......etc.
R1Avg_917_d / solution;
output OUT=test1;
ods output VariableImportancePlot=vip;
title 'Global PLS Full Model';
run;
/* Model with Factors chosen */
proc pls data=splitplsr2 nfac=3 varss plots=(diagnostics dmod scores ParmProfiles VIP XLoadingProfiles);
by yr;
model meas =
R1Avg_412_d....etc.
R1Avg_917_d / solution;
output OUT=outfile predicted=predR1f press=pressR1f yresidual=yresidR1f ;
ods output VariableImportancePlot=vip;
title 'Global PLS Full Final Model';
run;
With some persistence I figured out that a WHERE statement inside the PROC lets me split the runs by year. Here are the details of the code. Below this I have a second PROC PLS step for the other year, using
where yr='2017';
/* Running Model By Year to separate out 2014 from 2017 */
/* Global Model with all bands at R1 */
proc pls data=splitplsr2 cv=random(seed=12345) cvtest varss plots=(diagnostics dmod scores ParmProfiles VIP XLoadingProfiles);
by yr;
model meas =
R1Avg_412_d.......etc.
R1Avg_917_d / solution;
output OUT=test1;
ods output VariableImportancePlot=vip;
title 'Global PLS Full Model';
run;
/* Model with Factors chosen */
proc pls data=splitplsr2 nfac=3 varss plots=(diagnostics dmod scores ParmProfiles VIP XLoadingProfiles);
/* WHERE restricts this fit to a single year, so no BY statement is needed */
where yr='2014';
model meas =
R1Avg_412_d....etc.
R1Avg_917_d / solution;
output OUT=outfile predicted=predR1f press=pressR1f yresidual=yresidR1f ;
ods output VariableImportancePlot=vip;
title 'Global PLS Full Final Model';
run;
If you want NFAC=3 for one year and NFAC=5 for another year, then you can't fit the models with a BY statement. You need to run PROC PLS twice, once for one year and again for the other year.
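As a rough sketch of the two-run approach, the macro below wraps the final-model step so that each year is fitted with its own NFAC= value. The macro name `pls_by_year`, the macro variable `xvars`, and the `out_2014`/`out_2017` data set names are illustrative and not from the original posts; `xvars` must be set to the full predictor list, which is elided in the thread. It assumes `yr` is a character variable, as the `where yr='2014'` line suggests.

```sas
/* Hypothetical placeholder: replace with the complete predictor list  */
/* from the MODEL statement (only the first and last names are shown   */
/* in the thread).                                                     */
%let xvars = R1Avg_412_d R1Avg_917_d;

/* Fit the final PLS model for one year with a year-specific NFAC= */
%macro pls_by_year(yr=, nfac=);
  proc pls data=splitplsr2 nfac=&nfac varss;
    where yr = "&yr";                     /* keep only this year's rows */
    model meas = &xvars / solution;
    output out=out_&yr predicted=predR1f press=pressR1f yresidual=yresidR1f;
    title "PLS Final Model, yr=&yr, NFAC=&nfac";
  run;
%mend pls_by_year;

%pls_by_year(yr=2014, nfac=3)
%pls_by_year(yr=2017, nfac=5)

/* Stack the per-year results into a single data set */
data outfile;
  set out_2014 out_2017;
run;
```

The PLOTS= and ODS OUTPUT requests from the original steps are left out here for brevity; they can be added back inside the macro if needed.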