Hi. I'm new to PROC PLS and noticed that when people report results for their PROC PLS models they give R-square, RMSE, and standard error of prediction (SEP). With the code shown below I was able to get a plot of the R-square for the model, but:
1. How do I find that number for my final selected set of factors?
2. How do I get it to show up on the measured-to-predicted plot?
3. How do I get the RMSE value output?
4. How do I calculate the SEP? I was able to output my predicted values, but got lost there.
5. How do I get SAS to split my data into train (2/3 of data) and test (1/3)?
proc pls data=WORK.PhRSData cv=split cvtest varss plots=(diagnostics dmod xyscores ParmProfiles VIP XLoadingProfiles);
class loc hyb pd ntrt rep;
model y = loc hyb pd ntrt rep(loc) var1 var2 var3 var4...var25 / solution;
output OUT=outfile predicted=ypred press=press;
ods output VariableImportancePlot=vip;
title 'Global PLS NCE Model w/ All vars';
run;
/* Reduced model with diagnostics -setting factors*/
proc pls data=WORK.PhRSData nfac=8 cv=split cvtest varss plots=(diagnostics dmod xyscores ParmProfiles VIP XLoadingProfiles);
class loc hyb pd ntrt rep;
model y = loc hyb pd ntrt rep(loc) var1 var2 var3 var4...var10 / solution;
output OUT=outfile2 predicted=ypred2 press=press2;
ods output VariableImportancePlot=vip2;
title 'Reduced PLS Model with Bands & Vars';
run;
ODS GRAPHICS OFF;
Thanks!
PROC PLS and PROC PLM do not work together.
You could split the data, do Cross-Validation on the training data set to determine the number of dimensions, then use the validation data to see how it compares to the training data. That's something you'd have to program yourself, and honestly, I have never done that with PROC PLS. I have done this with a macro I wrote that does Logistic PLS.
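A minimal sketch of that split-then-validate workflow, untested on your data. The names `split`, `scored`, `is_test`, `y_obs`, the seed, and the `xvars` macro variable are all hypothetical; `xvars` is only a placeholder for your own predictor list. The split is random, so it is only approximately 2/3 to 1/3. The sketch relies on the documented PROC PLS behavior that observations with a missing response are left out of the fit but still get predicted values when their predictors are complete.

```sas
/* Hypothetical placeholder: replace with your own full list of predictors */
%let xvars = var1 var2 var3;

/* Flag roughly 1/3 of the rows as test data and hide their response */
data split;
  set WORK.PhRSData;
  if _n_ = 1 then call streaminit(12345);   /* arbitrary seed, for repeatability */
  is_test = (rand('uniform') < 1/3);        /* 1 = test row, 0 = training row */
  y_obs = y;                                /* keep the observed response for scoring */
  if is_test then y = .;                    /* masked rows are not used in the fit */
run;

/* Cross-validate on the training rows only; test rows are still predicted */
proc pls data=split cv=split cvtest;
  class loc hyb pd ntrt rep;
  model y = loc hyb pd ntrt rep(loc) &xvars / solution;
  output out=scored predicted=ypred;
run;

/* Compare root mean squared error on training rows (0) and test rows (1) */
proc sql;
  select is_test,
         count(*) as n,
         sqrt(mean((y_obs - ypred)**2)) as rmse
  from scored
  where y_obs is not missing and ypred is not missing
  group by is_test;
quit;
```

If the test-row RMSE is much larger than the training-row RMSE, that suggests the model with the cross-validated number of factors does not generalize as well as the training fit implies.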
Some background: Partial Least Squares was developed well outside mainstream statistical academia, which is why it offers little in the way of hypothesis testing or confidence intervals, and no SEP or similar. I'm sure people have tried to add this capability to Partial Least Squares, but it hasn't made its way into SAS. Nevertheless, the success of Partial Least Squares is impressive: thousands of published papers use it to handle high multicollinearity among the X variables, which is really difficult to handle with other methods. That success somewhat indicates that hypothesis tests and confidence intervals are not really necessary. Still, it makes PLS somewhat uncomfortable for people who are used to hypothesis testing and confidence intervals, so you need to be aware of this.
1. Usually, the number of factors for the final model is chosen from the Cross-Validation results.
2. "How do I get it to show up on the measured-to-predicted plot?" How do you get WHAT to show up?
3. All statistics that are calculated can be saved in a SAS data set; the table names for ODS OUTPUT are listed in the PROC PLS documentation. However, I don't think RMSE is computed by PROC PLS; of course you can compute it from the residuals, or from the R-squared for Y.
4. I don't think SEP is computed by PROC PLS.
5. Not a function of PROC PLS. You'd have to do this split in DATA steps before running PROC PLS. But IMHO Cross-Validation is a better method than train/test, when it is available, and it is available in PROC PLS.
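For points 1, 3 and 4, here is a rough sketch of computing the statistics yourself from the predicted values. It assumes the `outfile2` data set from the reduced model in the question, which holds the observed `y` and the predicted `ypred2`; the output names `fitstats`, `rsq`, `rmse`, `bias` and `sep` are hypothetical. Definitions of SEP vary between fields. This sketch uses one common convention, the bias-corrected standard deviation of the residuals, so check it against the definition used in the papers you are comparing with.

```sas
/* Fit statistics from observed and predicted values in OUTFILE2 */
proc sql;
  create table fitstats as
  select count(*)                          as n,
         1 - sum((y - ypred2)**2) / css(y) as rsq,   /* 1 - SSE/SST */
         sqrt(mean((y - ypred2)**2))       as rmse,  /* root mean squared error */
         mean(y - ypred2)                  as bias,  /* mean residual */
         std(y - ypred2)                   as sep    /* bias-corrected SD of residuals (n-1 divisor) */
  from outfile2
  where y is not missing and ypred2 is not missing;
quit;

proc print data=fitstats noobs;
run;
```

Because `outfile2` contains the rows used to fit the model, these are calibration statistics. To get values that describe prediction, run the same query on held-out rows or on cross-validated predictions. For the R-square that PROC PLS itself reports, adding `ods output PercentVariation=pctvar;` inside the PROC PLS step should save the percent variation table, which lists the cumulative variation in Y accounted for at each number of factors (`pctvar` is a hypothetical data set name).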