Daisy2
Obsidian | Level 7

Hi. I'm new to PROC PLS and noticed that when people report results for their PROC PLS models, they give R-square, RMSE, and the standard error of prediction (SEP). With the code shown below I was able to get a plot of R-square for the model, but:

1. How do I find that number for my final selected set of factors?

2. How do I get it to show up on the measured-to-predicted plot?

3. How do I get the RMSE value output?

4. How do I calculate the SEP? I was able to output my predicted values, but got lost there.

5. How do I get SAS to split my data into train (2/3 of data) and test (1/3)?

ods graphics on;
/* Full model: all 25 predictors; split-sample CV picks the number of factors */
/* var1-var25 is a SAS numbered range list standing in for the predictors */
proc pls data=WORK.PhRSData cv=split cvtest varss plots=(diagnostics dmod xyscores ParmProfiles VIP XLoadingProfiles);
	class loc hyb pd ntrt rep;
	model y = loc hyb pd ntrt rep(loc) var1-var25 / solution;
	output OUT=outfile predicted=ypred press=press;
	ods output VariableImportancePlot=vip;
	title 'Global PLS NCE Model w/ All vars';
run;
/* Reduced model with diagnostics - number of factors fixed at 8 */
proc pls data=WORK.PhRSData nfac=8 cv=split cvtest varss plots=(diagnostics dmod xyscores ParmProfiles VIP XLoadingProfiles);
	class loc hyb pd ntrt rep;
	model y = loc hyb pd ntrt rep(loc) var1-var10 / solution;
	output OUT=outfile2 predicted=ypred2 press=press2;
	ods output VariableImportancePlot=vip2;
	title 'Reduced PLS Model with Bands & Vars';
run;
ods graphics off;

 

Thanks!

PaigeMiller
Diamond | Level 26

1. Usually, the number of factors for the final model is chosen from the Cross-Validation results.

2. "How do I get it to show up on the measured-to-predicted plot?" How do you get WHAT to show up?

3. All statistics that are calculated can be saved in a SAS data set; the table names for ODS OUTPUT are listed in the PROC PLS documentation. However, I don't think RMSE is computed by PROC PLS; of course you can compute it from the residuals, or from the R-squared for Y.

4. I don't think SEP is computed by PROC PLS.

5. Not a function of PROC PLS. You'd have to do this split in DATA steps before running PROC PLS. But IMHO Cross-Validation is a better method than train/test, when it is available, and it is available in PROC PLS.
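
For points 3 and 4, here is a minimal sketch of computing these numbers from the OUTPUT data set. It assumes `outfile2` holds the observed `y` and the predicted `ypred2` from the code in the question. It also assumes SEP means the bias-corrected standard deviation of the prediction errors, which is a common chemometrics convention and not something PROC PLS defines. The column names `n`, `bias`, `rmse`, `sep`, and `rsq` are illustrative only.

```sas
/* Sketch only: data set and variable names come from the question's code. */
proc sql;
  select count(*)                      as n,
         mean(y - ypred2)              as bias,   /* average prediction error    */
         sqrt(mean((y - ypred2)**2))   as rmse,   /* root mean squared error     */
         std(y - ypred2)               as sep,    /* SD of errors, n-1 divisor   */
         1 - uss(y - ypred2) / css(y)  as rsq     /* 1 - SSE / corrected SS of y */
  from outfile2
  where y is not missing and ypred2 is not missing;
quit;
```

Run on the same rows that were used to fit the model, these values describe the fit rather than predictive performance; the same query applied to held-out rows gives the prediction versions.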

--
Paige Miller
Daisy2
Obsidian | Level 7
Hi PaigeMiller,
Thanks for your responses.
2. The "it" I was referring to is Rsq. Can Proc PLM be used with Proc PLS for graphing?
3. Thanks. I will check that out as well as see how to calculate RMSE from the output.
5. I saw several places with information on splitting, but I thought I needed to split the data, run PLS on the training data with CV, then re-run the reduced model with the test data to evaluate the stability of the model on 'unseen' data. Is that wrong?

Regards, Daisy2

PaigeMiller
Diamond | Level 26

PROC PLS and PROC PLM do not work together.
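
Since PROC PLM is not an option here, one possible way to get R-square onto a measured-versus-predicted plot is to draw the plot yourself from the OUTPUT data set. The sketch below assumes `outfile2`, `y`, and `ypred2` from the code in the question; the macro variable `rsq` is a hypothetical name introduced for illustration.

```sas
/* Sketch only: compute R-square from observed and predicted values */
proc sql noprint;
  select put(1 - uss(y - ypred2) / css(y), 6.3)
    into :rsq trimmed
  from outfile2
  where y is not missing and ypred2 is not missing;
quit;

/* Measured vs. predicted scatter with a 1:1 reference line and an inset */
proc sgplot data=outfile2 noautolegend;
  scatter x=y y=ypred2;
  lineparm x=0 y=0 slope=1 / clip;   /* CLIP keeps (0,0) from stretching the axes */
  inset "R-square = &rsq" / position=topleft border;
  xaxis label="Measured y";
  yaxis label="Predicted y";
run;
```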

 

You could split the data, do Cross-Validation on the training data set to determine the number of dimensions, then use the validation data to see how it compares to the training data. That's something you'd have to program yourself, and honestly, I have never done that with PROC PLS. I have done this with a macro I wrote that does Logistic PLS.
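
A rough sketch of that workflow, offered as one possible approach and not something tested on this data. It relies on PROC PLS leaving observations with a missing response out of the fit while still computing predicted values for them. The names `split`, `y_fit`, and `pls_pred`, the seed, and the `var1-var10` range list are assumptions made for illustration.

```sas
/* 1. Flag a random 2/3 of rows as training; OUTALL keeps every row
      and adds Selected (1 = training, 0 = test). */
proc surveyselect data=WORK.PhRSData out=split outall
                  method=srs samprate=0.6667 seed=12345;
run;

/* 2. Hide the response on test rows so they are predicted but not fitted. */
data split;
  set split;
  if Selected = 1 then y_fit = y;
  else y_fit = .;
run;

/* 3. Cross-validate on the training rows to choose the number of factors. */
proc pls data=split cv=split cvtest;
  class loc hyb pd ntrt rep;
  model y_fit = loc hyb pd ntrt rep(loc) var1-var10;
  output out=pls_pred predicted=ypred;
run;

/* 4. Compare error on training (Selected=1) and test (Selected=0) rows. */
proc sql;
  select Selected,
         count(*)                   as n,
         sqrt(mean((y - ypred)**2)) as rmse,
         std(y - ypred)             as sep
  from pls_pred
  where y is not missing and ypred is not missing
  group by Selected;
quit;
```

One caveat: if a CLASS level appears only in the test rows, those rows may not get a usable prediction, so it is worth checking how many test rows end up with a missing `ypred`.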

 

Some background: Partial Least Squares was developed well outside mainstream statistical academia, which is why it offers little in the way of hypothesis testing or confidence intervals, and no SEP or similar. I'm sure there are people who have tried to add this capability to Partial Least Squares, but it hasn't made its way into SAS. Nevertheless, the success of Partial Least Squares is impressive: thousands of published papers use it to handle high multicollinearity among the X variables, a situation that is really difficult to handle via other methods. That success somewhat indicates that hypothesis testing and confidence-interval analyses are not really necessary. But it makes PLS somewhat uncomfortable for people who are used to hypothesis testing and confidence intervals, so you need to be aware of this.

--
Paige Miller
