Solved: Re: Output Rsquare and RMSE for Proc PLS model

Daisy2 · Posted 09-11-2020 07:23 PM

Hi. I'm new to Proc PLS and noticed that when people talk of the results for their Proc PLS models they report Rsquare, RMSE and Standard error of the prediction (SEP). With my code shown below I was able to get a plot of the Rsq for the model, but

1. how do I find that number for my final selected set of factors?

2. how do I get it to show up on the measured-to-predicted plot?

3. how do I get the RMSE value output?

4. how do I calculate the SEP? I was able to output my predicted values, but got lost there.

5. how do I get SAS to split my data into train (2/3 of data) and test (1/3)?

proc pls data=WORK.PhRSData cv=split cvtest varss plots=(diagnostics dmod xyscores ParmProfiles VIP XLoadingProfiles);
	class loc hyb pd ntrt rep;
	model y = loc hyb pd ntrt rep(loc) var1 var2 var3 var4...var25 / solution;
	output OUT=outfile predicted=ypred press=press;
	ods output VariableImportancePlot=vip;
	title 'Global PLS NCE Model w/ All vars';
run;
/* Reduced model with diagnostics -setting factors*/
proc pls data=WORK.PhRSData nfac=8 cv=split cvtest varss plots=(diagnostics dmod xyscores ParmProfiles VIP XLoadingProfiles);
	class loc hyb pd ntrt rep;
	model y = loc hyb pd ntrt rep(loc) var1 var2 var3 var4...var10 / solution;
	output OUT=outfile2 predicted=ypred2 press=press2;
	ods output VariableImportancePlot=vip2;
	title 'Reduced PLS Model with Bands & Vars';
run;
ODS GRAPHICS OFF;

Thanks!

PaigeMiller · Posted 09-11-2020 07:35 PM

1. Usually, the number of factors to choose for the final model is chosen from the Cross-Validation results

2. "How do I get it to show up on the measured-to-predicted plot?" How do you get WHAT to show up?

3. All statistics that are calculated can be saved in a SAS data set; the table names for ODS OUTPUT are listed in the PROC PLS documentation. However, I don't think RMSE is computed by PROC PLS; of course, you can compute it from the residuals, or from the R-squared for Y.

4. I don't think SEP is computed by PROC PLS.

5. Not a function of PROC PLS. You'd have to do this split in DATA steps before running PROC PLS. But IMHO Cross-Validation is a better method than train/test, when it is available, and it is available in PROC PLS.
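
As a rough sketch of point 3, RMSE and R-squared can be computed by hand from the OUT= data set. This assumes outfile2 comes from the reduced PROC PLS step in the original post, so it holds the observed response y and the prediction ypred2; the table name fitstats is made up for illustration. The RMSE here divides by n, not by a residual degrees of freedom, so treat it as one possible definition rather than an official one.

```sas
/* Assumption: outfile2 was written by the OUTPUT statement of the reduced
   model above and contains y (observed) and ypred2 (predicted). */
proc sql;
    create table fitstats as          /* fitstats is a hypothetical name */
    select count(*)                          as n,
           sqrt(mean((y - ypred2)**2))       as rmse,  /* root mean squared residual */
           1 - sum((y - ypred2)**2) / css(y) as rsq    /* 1 - SSE / corrected total SS */
    from outfile2
    where y is not missing and ypred2 is not missing;
quit;

proc print data=fitstats noobs;
run;
```

The rsq value should agree, up to rounding, with the total percent variation for the dependent variable that PROC PLS prints in its "Percent Variation Accounted for" table at the chosen number of factors. Running ODS TRACE ON before the PROC PLS step shows the ODS table names available for ODS OUTPUT.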

--
Paige Miller

Daisy2 · Posted 09-12-2020 01:11 PM

Hi PaigeMiller,
Thanks for your responses.
2. The "it" I was referring to is Rsq. Can Proc PLM be used with Proc PLS for graphing?
3. Thanks. I will check that out as well as see how to calculate RMSE from the output.
5. I saw several places with data on splitting, but I thought I needed to split the data, run PLS on the training data with cv, then re-run the reduced model with the test data to evaluate the stability of the model with 'unseen' data. Is that wrong?

Regards, Daisy2

PaigeMiller · Posted 09-12-2020 01:45 PM

PROC PLS and PROC PLM do not work together.
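
One possible workaround for the plot, outside PROC PLS itself, is to draw the measured-versus-predicted scatter with PROC SGPLOT and write R-squared into it with an INSET statement. This is only a sketch: it assumes outfile2 from the reduced model in the original post (with y and ypred2), and the macro variable name rsq is made up.

```sas
/* Assumption: outfile2 holds y (measured) and ypred2 (predicted). */
proc sql noprint;
    /* R-squared as 1 - SSE / corrected total SS, stored as text in &rsq */
    select put(1 - sum((y - ypred2)**2) / css(y), 6.3)
        into :rsq trimmed
    from outfile2
    where y is not missing and ypred2 is not missing;
quit;

proc sgplot data=outfile2;
    scatter x=y y=ypred2;
    /* 1:1 reference line: points on it are predicted perfectly */
    lineparm x=0 y=0 slope=1 / lineattrs=(pattern=dash);
    inset "R-square = &rsq" / position=topleft border;
    xaxis label="Measured y";
    yaxis label="Predicted y";
run;
```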

You could split the data, do Cross-Validation on the training data set to determine the number of dimensions, then use the validation data to see how it compares to the training data. That's something you'd have to program yourself, and honestly, I have never done that with PROC PLS. I have done this with a macro I wrote that does Logistic PLS.
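
A minimal sketch of that workflow, not something tested on the poster's data: it relies on the documented PROC PLS behaviour that observations with a missing response are left out of the fit but still get predicted values. The names split, pls_input, y_fit, scored, test_stats and the seed are made up, and the macro variable xvars is only a stand-in for the predictor list that is elided in the original post. SEP is taken here as the standard deviation of the test-set prediction errors, which is one common chemometrics definition; RMSEP is shown next to it.

```sas
/* Assumption: stand-in for the elided predictor list in the original post. */
%let xvars = var1-var10;

/* 1. Flag roughly 2/3 of the rows as training data. OUTALL keeps every row
      and adds Selected = 1 (training) or 0 (test). */
proc surveyselect data=WORK.PhRSData out=split
                  method=srs samprate=0.6667 seed=20200912 outall;
run;

/* 2. Hide the test responses from the fit but keep y for scoring later. */
data pls_input;
    set split;
    y_fit = y;
    if Selected = 0 then y_fit = .;
run;

/* 3. Cross-validate on the training rows to choose the number of factors.
      Test rows are not used in the fit, but predictions are written for them. */
proc pls data=pls_input cv=split cvtest;
    class loc hyb pd ntrt rep;
    model y_fit = loc hyb pd ntrt rep(loc) &xvars / solution;
    output out=scored predicted=ypred;
run;

/* 4. Summarise the prediction errors on the held-out rows only. */
proc sql;
    create table test_stats as
    select count(*)                   as n_test,
           mean(y - ypred)            as bias,
           sqrt(mean((y - ypred)**2)) as rmsep,
           std(y - ypred)             as sep    /* bias-corrected, n-1 divisor */
    from scored
    where Selected = 0 and y is not missing and ypred is not missing;
quit;
```

Changing the WHERE clause to Selected = 1 gives the same statistics for the training rows, which is the comparison described above.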

Some background: Partial Least Squares was developed well outside mainstream statistical academia, which is why it offers little in the way of hypothesis testing or confidence intervals, and no SEP or similar. I'm sure there are people who have tried to add this capability to Partial Least Squares, but it hasn't made its way into SAS. Nevertheless, the success of Partial Least Squares is impressive: thousands of published papers use it to handle high multicollinearity among the X variables, which is really difficult to handle via other methods. That success somewhat indicates that hypothesis tests and confidence intervals are not really necessary. Still, it makes PLS somewhat uncomfortable for people who are used to hypothesis testing and confidence intervals, so you need to be aware of this.

--
Paige Miller
