Hello all, I am developing a customized model for my company, using PLS with multiple Y (response) variables.
The PLS procedure offers leave-one-out cross-validation, and SAS also has a jackknife estimation function. My question is: how can we combine the two to get the "jackknife standard error of the prediction of Y for observations in the prediction set, computed from all rounds of cross validation"?
Also, the output provides Y residuals; would these help with the jackknife standard error referred to above?
Many thanks for your kind reply.
I don't think it is possible to access individual iterations of the cross-validation done by PROC PLS. All you can obtain is the summary statistics once all cross-validation runs are finished.
Calling @Rick_SAS
I am not an expert on PLS, but my understanding is that the leave-one-out cross-validation is used to determine how many factors to extract. After you perform one run of the procedure, you get ONE model and ONE set of parameter estimates. This is different from the usual use of the jackknife in which you obtain N different models.
If you want to get standard errors for the prediction limits, I think you will have to either form the jackknife samples yourself or use the %JACK macro. I've written a short overview of using the %BOOT macro, and the %JACK macro is similar.
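Forming the jackknife samples yourself might look something like the sketch below. This is only an outline: the data set names (TRAIN, PREDSET), the variables (Y, X1-X5), the number of factors (NFAC=3), and the training-set size passed to the macro are all hypothetical placeholders for your own setup. It relies on the documented PROC PLS behavior that observations with a missing response are excluded from fitting but still scored by the OUTPUT statement.

```sas
%macro jackPLS(nobs=);
  %do i = 1 %to &nobs;
    /* Jackknife sample i: drop observation i from the training data,
       then append the prediction set with the response set to missing
       so PROC PLS scores it without using it for fitting. */
    data jack;
      set train(in=inTrain) predset(in=inPred);
      if inTrain and _n_ = &i then delete;
      if inPred then y = .;
    run;

    proc pls data=jack nfac=3 noprint;
      model y = x1-x5;
      output out=scored predicted=yhat;
    run;

    /* Keep only the scored prediction-set rows for this round */
    data jack_pred;
      set scored;
      if missing(y);
      round = &i;
    run;

    proc append base=jack_all data=jack_pred; run;
  %end;
%mend;

%jackPLS(nobs=100);  /* hypothetical training-set size */
```

After the loop, JACK_ALL holds one predicted value per prediction-set observation per jackknife round, which is the raw material for the jackknife standard error.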
If you have SAS/IML, another alternative is to use the basic jackknife framework in the article "Jackknife estimates in SAS" to obtain the standard errors. The function that evaluates the statistic on each jackknife sample ('EvalStat') can write the sample to a data set and use the SUBMIT/ENDSUBMIT statements to call PROC PLS and retrieve the predicted value.
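A hedged sketch of that EvalStat idea is below. The column names (Y, X1, X2), the number of factors (NFAC=2), and the choice of statistic returned are placeholders; the point is only the pattern of writing the sample out and round-tripping through PROC PLS via SUBMIT/ENDSUBMIT.

```sas
proc iml;
start EvalStat(S);
   /* write the current jackknife sample to a data set */
   create JackSample from S[colname={"y" "x1" "x2"}];
   append from S;
   close JackSample;

   /* fit PLS on the sample and score it from inside IML */
   submit;
      proc pls data=JackSample nfac=2 noprint;
         model y = x1 x2;
         output out=JackOut predicted=yhat;
      run;
   endsubmit;

   /* retrieve the predicted values as the statistic */
   use JackOut;  read all var {"yhat"} into pred;  close JackOut;
   return( pred );
finish;
quit;
```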
One complication is that you are predicting multiple response variables, so you will get a "multivariate" standard error for the prediction: one standard error for each response variable at each location that you score. If you haven't implemented the jackknife before, you might want to implement your scheme for a single response variable FIRST, just to practice on an easier problem.
While I agree that perhaps the %JACK macro, or PROC IML could work in similar cases, the original request was to combine each iteration of the cross-validation results into a Jackknife, and as far as I know, you can't get those results of each cross-validation iteration out of PROC PLS.
Either by using macros or PROC IML, you could (in theory) obtain the results of each cross-validation iteration, which could then be jackknifed. This seems a hugely complicated programming task, followed by the equally complicated and extremely important task of assuring that you get the right answer.
Lastly, I am not even sure that cross-validation and jackknife go together and play nicely with each other or even make sense to combine. I have thought about this a little, and I'm not sure I see the purpose. (I do see the benefit of cross-validation to determine the number of dimensions, and then jackknife to obtain standard errors of predicted values as a valid method; but to obtain each iteration of cross-validation and then jackknife that is where I have logical problems).
Thank you for your kind reply. The number of factors in the model has already been optimized and selected, and the PLS step in SAS mainly focuses on model prediction (model execution). The jackknife here is used to estimate the prediction error of each prediction, which will be used for decision making (whether the prediction will be accepted). Leave-one-out will be iterated to get n predicted results (each from a model fit on n-1 observations), and the jackknife will be used to get the standard error across these samples.
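Once the leave-one-out predictions are collected, the jackknife standard error itself is a small computation. A minimal sketch, assuming the n leave-one-out predictions for one scored observation are stacked in a data set JACK_ALL with variable YHAT (both names hypothetical), using the usual formula SE_jack = sqrt( (n-1)/n * sum_i (yhat_i - yhat_bar)^2 ):

```sas
/* get n and the mean of the leave-one-out predictions */
proc sql noprint;
   select count(yhat), avg(yhat) into :n, :ybar
   from jack_all;
quit;

/* accumulate the sum of squared deviations and form the SE */
data jack_se;
   set jack_all end=last;
   retain ss 0;
   ss + (yhat - &ybar)**2;
   if last then do;
      se_jack = sqrt( (&n - 1)/&n * ss );
      output;
   end;
   keep se_jack;
run;
```

With multiple Y variables, this same computation would be repeated per response variable (and per scored observation), e.g. via BY-group processing.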
Or could other statistical indicators be used as an alternative measure of prediction error?
@Jonison wrote:
The number of factors in the model has already been optimized and selected,
Then you don't need cross-validation any more. You just need the jackknife.