03-04-2016 08:14 PM - edited 03-05-2016 08:44 AM
I am using PROC PLS to identify orthogonal factors from a set of predictors that has a lot of redundancy (and very high multicollinearity). PLS works great: the factors and the regression models built on those factors are fine.
I also have additional observations that were not used when generating the PLS model (e.g. a validation set, new data, etc.). These observations need to be "scored" using the model in order to generate both the predictions and the factors for them. Unfortunately, PLS does not produce a "scoring output dataset", but I have a macro that generates one using the information in these two ODS tables: XVariableCenScale and XWeights.
When I use that "scoring output dataset" with PROC SCORE, I can successfully reproduce the first factor but not the rest of the factors. There are some differences in the second factor, bigger differences in the third factor, etc. The scored factors seem to become worse and worse.
Perhaps the assumptions of this macro are not true. I believe that I am using the correct ODS tables because the first factor is correct. But why not the other ones? I know PLS builds the factors in an iterative fashion. Maybe there is no way to derive those factors with a simple matrix multiplication of the centered and scaled data by the weights (which is what PROC SCORE does). Maybe the scoring has to be implemented in an iterative fashion too (e.g. start with scaling/centering the data, then get the first factor, then operate on the remaining data, then get the second factor, etc.).
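To make the iterative idea concrete, here is a minimal sketch in Python (not SAS) of a textbook NIPALS PLS with a single response, on simulated data. It assumes that each weight vector is defined against the deflated predictors, as in textbook NIPALS; whether the XWeights table from PROC PLS follows exactly that convention (including its normalization) is an assumption I have not verified. All function names are hypothetical. The sketch compares plain multiplication by the weights against scoring that removes each factor before computing the next one.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def column(X, j):
    return [row[j] for row in X]

def fit_pls1(X0, y0, n_factors):
    """Textbook NIPALS PLS1 on centered/scaled X0 and centered y0.
    Returns weights W, loadings P and scores T (one list per factor)."""
    X = [row[:] for row in X0]
    y = y0[:]
    n_cols = len(X[0])
    W, P, T = [], [], []
    for _ in range(n_factors):
        w = [dot(column(X, j), y) for j in range(n_cols)]
        norm = dot(w, w) ** 0.5
        w = [v / norm for v in w]                 # unit-length weight vector
        t = [dot(row, w) for row in X]            # scores from the *deflated* X
        tt = dot(t, t)
        p = [dot(column(X, j), t) / tt for j in range(n_cols)]
        q = dot(y, t) / tt
        # Deflate X and y before extracting the next factor.
        X = [[v - ti * pj for v, pj in zip(row, p)] for row, ti in zip(X, t)]
        y = [yi - q * ti for yi, ti in zip(y, t)]
        W.append(w)
        P.append(p)
        T.append(t)
    return W, P, T

def score_naive(X0, W):
    """Plain matrix multiplication of centered/scaled data by the weights."""
    return [[dot(row, w) for w in W] for row in X0]

def score_sequential(X0, W, P):
    """Score each row factor by factor, deflating the row after each factor."""
    scores = []
    for row in X0:
        x = row[:]
        t_row = []
        for w, p in zip(W, P):
            t = dot(x, w)
            t_row.append(t)
            x = [v - t * pj for v, pj in zip(x, p)]
        scores.append(t_row)
    return scores

# Simulated data: x2 is nearly a copy of x1 (strong collinearity).
random.seed(0)
raw, y_raw = [], []
for _ in range(30):
    x1 = random.gauss(0, 1)
    x2 = x1 + 0.1 * random.gauss(0, 1)
    x3 = random.gauss(0, 1)
    raw.append([x1, x2, x3])
    y_raw.append(x1 + 2 * x3 + 0.1 * random.gauss(0, 1))

n = len(raw)
means = [sum(column(raw, j)) / n for j in range(3)]
stds = [(sum((v - m) ** 2 for v in column(raw, j)) / (n - 1)) ** 0.5
        for j, m in enumerate(means)]
X0 = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in raw]
y_mean = sum(y_raw) / n
y0 = [v - y_mean for v in y_raw]

W, P, T = fit_pls1(X0, y0, 3)
naive = score_naive(X0, W)
seq = score_sequential(X0, W, P)

for a in range(3):
    err_naive = max(abs(naive[i][a] - T[a][i]) for i in range(n))
    err_seq = max(abs(seq[i][a] - T[a][i]) for i in range(n))
    print(f"factor {a + 1}: naive error {err_naive:.3g}, sequential error {err_seq:.3g}")
```

In this sketch the plain multiplication agrees with the model's scores only for the first factor, because only the first weight vector is applied to undeflated data. For NIPALS the sequential scoring is equivalent to a single multiplication by adjusted weights W(P'W)^-1, so a PROC SCORE approach might still work if the macro also used the X loadings, not just XVariableCenScale and XWeights.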
Does anybody have pointers to get this to work right for all factors? Or does anybody know of an existing procedure that I can use? (I don't want to reinvent the wheel if something is already out there...)
I am attaching the code of the macro and an example with simulated data for anybody that wants to reproduce this issue.
I would appreciate any contributions!
03-05-2016 08:20 AM - edited 03-05-2016 08:46 AM
I found the following comment in the SAS user guide: "You can save the model fit by the PLS procedure in a data set and apply it to new data by using the SCORE procedure."
I think this refers to scoring new data to generate dependent variable predictions (e.g. the solution). What I am looking for is not just that, but also how to generate the factors themselves for the new data (using PROC SCORE or something else). If anybody knows an option to do that, please advise.
03-05-2016 03:35 PM