Issue scoring data using PROC PLS outputs: the sco...

03-04-2016 08:14 PM - edited 03-05-2016 08:44 AM

Hi,

I am using PROC PLS to identify orthogonal factors from a set of predictors that has a lot of redundancy (and very high multicollinearity.) PLS works great. The factors and the regression models with those factors are fine.

I also have additional observations that were not used when generating the PLS model (e.g. a validation set, new data, etc.) These observations need to be "scored" using the model in order to generate both predictions and the factors for them. Unfortunately, PLS does not produce a "scoring output dataset" but I have a macro that will generate one using the information in these two ODS tables: XVariableCenScale and XWeights.

When I use that "scoring output dataset" with PROC SCORE, I can successfully reproduce the first factor but not the rest of the factors. There are some differences in the second factor, bigger differences in the third factor, and so on. The scored factors seem to become worse and worse.

Perhaps the assumptions of this macro are not true. I believe that I am using the correct ODS tables because the first factor is correct. But why not the other ones? I know PLS builds the factors in an iterative fashion. Maybe there is no way to derive those factors with a simple matrix multiplication (which is what PROC SCORE does). Maybe the scoring has to be implemented in an iterative fashion too (e.g. start with scaling/centering the data, then get the first factor, then operate on the remaining data, then get the second factor, etc.).
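The suspicion above can be illustrated with a minimal sketch in Python (not SAS, and not a claim about how PROC PLS is implemented internally). It assumes the textbook NIPALS algorithm for a single response, centering only (no scaling), and made-up toy data; the names `fit_pls1`, `score_iterative`, and `score_direct` are hypothetical. In that algorithm each weight vector applies to the deflated predictors, so multiplying the centered data by the weights reproduces only the first factor, while scoring with deflation (which also needs the X loadings) reproduces all of them.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fit_pls1(X, y, n_factors):
    """Textbook NIPALS PLS for one response; X and y must already be centered."""
    X = [row[:] for row in X]
    y = y[:]
    weights, loadings, scores = [], [], []
    for _ in range(n_factors):
        cols = list(zip(*X))
        w = [dot(col, y) for col in cols]        # weight is proportional to X'y
        norm = dot(w, w) ** 0.5
        w = [wj / norm for wj in w]
        t = [dot(row, w) for row in X]           # scores of the current (deflated) X
        tt = dot(t, t)
        p = [dot(col, t) / tt for col in cols]   # X loadings
        q = dot(y, t) / tt
        # Deflate X and y before extracting the next factor
        X = [[xij - ti * pj for xij, pj in zip(row, p)] for row, ti in zip(X, t)]
        y = [yi - ti * q for yi, ti in zip(y, t)]
        weights.append(w)
        loadings.append(p)
        scores.append(t)
    return weights, loadings, scores

def score_iterative(x, weights, loadings):
    """Score one centered observation, deflating it after each factor."""
    x = list(x)
    out = []
    for w, p in zip(weights, loadings):
        t = dot(x, w)
        out.append(t)
        x = [xj - t * pj for xj, pj in zip(x, p)]
    return out

def score_direct(x, weights):
    """Plain multiplication by the weights, with no deflation."""
    return [dot(x, w) for w in weights]

# Made-up training data (two correlated predictors, one response)
X_raw = [[12.0, 6.0], [11.0, 7.0], [9.0, 3.0], [8.0, 4.0]]
y_raw = [5.0, 3.0, 2.0, 2.0]
means = [sum(col) / len(col) for col in zip(*X_raw)]
y_mean = sum(y_raw) / len(y_raw)
Xc = [[v - m for v, m in zip(row, means)] for row in X_raw]
yc = [v - y_mean for v in y_raw]

W, P, T = fit_pls1(Xc, yc, n_factors=2)

# A new observation, centered with the training means
x_new = [v - m for v, m in zip([11.0, 5.5], means)]
direct = score_direct(x_new, W)
iterative = score_iterative(x_new, W, P)
print("direct:   ", direct)
print("iterative:", iterative)
```

In this sketch the two results agree on the first factor and differ on the second. If PROC PLS follows the same scheme, the X loadings would be needed in addition to XVariableCenScale and XWeights, although its normalization conventions may differ from the ones assumed here.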

Does anybody have pointers to get this to work right for all factors? Or does anybody know of an existing procedure that I can use? (I don't want to reinvent the wheel if something is already out there...)

I am attaching the code of the macro and an example with simulated data for anybody that wants to reproduce this issue.

I would appreciate any contributions!

Thank you,

Carlos

Posted in reply to carlosmirandad

03-05-2016 08:20 AM - edited 03-05-2016 08:46 AM

Found the following comment in the SAS user guide: " You can save the model fit by the PLS procedure in a data set and apply it to new data by using the SCORE procedure."

(source: http://support.sas.com/documentation/cdl//en/statug/68162/HTML/default/statug_pls_overview01.htm)

I think this refers to scoring new data to generate dependent variable predictions (e.g. the solution). What I am looking for is not just that, but also how to generate the factors themselves for the new data (using PROC SCORE or something else). If anybody knows an option to do that, please advise.
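One possible route, sketched here in Python rather than SAS, is to fold the deflation steps into a single set of coefficients so that a plain matrix multiplication (the kind PROC SCORE performs) returns the factors. This assumes the textbook NIPALS scheme, in which factor a is the deflated observation times weight w_a and deflation subtracts score times loading; whether PROC PLS outputs follow exactly this convention is an assumption. The weight and loading values below are made up for illustration (they are not PROC PLS output), and `direct_coefficients` is a hypothetical helper. It builds adjusted weights r_a = w_a - sum over b < a of (p_b . w_a) r_b.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def score_iterative(x, weights, loadings):
    """Score one centered observation, deflating it after each factor."""
    x = list(x)
    out = []
    for w, p in zip(weights, loadings):
        t = dot(x, w)
        out.append(t)
        x = [xj - t * pj for xj, pj in zip(x, p)]
    return out

def direct_coefficients(weights, loadings):
    """Adjusted weights R such that x . R[a] equals the deflated score for factor a."""
    R = []
    for a, w in enumerate(weights):
        r = list(w)
        for b in range(a):
            c = dot(loadings[b], w)              # p_b . w_a
            r = [rj - c * rbj for rj, rbj in zip(r, R[b])]
        R.append(r)
    return R

# Illustrative values only: one weight vector and one loading vector per factor
W = [[0.8, 0.6, 0.0], [0.6, -0.8, 0.0], [0.0, 0.0, 1.0]]
P = [[0.9, 0.5, 0.1], [0.5, -0.7, 0.2], [0.1, 0.1, 0.9]]
x_new = [1.0, 0.5, -2.0]                         # already centered/scaled

R = direct_coefficients(W, P)
one_step = [dot(x_new, r) for r in R]            # single multiplication
stepwise = score_iterative(x_new, W, P)          # factor-by-factor deflation
print(one_step)
print(stepwise)
```

The two lists agree for every factor, which suggests that adjusted weights of this kind, rather than the raw weights, are what a one-step scoring dataset would need to contain under the stated assumption.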

Thank you,

Carlos

Posted in reply to carlosmirandad

03-05-2016 12:09 PM

Wouldn't you run it through PROC FACTOR or PROC PRINCOMP?

Posted in reply to Reeza

03-05-2016 03:35 PM

Thanks, Reeza, for the question. Factor analysis and principal components do not take into account the target variable, which is key for the problem I am solving. In the past I have used principal components to get orthogonal predictors and then used regression with variable selection to find those components that track with the target variable. I was hoping to use PLS now, since this method does both things: the rotation and the identification of factors that predict the target well.