Hello, experts. The official PLS procedure manual shows that prediction for new data is performed by combining the training data with the new data (the new data don't have response values). I ran an experiment: 200 observations for model training and 2,000 new observations for prediction. The results are strange: the 2,000 predicted values are all very similar (variability less than 0.3), which surely can't be right.
I would appreciate suggestions on why this happens and how to use the PLS procedure properly in this situation, i.e., when the training data set is much smaller than the set of new data to be predicted.
Thanks in advance.
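For reference, the scoring pattern described in the PLS documentation is roughly the following; the data set names, variable names, and NFAC= value below are placeholders, not my actual code.

```sas
/* Placeholder sketch of the documented PLS scoring pattern.
   Stack the 200 training rows with the 2000 new rows; the new rows
   have the response y set to missing.  PROC PLS excludes rows with
   a missing response from the fit but still produces predictions
   for them. */
data all;
  set train newdata;        /* in newdata, y is missing (.) */
run;

proc pls data=all nfac=5;   /* number of factors is illustrative */
  model y = x1-x10;
  output out=pred predicted=yhat;
run;
```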
Such things depend on the actual data used.
Please explain in more detail why you think the results are suspect.
In most models, if the input data are "close" (under many reasonable definitions of "close"), then the modeled results should be similar, so low variability is not necessarily wrong.
The options chosen also have some bearing. For any real answer we would need 1) the PLS code you used, 2) the input data, and 3) the technique you used to summarize the results that have a "variability less than 0.3". Note that UNITS of measure can have an effect: the same distances expressed in light years vs. miles would show lower "variability" in light years simply because the size of the unit is different.
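As a hypothetical illustration of the units point: the same set of distances expressed in miles and in light years gives wildly different standard deviations, even though nothing about the data has changed.

```sas
/* The same distances in two units; only the scale differs. */
data units;
  input miles;
  light_years = miles / 5.88e12;  /* approx. miles per light year */
datalines;
1000000
2000000
3000000
;
run;

proc means data=units std;
  var miles light_years;   /* std in light years is far smaller */
run;
```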
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon, or attached as text, so that we can see exactly what you have and test code against it.
Please post example code in a code box. The forum main message windows will reformat code, sometimes in a way that makes data step code fail to run.
@Jonison wrote:
Hello, experts. The official PLS procedure manual shows that prediction for new data is performed by combining the training data with the new data (the new data don't have response values). I ran an experiment: 200 observations for model training and 2,000 new observations for prediction. The results are strange: the 2,000 predicted values are all very similar (variability less than 0.3), which surely can't be right.
When someone claims they are right and SAS is wrong, I believe SAS. So you really need to demonstrate facts that support your claim. In other words, the burden is on you to show something is wrong; it is not on us to defend or explain the answers from SAS.
Also, in your earlier thread, I showed how to get predicted values from PROC SCORE, they match the predicted values from PROC PLS to within round-off error.
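For anyone following along, that comparison has this general shape; all names below are hypothetical, and the reshaping of the coefficient table into the layout PROC SCORE expects is elided.

```sas
/* Hypothetical sketch: reproduce PROC PLS predictions with PROC SCORE.
   The SOLUTION option makes PROC PLS report the regression
   coefficients, which ODS OUTPUT captures as a data set. */
proc pls data=train nfac=5;
  model y = x1-x10 / solution;
  ods output ParameterEstimates=pe;
run;

/* ... reshape pe into a one-row coefficient data set (coef) with
   _TYPE_='PARMS' and _NAME_='yhat', as PROC SCORE requires ... */

proc score data=newdata score=coef out=scored type=parms;
  var x1-x10;
run;
```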
Start by showing us the code used to create the input data set (the data set that has the training data and the new observations with missing response values), the code used in PROC PLS, and (a portion of) the data used (using the instructions given by @ballardw ).