Hello, experts. The official PLS procedure manual shows that prediction for new data is performed by combining the training data with the new data (the new data don't have response values). I ran an experiment: 200 observations for model training and 2,000 new observations for prediction. The results are strange: the 2,000 predicted values are all very similar (variability less than 0.3), which surely can't be right.
I would appreciate suggestions on why this happens and how to use the PLS procedure properly in this situation, i.e., when the training data set is much smaller than the set of new data to be predicted.
Thanks in advance.
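For reference, the scoring pattern described in the PLS documentation is roughly the following; the data set names, variable names, and NFAC= value below are placeholders, not my actual code.

```sas
/* Placeholder sketch of the documented PLS scoring pattern.
   Stack the 200 training rows with the 2000 new rows; the new rows
   have the response y set to missing.  PROC PLS excludes rows with
   a missing response from the fit but still produces predictions
   for them. */
data all;
  set train newdata;        /* in newdata, y is missing (.) */
run;

proc pls data=all nfac=5;   /* number of factors is illustrative */
  model y = x1-x10;
  output out=pred predicted=yhat;
run;
```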
Such things depend on the actual data used.
Please explain in more detail why you think the results are suspect.
In most models, if the input data are "close" (under many reasonable definitions of "close"), then the modeled results should be similar, so low variability is not necessarily wrong.
The options chosen also have some bearing. For any real answer we would need 1) the PLS code you used, 2) the input data, and 3) the technique you used to summarize the results that have a "variability less than 0.3". Note that UNITS of measure can have an effect: the same distances expressed in light years vs. miles would show lower "variability" in light years simply because the size of the unit is different.
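As a hypothetical illustration of the units point: the same set of distances expressed in miles and in light years gives wildly different standard deviations, even though nothing about the data has changed.

```sas
/* The same distances in two units; only the scale differs. */
data units;
  input miles;
  light_years = miles / 5.88e12;  /* approx. miles per light year */
datalines;
1000000
2000000
3000000
;
run;

proc means data=units std;
  var miles light_years;   /* std in light years is far smaller */
run;
```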
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon, or attached as text, so that we can see exactly what you have and test code against it.
Please post example code in a code box. The forum main message windows will reformat code, sometimes in a way that makes data step code fail to run.
@Jonison wrote:
Hello, experts. The official PLS procedure manual shows that prediction for new data is performed by combining the training data with the new data (the new data don't have response values). I ran an experiment: 200 observations for model training and 2,000 new observations for prediction. The results are strange: the 2,000 predicted values are all very similar (variability less than 0.3), which surely can't be right.
When someone claims they are right and SAS is wrong, I believe SAS. So you really need to demonstrate facts that support your claim. In other words, the burden is on you to show something is wrong; it is not on us to defend or explain the answers from SAS.
Also, in your earlier thread, I showed how to get predicted values from PROC SCORE, they match the predicted values from PROC PLS to within round-off error.
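For anyone following along, that comparison has this general shape; all names below are hypothetical, and the reshaping of the coefficient table into the layout PROC SCORE expects is elided.

```sas
/* Hypothetical sketch: reproduce PROC PLS predictions with PROC SCORE.
   The SOLUTION option makes PROC PLS report the regression
   coefficients, which ODS OUTPUT captures as a data set. */
proc pls data=train nfac=5;
  model y = x1-x10 / solution;
  ods output ParameterEstimates=pe;
run;

/* ... reshape pe into a one-row coefficient data set (coef) with
   _TYPE_='PARMS' and _NAME_='yhat', as PROC SCORE requires ... */

proc score data=newdata score=coef out=scored type=parms;
  var x1-x10;
run;
```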
Start by showing us the code used to create the input data set (the data set that has the training data and the new observations with missing response values), the code used in PROC PLS, and (a portion of) the data used (using the instructions given by @ballardw ).