xcong
Fluorite | Level 6

Could professionals at SAS who work on PROC PLS help me?

I am very curious about how the factor score (OUTPUT XSCORE=) is constructed in PROC PLS. I searched online (http://support.minitab.com/en-us/minitab/17/topic-library/modeling-statistics/multivariate/principal...) and found this definition of the factor score: "To calculate the factor score, multiply the coefficients with your data." ... "You must standardize the variables to obtain the correct factor score."

 

I think the X weights should be the coefficients, so I tried to use them to construct the factor score. For each participant, I calculated the sum of the 58 products (one per food group) of the X weight and the standardized food intake, and then checked whether the sum equals the factor score from OUTPUT XSCORE=. However, for every participant, this sum does not equal the factor score obtained directly from PROC PLS (OUTPUT XSCORE=), no matter which method (PCR, RRR, or PLS) I use.
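A minimal sketch of the sum-of-products calculation described above, written as a DATA step. The data set names (FOOD_STD, XW1) and variable names (food1-food58, w1-w58) are hypothetical: FOOD_STD is assumed to hold the standardized intakes, one row per participant, and XW1 a single row holding the 58 weights for one factor.

```sas
/* Hypothetical inputs:
   FOOD_STD : one row per participant, standardized intakes food1-food58
   XW1      : one row holding the 58 weights w1-w58 for a single factor */
data manual_score;
   /* Read the single weight row once; values read by SET are retained */
   if _n_ = 1 then set xw1;
   set food_std;
   array z{58} food1-food58;   /* standardized intakes   */
   array w{58} w1-w58;         /* weights for one factor */
   score = 0;
   do j = 1 to 58;
      score = score + z{j} * w{j};   /* accumulate the 58 products */
   end;
   drop j w1-w58;
run;
```

As the replies in this thread explain, a plain sum like this is expected to match the PLS X score only for the first factor; later factors are computed from deflated data.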

 

For PCA done via PROC FACTOR, I have verified that, for each participant, the sum of the 58 products (one per food group) of the factor score coefficient (the statistics with _TYPE_="SCORE" in OUTSTAT=) and the standardized food intake equals the factor score obtained directly from PROC FACTOR.
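For comparison, the PROC FACTOR check described above can be sketched with PROC SCORE, which applies the _TYPE_='SCORE' coefficients from OUTSTAT= after standardizing each variable with the stored MEAN and STD rows. SASHELP.CARS (horsepower, weight, enginesize) stands in for the food-intake data here; the variable choice and data set names are illustrative assumptions.

```sas
/* Principal-component factoring; SCORE adds the scoring coefficients
   to OUTSTAT= (_TYPE_='SCORE'), and OUT= saves Factor1 and Factor2 */
proc factor data=sashelp.cars method=principal nfactors=2
            score outstat=fstat out=fac_out;
   var horsepower weight enginesize;
run;

/* Standardize with the MEAN/STD rows of FSTAT, then sum
   coefficient * standardized value for each factor */
proc score data=sashelp.cars score=fstat out=fac_manual;
   var horsepower weight enginesize;
run;

/* Factor1 and Factor2 should agree observation by observation */
proc compare base=fac_out compare=fac_manual method=absolute criterion=1e-8;
   var factor1 factor2;
run;
```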

 

I cannot figure out how the XSCORE values are constructed in PROC PLS. Could someone help me with this issue? Thank you so much!

Accepted Solutions
PaigeMiller
Diamond | Level 26

/*For each car, the sum of the 4 products between the factor coefficient ("xweight")
and the standardized parameter (mean 0, std 1) does not equal the factor score obtained directly from PROC PLS when the method is PCR or RRR.

This calculation only works when the method is PLS.*/

 

This is different from your original statement: there, the calculation did not work for PLS, but now it does.

 

The RRR and PCR algorithms are different from the PLS algorithm. I believe you would need to research the exact algorithm in order to get the formula that produces the score from the data values and the PLS weights.

 

In fact, I have no idea what the PCR option in PROC PLS does, despite what the documentation says. The scores from PROC PRINCOMP and the scores from the PCR option in PROC PLS do not match. With PROC PRINCOMP, the score is easily calculated: multiply each PROC PRINCOMP loading by the corresponding standardized data value, then add up the products. I suspect that the PROC PLS scores using method=PLS are not really scores; they are multiplied by some sort of regression coefficient, but honestly I do not know for sure. In fact, that would be a good question for someone at SAS to answer: why do PROC PRINCOMP and PROC PLS with the PCR option give different scores but the same loadings? (Furthermore, I would never recommend using the PCR option in PROC PLS. It seems to me to have no legitimate purpose other than to give users access to algorithms that were popular 30 years ago but are inferior to current methods.)
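A sketch of the comparison described above, reusing the SASHELP.CARS variables from this thread; the data set and prefix names are illustrative. PROC SCORE with the PRINCOMP OUTSTAT= data set rebuilds the PRINCOMP scores as loadings times standardized values, and the last steps place them next to the PROC PLS METHOD=PCR scores so the reported mismatch can be inspected.

```sas
/* PCA on the correlation matrix; OUTSTAT= holds MEAN, STD and
   _TYPE_='SCORE' (eigenvector) rows */
proc princomp data=sashelp.cars outstat=pc_stat n=2;
   var horsepower weight;
run;

/* Standardize with the MEAN/STD rows, then sum
   eigenvector * standardized value: Prin1, Prin2 */
proc score data=sashelp.cars score=pc_stat out=pc_manual;
   var horsepower weight;
run;

/* PCR through PROC PLS; X scores saved as PCR_T1, PCR_T2 */
proc pls data=sashelp.cars method=pcr nfac=2;
   model msrp = horsepower weight;
   output out=pcr_out xscore=pcr_t;
run;

/* One-to-one merge: both data sets keep the input row order */
data compare_scores;
   merge pc_manual(keep=prin1 prin2)
         pcr_out(keep=pcr_t1 pcr_t2);
run;

proc print data=compare_scores(obs=5);
run;
```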

--
Paige Miller

PaigeMiller
Diamond | Level 26

In dimension 1, the score should equal the sum over all variables of data*xweight1, where the data are properly centered and scaled.

 

In dimension 2, the score should equal the sum over all variables of (deflated data)*xweight2, where the deflated data are the data after the predictions from dimension 1 have been removed (in other words, the residuals after dimension 1).

 

And so on for each later dimension.
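The dimension-by-dimension description above can be written compactly in the usual NIPALS notation. The symbols are introduced here for illustration ($X_a$ for the deflated data at dimension $a$, $w_a$ for the X weights, $p_a$ for the X loadings, $W$ and $P$ for the matrices collecting them), and whether PROC PLS follows exactly these steps is an assumption.

```latex
\begin{aligned}
X_1 &= X \quad \text{(centered and scaled data)} \\
t_a &= X_a\, w_a, \qquad a = 1, 2, \dots \\
p_a &= \frac{X_a^{\top} t_a}{t_a^{\top} t_a} \\
X_{a+1} &= X_a - t_a\, p_a^{\top} \\[4pt]
T &= X\, W \left(P^{\top} W\right)^{-1}
\end{aligned}
```

For $a = 1$ this reduces to $t_1 = X w_1$, the plain weighted sum of the standardized data. From dimension 2 on, the weights multiply the deflated data, so multiplying the original standardized data by $w_2$ would not in general reproduce $t_2$; the equivalent weights on the original data are the columns of $W(P^{\top}W)^{-1}$.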

--
Paige Miller
xcong
Fluorite | Level 6

Thank you so much for the response.

I think I have already standardized the data to mean 0 and SD 1. I calculated the sum of the (standardized data*xweight1) values, but the sum does not equal the factor score for the first factor (scorex1 with OUTPUT XSCORE=scorex; in XSCORE=scorex, scorex is the prefix, and scorex1 is the factor score for the first factor).

Reeza
Super User

Can you post your code so we can replicate your process and see where you may be going wrong?

 

Note that the documentation says the variables are centered and scaled. I'm assuming you would use PROC STDIZE to do this and then try to replicate the calculations.
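For example, a minimal PROC STDIZE sketch on the SASHELP.CARS variables used elsewhere in this thread (the output data set name is illustrative); METHOD=STD subtracts each variable's mean and divides by its standard deviation.

```sas
/* Center each variable at its mean and scale by its standard deviation */
proc stdize data=sashelp.cars out=cars_stdize method=std;
   var horsepower weight;
run;
```

PROC STANDARD with MEAN=0 STD=1, as used later in this thread, should give the same standardized values.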

 

The Details section of the PROC PLS documentation has more information on how these are calculated, if that's helpful at all. Not sure why you're referencing Minitab documentation.

 

http://documentation.sas.com/?docsetId=statug&docsetTarget=statug_pls_details01.htm&docsetVersion=14...

 

Also, please note that this forum is not made up only of SAS employees. Some are, but the majority are users just like yourself.

 

If you want an answer from SAS directly, perhaps SAS tech support would be a more appropriate route.

xcong
Fluorite | Level 6

Could you give me their contact information (the people at SAS who developed PROC PLS)? I think their source code for the factor score from PROC PLS with METHOD=PCR, RRR, or PLS would be very helpful. Thank you so much!

I sent a note to SAS, and they directed me to these forums. Maybe I did not contact the right person.

PaigeMiller
Diamond | Level 26

Could you give me their contact information (the people at SAS who developed PROC PLS)?

 

No, I cannot do this because I don't know their contact information, or even their names.

 

I think their source code for the factor score from PROC PLS with METHOD=PCR, RRR, or PLS would be very helpful. Thank you so much!

 

While I don't speak for SAS, I'm quite sure they do not give out source code.

 

You haven't responded to my comments about how I can easily reproduce the scores in my example. If you are going to make progress, this (or a similar) forum is the only way to get there.

--
Paige Miller
xcong
Fluorite | Level 6

"You haven't responded to my comments about how I can easily reproduce the scores in my example"

You do not know how to calculate the score yourself? I am totally confused by this sentence.

Reeza
Super User

If you don't have a valid SAS license, then you don't have access to SAS tech support. I'm guessing that's the case here?

 

It's unlikely you'll get the source code, since it's proprietary, but you can likely find the calculations in the Details and References sections.

 

 

Reeza
Super User

@xcong, given @PaigeMiller's ability to recreate the scores, it's likely that you're doing something incorrectly.

In this case, the best solution is to provide a fully worked example so that we can replicate your issue; otherwise we have to assume user error or gremlins.

xcong
Fluorite | Level 6

I am totally confused by your first sentence. Have I done something inappropriate?

Reeza
Super User

 


@xcong wrote:

I am totally confused by your first sentence. Have I done something inappropriate?


No, I'm just saying that your code is likely incorrect, and until we see it there's not much else that can be said. You already posted an example above that clarifies why you're not getting the correct results.

xcong
Fluorite | Level 6

Thank you so much for the response. I have posted my code. Will it help you find whether there is something wrong in the code?

I have already tried my best to find the algorithms for the factor score with METHOD=RRR or PLS in PROC PLS in the SAS documentation (e.g. https://support.sas.com/documentation/onlinedoc/stat/131/pls.pdf), but I cannot find any relevant information. That is why I posted this question on this forum to ask for help.

 

Thank you again!

Reeza
Super User

The regression page I originally linked to has the details, but they are in linear algebra terms.

http://documentation.sas.com/?docsetId=statug&docsetTarget=statug_pls_details01.htm&docsetVersion=14...

 

Unfortunately I don't have the time to delve into such a topic at this moment, but I suspect most of what you're after is in the link above. 

Additionally, there's the References section, with the academic references used for the procedure.

PaigeMiller
Diamond | Level 26

@xcong wrote:

Thank you so much for the response. I have posted my code. Will it help you find whether there is something wrong in the code?

I have already tried my best to find the algorithms for the factor score with METHOD=RRR or PLS in PROC PLS in the SAS documentation (e.g. https://support.sas.com/documentation/onlinedoc/stat/131/pls.pdf), but I cannot find any relevant information. That is why I posted this question on this forum to ask for help.

 

Thank you again!


As I tried to explain, I don't think the problem is your code specifically. I think the problem is that the RRR and PCR algorithms do something different from the PLS algorithm, so the scores are NOT necessarily the sum of the products (data*xweight). Either that, or PROC PLS is giving the wrong scores for RRR and PCR. But that's not something I plan on digging into; you can dig into the formulas and see if you can figure out how to calculate the scores in those cases.

--
Paige Miller
PaigeMiller
Diamond | Level 26

@xcong wrote:

Thank you so much for the response.

I think I have already standardized the data to mean 0 and SD 1.


You say "I think"; that's not good enough, especially if you can't get the calculations to work. You need to confirm that the data have been standardized before you do these calculations.
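One quick way to confirm it, sketched for the CARS_STD data set that the PROC STANDARD step later in this post creates (the variable names come from that example):

```sas
/* After standardizing, each variable should show mean 0 and std 1 */
proc means data=cars_std n mean std maxdec=6;
   var horsepower weight;
run;
```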

 

I calculated the sum of the (standardized data*xweight1) values, but the sum does not equal the factor score for the first factor (scorex1 with OUTPUT XSCORE=scorex; in XSCORE=scorex, scorex is the prefix, and scorex1 is the factor score for the first factor).

 

Then you have made a mathematical mistake somewhere. This process works properly for me; for example:

 

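/* Save the X weights and X loadings tables printed by the DETAILS option */
ods output xweights=xweight xloadings=xloading;
/* Two-factor PLS fit; XSCORE=t writes the X scores as T1 and T2 */
proc pls data=sashelp.cars nfac=2 details;
	model msrp=horsepower weight;
	output out=pls_stats xscore=t;
run;
/* Standardize the predictors to mean 0, std 1 for the manual check */
proc standard data=sashelp.cars out=cars_std mean=0 std=1;
	var horsepower weight;
run;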

Using the data values in data set cars_std, multiplying by the dimension 1 xweight, then adding up across the two variables in the model (horsepower and weight) gives me the exact score shown in the data set PLS_STATS.
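To check dimension 2 as well, here is a sketch of a textbook NIPALS loop for one response in SAS/IML, assuming SAS/IML is licensed and that PLS_STATS (scores T1, T2) was created by the PROC PLS step above. That PROC PLS uses exactly these steps is an assumption; if it does, MAXDIFF should be close to zero.

```sas
/* Assumes SAS/IML and the PLS_STATS data set from the PROC PLS step */
proc iml;
   use sashelp.cars;
   read all var {horsepower weight} into X;
   read all var {msrp} into y;
   close sashelp.cars;

   /* Center and scale each column to mean 0, std 1 */
   X = (X - mean(X)) / std(X);
   y = (y - mean(y)) / std(y);

   nfac = 2;
   scores = j(nrow(X), nfac, .);
   do a = 1 to nfac;
      w = X` * y;
      w = w / sqrt(ssq(w));        /* unit-length X weight              */
      tvec = X * w;                /* score = sum of data * weight      */
      p = X` * tvec / ssq(tvec);   /* X loading                         */
      c = y` * tvec / ssq(tvec);   /* y loading (1 x 1)                 */
      X = X - tvec * p`;           /* deflate X: residuals after dim a  */
      y = y - tvec * c;            /* deflate y                         */
      scores[, a] = tvec;
   end;

   use pls_stats;
   read all var {t1 t2} into Tpls;
   close pls_stats;

   /* Compare magnitudes, since the sign of a factor is arbitrary */
   maxdiff = max(abs(abs(scores) - abs(Tpls)));
   print maxdiff;
quit;
```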

--
Paige Miller
