Solved: Re: PROC PLS, Is W (Xweights) has different calculation than the stand...

jky · Posted 03-21-2025 04:40 AM

Hi,

I hope you're well.

I am trying to run PROC PLS on a dataset (described below) with one factor (nfac = 1). I have read that for a single factor the results (including the xweights) should be the same regardless of the algorithm used (NIPALS and SIMPLS should produce the same results). The coefficients and R-squared values match those from open-source libraries in R/Python, but the xweights are slightly different. Does anyone know if SAS applies any additional rules that the open-source packages do not? For example, does SAS use a specific type of SVD or norm? I don't think this is due to whether the data is mean-centered or scaled, as I have carefully checked this.

About my data:

1. Only one response variable

2. 60 independent variables

3. 10 observations

(It's high-dimensional, with very few observations.)

Thanks,

Ksharp · Posted 03-21-2025 05:11 AM

The eigenvalues and eigenvectors that SAS generates (or the results of its SVD) can differ from those of R or other packages.
@Rick_SAS has mentioned this before. Rick might explain it in more detail for you.

jky · Posted 03-21-2025 06:05 AM

Hi Ksharp,

Thanks for your reply! Interesting. I would like to learn more about how SAS computes the SVD differently from Python or R.

Thanks,

Rick_SAS · Posted 03-21-2025 06:37 AM

> I would like to learn more about how SAS computes the SVD differently from Python or R.

KSharp said that SAS and R compute eigenvectors differently, but what he should have said is that eigenvectors (and singular vectors) are not unique. Even after you standardize the eigenvectors to have unit norm, there is still a non-uniqueness property, because if v is an eigenvector then so is -v. For the same problem, SAS might produce one (correct) eigenvector whereas R produces a different (equally correct) eigenvector that has the opposite sign.

Eigenvalues (and singular values) are unique, so those will agree up to some number of decimals.

This problem gets worse if there are non-unique (repeated) eigenvalues, because then the basis for the eigenspace does not have a unique representation.
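A minimal Python sketch of that sign ambiguity, using a made-up 2×2 symmetric matrix (an illustration only, not the poster's data): both v and -v satisfy A v = λ v, so two programs can legitimately return either one.

```python
import math

# Hypothetical 2x2 symmetric matrix, chosen only for illustration.
A = [[2.0, 1.0],
     [1.0, 2.0]]

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]

def is_eigenpair(m, vec, value, tol=1e-12):
    """Check that m @ vec equals value * vec up to a small tolerance."""
    return all(abs(a - value * x) <= tol for a, x in zip(matvec(m, vec), vec))

# Unit-norm eigenvector for the eigenvalue 3, and its negation.
v = [1 / math.sqrt(2), 1 / math.sqrt(2)]
neg_v = [-x for x in v]
lam = 3.0

print(is_eigenpair(A, v, lam))      # v is an eigenvector
print(is_eigenpair(A, neg_v, lam))  # so is -v, with the same eigenvalue
```

Both vectors have unit norm, so normalizing does not remove the ambiguity; only a sign convention does.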

Rick_SAS · Posted 03-21-2025 05:45 AM

> ... the xweights are slightly different.

Please post an example. For example, post the first 5 weights from SAS and the same weights from open source. Is the difference that SAS reports the numbers to 6 digits whereas R reports 8 digits? Or do some weights in SAS have the opposite sign from the weights in R? Or something else?

Also, please post your SAS code.

I encourage you to check the number of PCA components used for each model. Perhaps SAS is basing the model on k PCA components and your R model is using a different number of components.

jky · Posted 03-21-2025 06:04 AM

Hi Rick,

Thanks so much for getting back to me.

My SAS code is shown as below:

proc pls data=regress method=simpls nfac=1;
  ods output XWeights=work.pls_xweights;
  model Y = A01-A60;
run;

And here is a comparison of the xweights obtained from SAS and Python for the first five independent variables:

Xweights	T01	T02	T03	T04	T05
SAS	-0.072658	0.172263	0.132138	-0.06494	0.225583
Python	-0.069733319	0.165327533	0.12681782	-0.062325287	0.216501385

At a quick look, I don't think the issue is a difference in decimal places between Python and SAS, or a difference in sign. And the model here has only one factor, so I don't think their SVDs will differ.

Thanks,

PaigeMiller · Posted 03-21-2025 06:30 AM

The difference in the weights is just a scaling factor. The SAS value divided by the Python value is always approximately 1.042. SAS and Python have evidently scaled the weights differently, but it makes no difference at all to the predictions or the fit. (Why? Because the weights are multiplied by what SAS calls the "Inner Regression Coefficient" to obtain predicted values, and the scaling of that coefficient is adjusted to give the proper predictions. Python uses a different scaling there, so the "Inner Regression Coefficients" don't match between Python and SAS, but when the multiplication happens the scaling differences cancel out.)
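The constant ratio can be checked with a few lines of Python. This is only a sketch that reuses the five SAS and Python weights from the table posted above; the variable names are mine.

```python
# X-weights for the first five predictors, copied from the table posted above.
sas_w = [-0.072658, 0.172263, 0.132138, -0.06494, 0.225583]
python_w = [-0.069733319, 0.165327533, 0.12681782, -0.062325287, 0.216501385]

# Element-wise ratio SAS / Python; a pure rescaling gives the same value everywhere.
ratios = [s / p for s, p in zip(sas_w, python_w)]
for r in ratios:
    print(round(r, 4))

# Spread of the ratios: tiny compared with the ratio itself.
spread = max(ratios) - min(ratios)
print(spread)
```

The ratios agree to about the precision of the 6-digit SAS output, which is what you expect when one vector is a constant multiple of the other.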

--
Paige Miller

jky · Posted 03-21-2025 01:28 PM

Thank you so much Rick. Yeah, I agree that it might be because different software scales xweights differently. And yes, the xweight difference doesn't affect the coefficient or R-squared value, but it does make the VIP scores slightly different, which is a little annoying.

Does anyone know how SAS calculates norms? From what I understand, the xweight is obtained from the SVD of X'Y (which has the shape number of independent variables × 1). With only one dimension, the easiest way to compute the SVD is to divide X'Y by its norm. However, I understand (with my limited knowledge of linear algebra) that with only one dimension there may not be a unique solution, so it would be good to know how SAS processes it, if possible (e.g., with the inner regression coefficient).

PaigeMiller · Posted 03-21-2025 02:25 PM

The VIP scores are computed via the formula in the code here https://support.sas.com/kb/25/009.html

Specifically, you should look at the code for the %GET_VIP macro; the scaling there looks pretty standard. (UPDATE: the scaling is exactly what @Rick_SAS says.)

You are claiming "the xweight difference doesn't affect the coefficient or R-squared value, but it does make the VIP scores slightly different". I would like to see an example that isn't just a difference in scaling.

--
Paige Miller

Rick_SAS · Posted 03-21-2025 02:31 PM

> Does anyone know how SAS calculates norms?

Unless the documentation states otherwise, you may assume that a vector norm is the L2 norm (aka, the Euclidean norm). So

||v|| = sqrt(v1^2 + v2^2 + ... + vn^2)

For example, see the L2 norm definition here: SAS Help Center: NORM Function

jky · Posted 03-24-2025 12:47 AM

Hi Paige,

I hope you had a good weekend.

Yeah, the difference in VIP still comes down to the scaling. However, we sometimes interpret a VIP score by comparing it to an absolute value. For example, a general rule of thumb is that VIP > 1 indicates a significant variable that affects the dependent variable. This causes a problem, because T03 is considered an important variable in SAS but not in Python. Therefore, if possible, it would be helpful to understand how SAS performs the scaling. If this is not feasible, that's fine, as the difference is small.

Variable	SAS_XWEIGHT	Python_XWEIGHT	PYTHON_VIP	SAS_VIP
T01	-0.0727	-0.0697	0.5402	0.5628
T02	0.1723	0.1653	1.2806	1.3343
T03	0.1321	0.1268	0.9823	1.0235
T04	-0.0649	-0.0623	0.4828	0.5030
T05	0.2256	0.2165	1.6770	1.7474

Thanks,

PaigeMiller · Posted 03-24-2025 05:40 AM

Because VIP is not unique (it can vary by a scaling constant), comparing it to an absolute value doesn't make sense. I compare the VIP of a variable to the VIPs of the other variables, and I concentrate on the ones with the biggest VIPs.
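To see where the scaling constant enters, here is a rough Python check against the numbers posted above. It assumes the usual one-factor reduction of the VIP formula, VIP_j = sqrt(p) * |w_j| / ||w||, and tests whether the posted VIPs match sqrt(p) * |w_j| with no division by the weight norm. This is an observation about the posted numbers only, not a statement about what the %GET_VIP macro or any Python library does internally.

```python
import math

p = 60  # number of predictors, as stated in the original question

# Posted X-weights and VIP scores for the first five predictors.
sas_w = [-0.072658, 0.172263, 0.132138, -0.06494, 0.225583]
python_w = [-0.069733319, 0.165327533, 0.12681782, -0.062325287, 0.216501385]
sas_vip = [0.5628, 1.3343, 1.0235, 0.5030, 1.7474]
python_vip = [0.5402, 1.2806, 0.9823, 0.4828, 1.6770]

def vip_without_normalizing(weights, n_predictors):
    """One-factor VIP with no division by the weight norm: sqrt(p) * |w_j|."""
    return [math.sqrt(n_predictors) * abs(w) for w in weights]

for name, w, vip in [("SAS", sas_w, sas_vip), ("Python", python_w, python_vip)]:
    recomputed = vip_without_normalizing(w, p)
    worst = max(abs(a - b) for a, b in zip(recomputed, vip))
    print(name, [round(x, 4) for x in recomputed], "max abs diff:", worst)
```

Both sets of posted VIPs agree with sqrt(60) * |w_j| to within rounding. If that reading is right, the VIPs simply inherit the norm of the weights (1 in Python, about 1.042 in SAS), so dividing the SAS weights by their norm before computing VIP should put the two programs on the same scale.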

> it would be helpful to understand how SAS performs the scaling

I gave you a link with code. @Rick_SAS explained how SAS calculates this.

--
Paige Miller

Rick_SAS · Posted 03-24-2025 05:53 AM

Perhaps it would be helpful to understand how Python performs the scaling?

jky · Posted 03-24-2025 07:49 AM

Hi Rick,

Thank you. Yes, I have explored how Python calculates the xweights as well, and it seems that it also uses the Euclidean norm. To clarify, I have created a simple raw dataset and SAS code, and I share the xweights (the first component of the SVD of X'Y or X'YY'X) below. You will see that the norm of the xweights in Python is 1 (I'm not sure, but I think this shows that it uses the Euclidean norm?), while the norm of the xweights in SAS is always around 1.04XXX. (Again, I have applied mean centering and scaling to the data, so that is not the source of the difference. It is also not an algorithm difference, because with one factor NIPALS gives the same result as SIMPLS, which I have double-checked. It is not a sign flip either.)

DATA regress;
    INPUT Y X1 X2 X3 X4 X5;
    DATALINES;
7 0 23 3 4 1
8 2 7 2 3 2
2 0 8 8 3 3
6 0 9 2 5 4
5 0 1 5 2 5
;
RUN;

PROC PLS DATA=regress METHOD=SIMPLS nfac=1 details varss;
    ODS OUTPUT XWeights=work.pls_xweights;
    MODEL Y = X1 X2 X3 X4 X5;
RUN;

Xweights	X1	X2	X3	X4	X5
SAS	0.487515	0.259817	-0.783895	0.223089	-0.344725
Python	0.467325566	0.249056761	-0.751431332	0.213850196	-0.330449077
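The norm claim can be checked directly from the two rows of the table. A short Python sketch follows; the helper name `l2_norm` is just for illustration.

```python
import math

# X-weights copied from the table above (toy dataset, one factor).
sas_w = [0.487515, 0.259817, -0.783895, 0.223089, -0.344725]
python_w = [0.467325566, 0.249056761, -0.751431332, 0.213850196, -0.330449077]

def l2_norm(v):
    """Euclidean (L2) norm: sqrt(v1^2 + ... + vn^2)."""
    return math.sqrt(sum(x * x for x in v))

sas_norm = l2_norm(sas_w)
python_norm = l2_norm(python_w)
print(sas_norm, python_norm)

# Dividing the SAS weights by their own norm should land on the Python weights.
rescaled = [x / sas_norm for x in sas_w]
print(max(abs(a - b) for a, b in zip(rescaled, python_w)))
```

The Python row has norm 1 and the SAS row has norm of about 1.043, and after rescaling the two rows agree to roughly the precision of the 6-digit SAS output.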

PaigeMiller · Posted 03-24-2025 09:49 AM

Euclidean norm can have a value of 1, or some other value. Any norm can have a value of 1 or some other value.

If you divide the weights by the norm, then they should produce a vector with norm of 1. SAS is obviously not dividing the weights by the norm. Python must be dividing the weights by the norm. It's optional whether a PLS program does this or not, because it doesn't affect the predicted values or the model fit. SAS obviously applies the scaling factor later in the algorithm than Python does. So I conclude that SAS and Python are calculating the weights the exact same way (is that what you need to know?) and then Python scales them but SAS doesn't.

Here is a simple DATA step that finds the Euclidean norm of the weights and then rescales the weights by dividing by the norm, so that you can see the difference and how, after the division, the norm becomes 1.

DATA regress;
    INPUT Y X1 X2 X3 X4 X5;
    DATALINES;
7 0 23 3 4 1
8 2 7 2 3 2
2 0 8 8 3 3
6 0 9 2 5 4
5 0 1 5 2 5
;
RUN;

PROC PLS DATA=regress nfac = 1 details varss ;
ods output 
    XWeights = work.pls_xweights;
MODEL Y = X1 X2 X3 X4 X5;
RUN;

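/* Compute the Euclidean norm of the weights, then rescale them to unit norm */
data ssq;
    set pls_xweights;
    norm_x=sqrt(uss(of x1-x5));   /* norm of the weights as output by PROC PLS */
    y1=x1/norm_x;
    y2=x2/norm_x;
    y3=x3/norm_x;
    y4=x4/norm_x;
    y5=x5/norm_x;
    norm_y=sqrt(uss(of y1-y5));   /* norm after rescaling; should be 1 */
run;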
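For comparison, the unit-norm weights that Python reports can be reproduced from the raw toy data. The sketch below assumes standardized columns and the textbook first-factor weight w = X'y / ||X'y||; it illustrates that formula only and is not a description of the internals of PROC PLS or of any particular Python library.

```python
import math

# Toy data from the DATA step above: columns are Y, X1..X5.
rows = [
    [7, 0, 23, 3, 4, 1],
    [8, 2, 7, 2, 3, 2],
    [2, 0, 8, 8, 3, 3],
    [6, 0, 9, 2, 5, 4],
    [5, 0, 1, 5, 2, 5],
]

def standardize(col):
    """Center a column and scale it to unit sample standard deviation (n - 1)."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
    return [(v - mean) / sd for v in col]

cols = [standardize(list(c)) for c in zip(*rows)]
y, xs = cols[0], cols[1:]

# First-factor weight direction: X'y, one entry per predictor.
xty = [sum(a * b for a, b in zip(x, y)) for x in xs]

# Unit-norm version of the weights.
norm = math.sqrt(sum(v * v for v in xty))
w_unit = [v / norm for v in xty]
print([round(v, 6) for v in w_unit])
```

The result matches the Python row posted earlier in the thread, and multiplying it by the norm of the SAS weights (about 1.043) gives the SAS row.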

--
Paige Miller
