xcong
Fluorite | Level 6

Thank you so much for the response.

I can confirm I have already standardized the data before proc pls. I think using the same public data is a good way for us to discuss this statistical problem, since my contract does not allow me to share the data or the specific SAS code from the research project. My research project will use proc pls, but with method rrr instead of pls. The number of response variables is two or three, and the number of predictor variables will be about 50. To mimic the research project, I used the data set sashelp.cars with two response variables (msrp and invoice) and four predictor variables (weight, length, Wheelbase, and MPG_Highway).

Based on the code below, taking the data values in data set cars_std, multiplying them by the dimension 1 xweight, and then adding up across the four predictor variables in the model gives me the exact score from proc pls ONLY when the method is pls (the default method in proc pls, as your code indicated). However, this procedure for calculating the factor score does not work for methods rrr and pcr. Could you help me with this issue? Thanks!

 

/*standardize to mean 0 std 1*/
proc standard data=sashelp.cars out=cars_std mean=0 std=1;
     var weight length Wheelbase MPG_Highway;
run;
/*standardized car para*/
data car_std_para;
  set cars_std (keep=weight length Wheelbase MPG_Highway);
  id=_n_;
run;
proc transpose data=car_std_para out=car_std_para prefix=id_;
  id id;
run;
proc sort data=car_std_para; by _name_; run;

/*proc pls; 3 methods: pcr, rrr, or pls*/
/*factor analysis*/
%macro fa_pls_car (m, nfct, topic);
ods listing close;
ods output XLoadings=&m._&nfct._xld;
ods output XWeights=&m._&nfct._xwt;
proc pls data=cars_std method=&m. nfac=&nfct. varss details censcale varscale;
     title "&topic. (&nfct. factors) based on 4 car parameters";
     model msrp invoice=weight length Wheelbase MPG_Highway/solution;
  output out=car_&m._&nfct. xscore=scorex yscore=scorey;
run;
ods listing;
%mend;
%fa_pls_car(pcr,3, Principal components regression);
%fa_pls_car(pcr,2, Principal components regression);
%fa_pls_car(rrr,2, Reduced rank regression);
%fa_pls_car(pls,2, Partial least squares);

/*calculate factor score*/
%macro calc_fs_proc_pls_car(m,nfct,v,n);
proc transpose data=&m._&nfct._&v.(drop=NumberOfFactors) out=tmp0; run;
proc sort data=tmp0; by _name_; run;
data tmp1;
  merge car_std_para tmp0;
  by _name_;
  array qt id_1-id_&n.;
  array score1 sc_1-sc_&n.;
  do over qt;
  score1=col1*qt;
  /*multiply the factor coefficient (xweight) or other stat for factor1
    with standardized data (mean 0 std 1)*/
  end;
run;
proc sql;
 create table tmp2 (keep=fs_:) as
 select 
    %do i=1 %to &n.;
  sum(sc_&i.) as fs_&i
       %if &i. ne &n. %then %do; , %end;
    %end;
  from tmp1;
quit;
proc transpose data=tmp2 out=tmp2; run;
data tmp3 (keep=id scorex1);
  set car_&m._&nfct.;
  id=_n_;
run;
data fs_&m._&nfct._&v. (keep=id col1);
  set tmp2; id=_n_;
run;

title "&m._&nfct._&v.";
proc means data=fs_&m._&nfct._&v.; var col1; run;
proc means data=tmp3; var scorex1; run;

data diff_fs_&m._&nfct._&v.;
  merge fs_&m._&nfct._&v. tmp3;
  by id;
  length diff_fs_ind $ 20;
  diff_fs=scorex1-col1;
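  /*compare the absolute difference so large negative differences are not counted as matches*/
  if abs(diff_fs)<0.01 then diff_fs_ind="difference <0.01";
  else diff_fs_ind="difference >=0.01";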
run;
proc freq data=diff_fs_&m._&nfct._&v.; tables diff_fs_ind; run;
%mend;

/*For each car, the sum of the 4 products between the factor coefficient ("xweight")
and the standardized parameter (mean 0 std 1) does not equal the factor score obtained directly from proc pls when method is pcr or rrr.

This calculation only works when method is pls*/
%calc_fs_proc_pls_car(pcr,3,xwt,428);
%calc_fs_proc_pls_car(pcr,2,xwt,428);
%calc_fs_proc_pls_car(rrr,2,xwt,428);
%calc_fs_proc_pls_car(pls,2,xwt,428);
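
The same factor-1 check can also be sketched more directly, without transposing the 428 observations. This is only a sketch under assumptions: it relies on the data sets created by %fa_pls_car above, it assumes the XWeights ODS table has one row per factor with a NumberOfFactors column and one column per predictor (the layout implied by the PROC TRANSPOSE step in the macro), and the macro name check_fs1 and the w_ variable names are made up for illustration.

```sas
/* Sketch only: compare scorex1 from PROC PLS with a hand-computed
   weighted sum of the standardized predictors for factor 1.
   Assumed layout of the XWeights table: one row per factor,
   columns NumberOfFactors, weight, length, Wheelbase, MPG_Highway. */
%macro check_fs1(m, nfct);
data w1;
  /* keep only the factor-1 weights and give them hypothetical w_ names */
  set &m._&nfct._xwt (where=(NumberOfFactors=1)
        rename=(weight=w_weight length=w_length
                Wheelbase=w_Wheelbase MPG_Highway=w_MPG_Highway));
  keep w_:;
run;
data check_&m._&nfct.;
  if _n_=1 then set w1;   /* carry the factor-1 weights onto every row */
  set car_&m._&nfct.;     /* standardized predictors plus scorex1 from PROC PLS */
  manual1 = weight*w_weight + length*w_length
          + Wheelbase*w_Wheelbase + MPG_Highway*w_MPG_Highway;
  abs_diff = abs(scorex1 - manual1);
run;
proc means data=check_&m._&nfct. n nmiss min mean max;
  var scorex1 manual1 abs_diff;
run;
%mend;
%check_fs1(pls,2);
%check_fs1(rrr,2);
```

If the maximum of abs_diff is near zero, the weighted sum reproduces the factor-1 score for that method. If it is not, comparing the scale of scorex1 and manual1 in the PROC MEANS output may show whether the two differ only by a constant factor.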

PaigeMiller
Diamond | Level 26

/*For each car, the sum of the 4 products between the factor coefficient ("xweight")
and the standardized parameter (mean 0 std 1) does not equal the factor score obtained directly from proc pls when method is pcr or rrr.

This calculation only works when method is pls*/

 

This is different than your original statement, where it did not work for PLS, but now it does.

 

The algorithms for RRR and PCR are different than PLS. I believe you'd need to do some research on the exact algorithm in order to obtain the formula to obtain the score from the data value and PLS weight.
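
As a starting point for that research, here is how the "Regression Methods" part of the PROC PLS documentation describes reduced rank regression, paraphrased in matrix notation. Treat it as a sketch: the symbols $X$, $Y$, $w$, $q_i$, $t_i$ are my notation for the centered and scaled predictors, the responses, and the weights and scores, and any normalization PROC PLS applies to the reported weights and scores is not covered here.

```latex
% PLS and PCR: the X-score is a linear combination of the predictors
t = X w

% RRR: start from the ordinary least squares predictions of the responses
\hat{Y}_{LS} = X (X^{\top} X)^{-1} X^{\top} Y

% the Y-weights q_i are eigenvectors of \hat{Y}_{LS}^{\top} \hat{Y}_{LS},
% and the X-score is the projection of the Y-score Y q_i onto the X space
t_i = X (X^{\top} X)^{-1} X^{\top} Y q_i = \hat{Y}_{LS}\, q_i
```

Under this reading, an RRR X-score depends on the responses as well as the predictors, so the coefficients that reproduce it from the standardized data would be $(X^{\top} X)^{-1} X^{\top} Y q_i$ (up to scaling), which need not coincide with the values reported in the XWeights table.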

 

In fact, I have no idea what the PCR option in PROC PLS does, despite what the documentation says. The scores from PROC PRINCOMP and the scores from the PCR option in PROC PLS do not match. If you use PROC PRINCOMP, the score is easily calculated by multiplying each PROC PRINCOMP loading by the standardized data value and then adding these products up. I suspect that the PROC PLS scores using method=PLS are not really scores but are multiplied by some sort of regression coefficient, though honestly I do not know for sure. In fact, that would be a good question for someone at SAS to answer: why do PROC PRINCOMP and PROC PLS with the PCR option give different scores but the same loadings? (Furthermore, I would never recommend using the PCR option in PROC PLS. It seems to me to have no legitimate purpose other than to give users access to algorithms that were popular 30 years ago but are inferior to current methods.)

--
Paige Miller
xcong
Fluorite | Level 6

Could someone else help me with this issue, namely how the factor score is constructed in proc pls when the method is pcr or rrr?

Or how can I contact the technicians/specialists at SAS to resolve this problem?

Thank you so much!

PaigeMiller
Diamond | Level 26

You could call SAS Technical Support directly.

--
Paige Miller
xcong
Fluorite | Level 6

Thank you so much for all the responses!

However, I do not think this problem is resolved.

I will try my best to contact the technicians at SAS to resolve this problem. (Based on the previous responses, I may need the SAS license number; I think I should have one, since I can use SAS on this computer.)

PaigeMiller
Diamond | Level 26

@xcong

Please let us know what they say.

--
Paige Miller
