UKPhD
Calcite | Level 5
Please forgive the long-winded explanation, I am a new SAS user.

I am using reduced rank regression (RRR) to create dietary patterns at a baseline time point, and I am applying this pattern to diet over a 10-year period, which requires repeated scores.

Using the XWeights ODS output from PROC PLS METHOD=RRR, I am able to use PROC SCORE to produce a score based on individuals' food intake. However, I have hit a problem. To check my methodology, I applied PROC SCORE to the same data used in PROC PLS to create the RRR score. The hypothesis was that the applied and natural scores would be the same; however, they are not. They are systematically different by 11.18%. The same ratio appears if I use the same method in a completely different data set with different X variables.

Does anyone have any ideas why this very consistent error keeps cropping up? Does anyone have information on how PROC PLS METHOD=RRR applies its xweights to the data to create the score?

Thanks for any help, code below.

Our code:

*KEEP CENTRED AND SCALED PREDICTOR (FOOD GROUP) VARIABLES & NATURAL DP
SCORE PRODUCED BY EXPL RRR
& REMOVE RAW DATA;
data scaled;
set pattern10;
keep cid_477a qlet $foods2
pred10score1 ;
run;

************************************************************************
CONFIRMATORY RRR USING CENTRED AND SCALED DATA;

*MAKE XWEIGHTS (SCORING FILE) SUITABLE FOR PROC SCORE;

data scores;
set rrr10xweights;

if NumberOfFactors > 1 then delete; /* only interested in 1st pattern */
drop NumberOfFactors;

_TYPE_="SCORE";
_NAME_="Factor1";

/* rename scoring variables to match scaled predictor variable names*/
rename $foods = $foods2;
run;

*RE-SCORE SCALED AND CENTRED PREDICTOR VARIABLES using scoring
coefficients to test confirmatory RRR;

proc score data=scaled out=pattern10_1 score=scores type=SCORE nostd;
var $foods2;
run;
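One way to diagnose a constant ratio like this is to rebuild the score by hand as the linear combination sum(weight_i * x_i) and compare it with the PROC SCORE output. Below is a minimal sketch of that check; food1-food3 and w1-w3 are placeholder names standing in for the real food-group variables and their weights, not the actual variables in the thread.

```sas
/* Hypothetical check: compute the score manually as sum(weight_i * x_i).
   food1-food3 are placeholders for the real food-group variables. */
data manual_check;
  if _n_ = 1 then
    set scores(keep=food1-food3
               rename=(food1=w1 food2=w2 food3=w3)); /* read weights once */
  set pattern10_1;                 /* PROC SCORE output containing Factor1 */
  manual_score = w1*food1 + w2*food2 + w3*food3;
  ratio = manual_score / factor1;  /* should be 1 if the scoring matches */
run;
```

If the manual score matches the "natural" RRR score but not the PROC SCORE output (or vice versa), that isolates whether the 11.18% offset comes from the weights themselves or from how PROC SCORE applies them.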

***************************************************************************
COMPARE 'NATURAL' AND 'APPLIED' SCORES;

*check correlation between natural and applied scores;
proc corr data=pattern10_1;
var pred10score1 factor1;
run;

*calculate differences and ratio b/w natural and applied scores;

proc rank data=pattern10_1 out=ranks;
ranks rankpred10 rankfact1;
var pred10score1 factor1;
run;

proc sort data=ranks;
by pred10score1;
run;

data rankdiff;
set ranks;
difpat1=factor1 - pred10score1;
ratiopat1=factor1/pred10score1;
difrank=rankpred10 - rankfact1;
run;

proc means data=rankdiff;
var difpat1 ratiopat1 difrank;
run;

6 REPLIES
Paige
Quartz | Level 8
If they are systematically different by 11.18%, then you have a scaling issue. Somehow, somewhere, in either PROC PLS or PROC SCORE, things are not being scaled properly. I would check the scaling options in both procedures.
carlosmirandad
Obsidian | Level 7

I noticed the same problem, and I am sure the data is properly centered and scaled by the PLS procedure. I also used the centering/scaling output table from PROC PLS and applied it to the new data in PROC SCORE. See the sample code attached to reproduce the issue. The results of the last PROC MEANS show a ~20% difference between the original factor generated by RRR and the factor generated by PROC SCORE (and the % difference is the same for all observations). If you have any thoughts on what I may be missing or assuming incorrectly, please advise.

johnsonrk2
Calcite | Level 5

Was this issue ever resolved? I am facing the same problem. I can replicate the standardization of my input xvars from the PROC PLS METHOD=RRR output (in PROC STDIZE), but the PROC SCORE results after applying the xweights do not match.
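For reference, the standardization being matched here is the mean-0, standard-deviation-1 preprocessing that PROC PLS applies by default; a minimal sketch, with raw, std, and food1-food20 as placeholder dataset and variable names:

```sas
/* Sketch: standardize predictors to mean 0, std 1, mirroring
   PROC PLS's default preprocessing. Names are placeholders. */
proc stdize data=raw out=std method=std;
  var food1-food20;
run;
```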

PaigeMiller
Diamond | Level 26

I think you might want to contact SAS Technical Support and have them track down whether this is an error in SAS, or whether SAS is working properly and the mistake is yours. They will probably forward you (although I can't guarantee it) to people who are very familiar with how PLS METHOD=RRR works. I have not seen anyone here in the SAS Community with that knowledge.

--
Paige Miller
johnsonrk2
Calcite | Level 5

Thank you! I'll report back.

johnsonrk2
Calcite | Level 5

From conversations with SAS Technical Support about how to generate xscores from PROC PLS METHOD=RRR for a new dataset, there are limited options:

 

1. Use an IML module (the best way and their recommendation)

2. Set all response values to missing in the new dataset. Append the new dataset to the original dataset in which you have both predictor and response values. Run PROC PLS with the number of factors suggested by the original dataset. All observations will receive xscores (including the new dataset, scored by the response/predictor model from the original dataset). 
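Option 2 above can be sketched as follows. All dataset and variable names (original, new, y, food1-food20) are placeholders for illustration; nfac=1 assumes one factor was retained from the original fit.

```sas
/* Option 2 sketch: score new data by appending it with missing
   responses and rerunning PROC PLS METHOD=RRR. Names are placeholders. */
data new_missing;
  set new;
  y = .;                             /* missing response: the row is
                                        scored but not used in the fit */
run;

data combined;
  set original new_missing;
run;

proc pls data=combined method=rrr nfac=1;
  model y = food1-food20;
  output out=allscores xscore=xscr;  /* xscr1 holds the factor score */
run;
```

Every observation in allscores, including the appended new rows, then carries an xscore computed from the model fitted on the original data alone.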

 

The instructions in PROC PLS documentation for saving output and utilizing PROC SCORE apparently refer to saving the parameter estimates and generating the predicted values, but do not give any opportunity to replicate the xscore.

 


