- Question about the difference between manually cal...

08-20-2015 09:51 AM

Hi All, I came across a factor analysis issue. When I used PROC FACTOR and PROC SCORE to get factor scores, I noticed that the scores in the SAS output do not match the ones I calculated by hand. Specifically, I took the "standardized scoring coefficients" from the SAS output, multiplied them by the standardized variables, and summed the products. The two sets of factor scores are close but not identical. I am not sure whether this is caused by rounding or whether manual calculation requires an additional step.

Your comments and help are much appreciated.

Accepted Solutions

08-21-2015 12:34 PM

You mixed up the order of your variables in your scoring: the last two variables are flipped.

You listed them in a different order in PROC FACTOR than they appear in the data set. Once the order is corrected, the correct values are derived.

array vars(5) Age Weight RunTime **RunPulse RestPulse**; *CORRECT;

array vars_order(5) Age Weight RunTime **RestPulse RunPulse**; *INCORRECT;

data manual_score;
   *load scoring coefficients into temporary arrays for comparison;
   array f1(5) _temporary_;
   array f2(5) _temporary_;
   if _n_=1 then do i=1 to 5;
      set ttt;
      f1(i)=factor1;
      f2(i)=factor2;
   end;

   *load standardized data;
   set stdfit;

   *initialize factor scores to 0;
   factor_score1=0;
   factor_score2=0;
   factor_score1_wrong=0;
   factor_score2_wrong=0;

   *Set array for variables - NOTE ORDER;
   array vars(5) Age Weight RunTime RunPulse RestPulse;
   array vars_order(5) Age Weight RunTime RestPulse RunPulse;

   *Calculate correct factor scores;
   do i=1 to 5;
      factor_score1=sum(factor_score1, f1(i)*vars(i));
      factor_score2=sum(factor_score2, f2(i)*vars(i));
   end;

   *Calculate incorrect factor scores;
   do i=1 to 5;
      factor_score1_wrong=sum(factor_score1_wrong, f1(i)*vars_order(i));
      factor_score2_wrong=sum(factor_score2_wrong, f2(i)*vars_order(i));
   end;
run;
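One way to confirm that the corrected scores agree with PROC SCORE is to let PROC COMPARE do the check instead of eyeballing the printouts. This is only a sketch under a few assumptions: FScore (from the original poster's PROC SCORE step) and manual_score hold the same 12 observations in the same order, PROC SCORE named its score variables Factor1 and Factor2, and the 1e-8 tolerance is an arbitrary choice for this illustration.

```sas
/* Compare PROC SCORE results with the hand-calculated scores.      */
/* Assumes both data sets list the observations in the same order.  */
proc compare base=FScore compare=manual_score
             method=absolute criterion=1e-8;   /* tolerance is an arbitrary choice */
   var  Factor1 Factor2;              /* scores written by PROC SCORE (assumed names) */
   with factor_score1 factor_score2;  /* scores computed in the DATA step */
run;
```

Swapping factor_score1_wrong and factor_score2_wrong into the WITH statement should show how far off the scores are when the variable order is flipped.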

All Replies

08-20-2015 12:05 PM

If you looked at the procedure's output tables in the Results window rather than at a data set, some decimal rounding is likely.

You may want to use ODS OUTPUT to send the table to a data set and calculate with that.

Add

ODS Output StdScoreCoef = CoefficentDataSetname;

to the proc code.

08-20-2015 12:54 PM

Hi ballardw,

Thank you very much for your comments and help. I tried the ODS output, but the values in the ODS data set exactly match the "standardized scoring coefficients" shown by PROC FACTOR.

I used the example from SAS: SAS/STAT(R) 9.2 User's Guide, Second Edition

Here is the screenshot for factor proc output.

And here is the result from ODS:

08-20-2015 01:39 PM

Hi All,

In order to provide more details about the issue, I have included the slightly modified SAS code and output screenshots for your reference.

**The SAS code:**

/* This data set contains only the first 12 observations */
/* from the full data set used in the chapter on PROC REG. */
data Fitness;
   input Age Weight Oxygen RunTime RestPulse RunPulse @@;
   datalines;
44 89.47 44.609 11.37 62 178 40 75.07 45.313 10.07 62 185
44 85.84 54.297 8.65 45 156 42 68.15 59.571 8.17 40 166
38 89.02 49.874 9.22 55 178 47 77.45 44.811 11.63 58 176
40 75.98 45.681 11.95 70 176 43 81.19 49.091 10.85 64 162
44 81.42 39.442 13.08 63 174 38 81.87 60.055 8.63 48 170
44 73.03 50.541 10.13 45 168 45 87.66 37.388 14.03 56 186
;

proc factor data=Fitness outstat=FactOut
            method=prin rotate=varimax score;
   var Age Weight RunTime RunPulse RestPulse;
   title 'FACTOR SCORING EXAMPLE';
   ODS output StdScoreCoef=ttt;
run;

proc print data=ttt;
   title 'ODS Output Table';
run;

***User added this proc to get standardized variables;
proc stdize data=Fitness method=std out=stdfit;
   var Age Weight Oxygen RunTime RestPulse RunPulse;
run;

Title 'Standardized Data';
proc print data=stdfit;
run;

proc print data=FactOut;
   title2 'Data Set from PROC FACTOR';
run;

proc score data=Fitness score=FactOut out=FScore;
   var Age Weight RunTime RunPulse RestPulse;
run;

proc print data=FScore;
   title2 'Data Set from PROC SCORE';
run;

Title;

**Part of PROC FACTOR output:**

**The ODS output table:**

**The standardized data:**

**The factor scores (last two columns) calculated by SAS:**

**The factor score calculated by hand (Excel 2010):**

08-20-2015 04:12 PM

As soon as you bring in Excel I get nervous. How did you get the values into Excel? If you entered the values as shown in your output labeled "Part of PROC FACTOR output:", then you rounded the values. If you look at the value in the TTT data set, in the row labeled score, you will see that the score for Age is -0.178464537 with a best12. format and -0.1784645370142 with a best16. format. I think your print defaulted to the best8. format, which does round the data. Copying and pasting from the ODS output does not carry the additional decimals unless you print it with a longer format.
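A minimal sketch of printing the coefficients with a longer format, so that the values pasted into Excel carry all available digits. It assumes the ODS data set is named ttt and holds the coefficients in Factor1 and Factor2, as in the code posted in this thread; the title text is only illustrative.

```sas
/* Print the scoring coefficients with a wider format to rule out display rounding. */
proc print data=ttt;
   format Factor1 Factor2 best16.;
   title 'Scoring coefficients with best16. format';
run;
title;
```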

08-20-2015 05:13 PM

Thanks, ballardw. Good suggestion. However, I also tried the best12. format, which showed all the digits after the decimal point, and the results did not change. My calculation still does not match the SAS output. The largest difference is about 109% [(mycalc-sas)/sas*100%], so I believe it cannot be a rounding issue.

08-21-2015 12:49 PM

Thanks so much! I had indeed overlooked the order of the variables in the output. I verified this in the Excel file, and the manual calculation now matches the SAS output. Much appreciated!