Hi, this is my first time posting, so 1) please forgive my mistakes and 2) hit me with any suggestions for how I can help you help me better! For reference, I am using SAS Enterprise.

I have recently been thrown into a project involving factor analysis. Because of the nature of the work, I cannot share any data, so instead I will walk you through the procedure. We are examining the psychometric properties of an instrument (22 questions) that has been administered to a novel population. Ignoring demographic and identifying fields, the instrument data form an n × 22 matrix in which every field takes an integer value from 1 to 5 (ordinal data).

The paper we are more or less following is available here: https://europepmc.org/article/pmc/5046963
Unfortunately, it does not provide a table of all of its eigenvalues, so I cannot tell whether the authors ran into this issue. I also looked at the original paper that this instrument was based on (https://pubmed.ncbi.nlm.nih.gov/15921473/), but it did not provide much guidance on the subject either.

In short, I am trying to run exploratory factor analysis on polychoric correlations, but some of my eigenvalues are less than zero (image of preliminary eigenvalues attached below). I have gone through a substantial amount of the literature, but it seems rather foggy when it comes to polychoric correlations (my data are ordinal, so Pearson correlations are inappropriate). It is worth noting, however, that I do not have any Heywood or ultra-Heywood cases (communalities of 1 or greater). Although I will likely end up retaining only 3-4 factors, I assume that negative eigenvalues, which imply negative variance, are a problem for the interpretability of my model.

The code I used for the factoring is as follows:

%macro efa(sport_data,columns,method);
*Create polychoric correlations;
*Delete noprint option to display coefficient alpha;
ods graphics on;
proc corr data=&sport_data polychoric outplc=sport_polychoric nomiss alpha noprint;
run;
*Conduct Factor Analysis;
*Calculate KMO measure;
*Estimation method: unweighted least squares;
*Rotation: Quartimin;
*Minimum Eigenvalue: 1;
proc factor data=sport_polychoric rotate=quartimin method=&method mineigen=1 scree corr msa;
title &sport_data;
run;
%mend;

One source I found (https://www.tandfonline.com/doi/pdf/10.1080/10705511.2020.1735393) outlines four potential reasons why a correlation matrix may be indefinite:

1) the number of observations is less than the number of items
2) not all correlations are based on the same number of cases
3) the variables are not linearly independent
4) there are items with zero variance

Our sample is much larger (n = 6,547 after data cleaning and listwise deletion), and because of the listwise deletion every correlation is based on the same cases, so reasons 1 and 2 should not be an issue. I ran PROC MEANS on all of my items and verified that none of them has zero variance. Reason 3 is the only one I am uncertain about, which leads to my first question: is there an efficient way to test for multicollinearity in SAS? Ideally, I would test every item in the survey against all of the other items, and I presume I would then delete any problematic items.

My second question: if none of the above is causing my negative eigenvalues, what else could be? Would some smoothing measures be appropriate, and if so, how would I implement them in SAS?

Third, assuming I can resolve the negative eigenvalue issue, how are my assumptions about diagnostics such as Bartlett's test of sphericity and the KMO measure affected? Are these even applicable to polychoric correlations, or are there more appropriate measures?

Lastly, I am curious about the applicability and implementation of parallel analysis. I have a macro (below) that I believe works; I adjusted it slightly so that the actual (non-simulated) eigenvalues are calculated from a polychoric correlation matrix. I am not sure whether I should be feeding parallel analysis a polychoric correlation matrix or not.

I apologize for the lengthy post, but this is my first time ever using or learning about factor analysis. I would greatly appreciate your help and suggestions!

*Macro for conducting Parallel Analysis;
%macro parallel(data=_LAST_, var=_NUMERIC_,niter=1000, statistic=Median,method=uls);
/*--------------------------------------*
| Macro Parallel |
| Parameters |
| data = dataset to be analyzed |
| (default: _LAST_) |
| var = variables to be analyzed |
| (default: _NUMERIC_) |
| niter= number of simulated datasets |
| to create (default: 1000) |
| statistic = statistic used to |
| summarize eigenvalues |
| (default: Median. Other |
| possible values: P90, |
| P95, P99) |
| method = extraction method for |
| actual data (default: uls) |
| Output |
| Graph of actual vs. simulated |
| eigenvalues |
*--------------------------------------*/
data _temp;
set &data;
keep &var;
run;
/* obtain number of observations and
variables in dataset */
ods output Attributes=Params;
ods listing close;
proc contents data=_temp ;
run;
ods listing;
data _NULL_;
set Params;
if Label2 eq 'Observations' then
call
symput('Nobs',Trim(Left(nValue2)));
else if Label2 eq 'Variables' then
call
symput('NVar',Trim(Left(nValue2)));
run;
/* create polychoric matrix */
proc corr data=_temp polychoric outplc=_temp noprint;
run;
/* obtain eigenvalues for actual data */
proc factor data=_temp method=&method nfact=&nvar nprint
outstat=E1(where=(_TYPE_ = 'EIGENVAL'));
var &var;
run;
data E1;
set E1;
array A1{&nvar} &var;
array A2{&nvar} X1-X&nvar;
do J = 1 to &nvar;
A2{J} = A1{J};
end;
keep X1-X&nvar;
run;
/* generate simulated datasets and obtain
eigenvalues */
%DO K = 1 %TO &niter;
data raw;
array X {&nvar} X1-X&nvar;
keep X1-X&nvar;
do N = 1 to &nobs;
do I = 1 to &nvar;
X{I} = rannor(-1);
end;
output;
end;
run;
/* create polychoric matrix */
/*proc corr data=raw polychoric outplc=raw noprint;*/
/* run;*/
proc factor data=raw nfact=&nvar noprint
outstat=E(where=(_TYPE_ ='EIGENVAL'));
var X1-X&nvar;
proc append base=Eigen
data=E(keep=X1-X&nvar);
run;
%END;
/* summarize eigenvalues for simulated
datasets */
proc means data=Eigen noprint;
var X1-X&nvar;
output out=Simulated(keep=X1-X&nvar)
&statistic=;
proc datasets nolist;
delete Eigen;
proc transpose data=E1 out=E1;
run;
proc transpose data=Simulated out=Simulated;
run;
/* plot actual vs. simulated eigenvalues */
data plotdata;
length Type $ 9;
Position+1;
if Position eq (&nvar + 1)
then Position = 1;
set E1(IN=A)
Simulated(IN=B);
if A then Type = 'Actual';
if B then Type = 'Simulated';
rename Col1 = Eigenvalue;
run;
title height=1.5 "Parallel Analysis - &statistic Simulated Eigenvalues";
title2 height=1 "&nvar Variables, &niter Iterations, &nobs Observations";
proc print data = plotdata;
run;
symbol1
interpol = join
value=diamond
height=1
line=1
color=blue
;
symbol2
interpol = join
value=circle
height=1
line=3
color=red
;
proc gplot data = plotdata;
plot Eigenvalue * Position = Type;
run;
quit;
%mend parallel;
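On the multicollinearity question and the negative eigenvalues, a minimal SAS/IML sketch may help. It assumes SAS/IML is available, and the 4-item matrix in which every correlation equals 0.5 is hypothetical, not study data. It relies on two standard identities: the variance inflation factor (VIF) of item j is the j-th diagonal element of the inverse correlation matrix, and the item's squared multiple correlation (SMC) on all other items is 1 - 1/VIF. In PROC FACTOR, METHOD=ULS uses SMCs as prior communalities by default, so the preliminary eigenvalues it prints belong to the reduced matrix (SMCs on the diagonal) rather than to the correlation matrix itself.

```sas
/* Sketch (assumes SAS/IML): VIFs and SMCs from a correlation matrix, and
   eigenvalues of the full vs. reduced matrix. Toy data are hypothetical. */
proc iml;
  /* Hypothetical 4-item matrix with every correlation equal to 0.5 */
  p = 4;
  R = J(p, p, 0.5) + 0.5 * I(p);

  /* To use the real polychoric matrix instead (assumes the data set
     created by PROC CORR OUTPLC= in the efa macro):
     use sport_polychoric where(_TYPE_="CORR");
     read all var _NUM_ into R;
     close sport_polychoric;                                          */

  vif  = vecdiag(inv(R));        /* VIF of item j = [inv(R)]_jj          */
  smc  = 1 - 1 / vif;            /* R-squared of item j on all others    */
  Rred = R - diag(1 - smc);      /* reduced matrix: SMCs on the diagonal */

  evFull = eigval(R);            /* eigenvalues of the correlation matrix */
  evRed  = eigval(Rred);         /* eigenvalues of the reduced matrix     */

  print vif smc;
  print evFull evRed;

  /* Keep both sets of eigenvalues in a data set for later inspection */
  create eig_compare var {evFull evRed};
  append;
  close eig_compare;
quit;
```

For this toy matrix every VIF is 1.6 and every SMC is 0.375; the full matrix has eigenvalues 2.5, 0.5, 0.5, 0.5 (positive definite), while the reduced matrix has 1.875 and three eigenvalues of -0.125. In other words, negative preliminary eigenvalues can appear even when the correlation matrix itself is positive definite, so checking evFull for the real polychoric matrix (via the commented USE/READ lines) separates the two situations. Items with very large VIFs (SMC close to 1) are the candidates for near-linear dependence.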
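On the smoothing question, one simple option is eigenvalue clipping: replace every eigenvalue below a small floor with that floor, rebuild the matrix, and rescale it to a unit diagonal. The SAS/IML sketch below illustrates that technique under stated assumptions; it is not a recommendation from the sources cited above. The 3 x 3 matrix (chosen to be indefinite, with determinant -0.224) and the floor eps = 1e-4 are hypothetical, and other smoothing methods exist that can give different results.

```sas
/* Sketch (assumes SAS/IML): smooth a non-positive-definite correlation
   matrix by eigenvalue clipping. Matrix and floor are hypothetical.     */
proc iml;
  R = {1.0 0.9 0.3,
       0.9 1.0 0.9,
       0.3 0.9 1.0};                  /* det(R) < 0, so R is indefinite   */

  call eigen(evals, evecs, R);        /* R = evecs * diag(evals) * evecs` */
  print evals[label="Eigenvalues before smoothing"];

  eps   = 1e-4;                       /* assumed floor for eigenvalues    */
  evals = evals <> eps;               /* elementwise max: clip at eps     */
  S     = evecs * diag(evals) * evecs`;

  d = 1 / sqrt(vecdiag(S));           /* rescale so the diagonal is 1     */
  S = diag(d) * S * diag(d);

  evSmooth = eigval(S);
  print S[label="Smoothed matrix" format=8.4],
        evSmooth[label="Eigenvalues after smoothing"];

  create smoothed_demo from S;        /* columns COL1-COL3                */
  append from S;
  close smoothed_demo;
quit;
```

Rescaling by diag(d) keeps the matrix positive definite, because multiplying on both sides by a positive diagonal matrix does not change the signs of the eigenvalues. To use a smoothed polychoric matrix in PROC FACTOR, it would need to be written back out as a TYPE=CORR data set with _TYPE_ and _NAME_ variables (the NOBS= option of PROC FACTOR can then supply n), and the changes to the off-diagonal correlations are worth inspecting before relying on it.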