wmespi
Obsidian | Level 7

Hi,

 

This is my first time posting, so 1) please forgive my mistakes and 2) hit me with any suggestions for how I can help you help me better! For reference, I am using SAS Enterprise Guide.

 

I have recently been thrown into a project involving factor analysis. Because of the nature of the work, I will not be able to share any data. In lieu of that, I will try to walk you through the procedure. We are trying to examine the psychometric properties of a certain instrument (22 questions) that has been administered to a novel population. The instrument data (ignoring demographic and identifying fields) take the form of an n x 22 matrix in which every field is an integer from 1 to 5 (ordinal data). The paper we are more or less following can be seen here: https://europepmc.org/article/pmc/5046963

 

Unfortunately the above paper does not provide a table of all of their eigenvalues so I cannot see if that was an issue they ran into. I also looked at the original paper that this instrument was based on (https://pubmed.ncbi.nlm.nih.gov/15921473/), but it did not provide much guidance on the subject either.

 

In short, I am trying to run exploratory factor analysis on polychoric correlations, but some of my eigenvalues are less than zero (image of preliminary eigenvalues attached below). I have gone through a substantial amount of the literature but it seems rather foggy when it comes to polychoric correlations (my data is ordinal so Pearson's correlations are inappropriate). It is worth noting, however, that I do not have any Heywood or Ultra Heywood situations where my communalities are >=1. Although I will likely only end up retaining around 3-4 factors, I am assuming this implication of negative variance is problematic for the interpretability of my model.

 

[Image: wmespi_0-1606325419323.png, preliminary eigenvalues]

 
 

 

The code I used for the factoring is as follows:

%macro efa(sport_data,columns,method);

	*Create polychoric correlations for the items passed in the columns parameter;
	*Delete noprint option to display coefficient alpha;
	ods graphics on;
	proc corr data=&sport_data polychoric outplc=sport_polychoric nomiss alpha noprint;
		var &columns;
	run;

	*Conduct Factor Analysis;
	*Calculate KMO measure;
	*Estimation Method: Unweighted least squares;
	*Rotation: Quartimin;
	*Minimum Eigenvalue: 1;
	proc factor data=sport_polychoric rotate=quartimin method=&method mineigen=1 scree corr msa;
		title "&sport_data";
	run;
%mend efa;

 

One source (https://www.tandfonline.com/doi/pdf/10.1080/10705511.2020.1735393) I have found outlined four potential reasons why the correlation matrix may be indefinite. 

1) the number of observations is less than the number of items

2) not all correlations are based on the same number of cases

3) the variables are not linearly independent

4) there are items with 0 variance

 

We have a much larger sample size, n = 6,547 (after data cleaning and list-wise deletion), so number 1 should not be an issue. Because of the list-wise deletion, every correlation is based on the same cases, so number 2 should not be an issue either. I ran proc means on all of my items and verified that none of them has zero variance, which rules out number 4. Number 3 is the only one I am uncertain about, and here lies my first question.

 

Is there an efficient way to test for multicollinearity in SAS? I would ideally be able to test every item in the survey with respect to the other items. I presume I would then delete any problematic items.

 

My second question is, if the above are not the cause of my negative eigenvalues, what else could it be? Would some smoothing measures be appropriate, and if so, how would I go about implementing them in SAS?

 

Assuming I can resolve the negative eigenvalue issue, how are my assumptions when it comes to things like Bartlett's Test for Sphericity and the KMO measure affected? Are these even applicable when it comes to polychoric correlations, or are there other more appropriate measures?

 

Lastly, I am curious about parallel analysis and both its applicability and implementation. I have a macro (attached below), that I believe works and have adjusted slightly so that the actual/non-simulated eigenvalues are calculated using a polychoric correlation. I am not sure if with parallel analysis, I should be feeding it a polychoric correlation or not.

 

I apologize for the lengthy post, but this is my first time ever using/learning about factor analysis. I would greatly appreciate your help and suggestions!

 

 

*Macro for conducting Parallel Analysis;
%macro parallel(data=_LAST_, var=_NUMERIC_,niter=1000, statistic=Median,method=uls);
/*--------------------------------------*
| Macro Parallel |
| Parameters |
| data = dataset to be analyzed |
| (default: _LAST_) |
| var = variables to be analyzed |
| (default: _NUMERIC_) |
| niter= number of simulated datasets |
| to create (default: 1000) |
| statistic = statistic used to |
| summarize eigenvalues |
| (default: Median. Other |
| possible values: P90, |
| P95, P99) |
| method = PROC FACTOR method used for |
| the actual data (default: uls) |
| Output |
| Graph of actual vs. simulated |
| eigenvalues |
*--------------------------------------*/
data _temp;
set &data;
keep &var;
run;

/* obtain number of observations and
variables in dataset */
ods output Attributes=Params;
ods listing close;

proc contents data=_temp ;
run;

ods listing;

data _NULL_;
set Params;
if Label2 eq 'Observations' then
call
symput('Nobs',Trim(Left(nValue2)));
else if Label2 eq 'Variables' then
call
symput('NVar',Trim(Left(nValue2)));
run;

/* create polychoric matrix */
proc corr data=_temp polychoric outplc=_temp noprint;
run;

/* obtain eigenvalues for actual data */
proc factor data=_temp method=&method nfact=&nvar noprint
outstat=E1(where=(_TYPE_ = 'EIGENVAL'));
var &var;
run;

data E1;
set E1;
array A1{&nvar} &var;
array A2{&nvar} X1-X&nvar;
do J = 1 to &nvar;
A2{J} = A1{J};
end;
keep X1-X&nvar;
run;

/* generate simulated datasets and obtain
eigenvalues */
%DO K = 1 %TO &niter;
data raw;
array X {&nvar} X1-X&nvar;
keep X1-X&nvar;
do N = 1 to &nobs;
do I = 1 to &nvar;
X{I} = rannor(-1);
end;
output;
end;
run;

/* create polychoric matrix */
/*proc corr data=raw polychoric outplc=raw noprint;*/
/* run;*/

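/* no METHOD= is given here, so PROC FACTOR uses its default
(principal component) method on the Pearson correlations of
the simulated data */
proc factor data=raw nfact=&nvar noprint
outstat=E(where=(_TYPE_ ='EIGENVAL'));
var X1-X&nvar;
run;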

proc append base=Eigen
data=E(keep=X1-X&nvar);
run;
%END;
/* summarize eigenvalues for simulated
datasets */
proc means data=Eigen noprint;
var X1-X&nvar;
output out=Simulated(keep=X1-X&nvar)
&statistic=;

proc datasets nolist;
delete Eigen;

proc transpose data=E1 out=E1;
run;

proc transpose data=Simulated out=Simulated;
run;

/* plot actual vs. simulated eigenvalues */
data plotdata;
length Type $ 9;
Position+1;
if Position eq (&nvar + 1)
then Position = 1;
set E1(IN=A)
Simulated(IN=B);
if A then Type = 'Actual';
if B then Type = 'Simulated';
rename Col1 = Eigenvalue;
run;

title height=1.5 "Parallel Analysis - &statistic Simulated Eigenvalues";
title2 height=1 "&nvar Variables, &niter Iterations, &nobs Observations";

proc print data = plotdata;
run;

symbol1
interpol = join
value=diamond
height=1
line=1
color=blue
;
symbol2
interpol = join
value=circle
height=1
line=3
color=red
;

proc gplot data = plotdata;
plot Eigenvalue * Position = Type;
run;

quit;
%mend parallel;

 

 

20 Replies
wmespi
Obsidian | Level 7

As an addendum, I have just received word that Singular Value Decomposition is a promising lead for detecting multicollinearity. However, when I look into doing this in SAS, all I see is code that relies on proc iml. If that is not accessible to me, are there other ways of going about this?
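
One route that avoids PROC IML, sketched here under assumptions: PROC REG prints variance inflation factors, tolerances, and eigenvalue-based collinearity diagnostics through the VIF, TOL, and COLLIN options on the MODEL statement. These diagnostics depend only on the regressors, so an arbitrary response can be used. The item names q1-q22 and the dataset _vifcheck are hypothetical, sport_data stands in for the raw dataset, and the diagnostics are computed from the raw 1-5 scores, not from the polychoric matrix.

```sas
/* Sketch only: q1-q22, dummy_y and _vifcheck are hypothetical names. */
data _vifcheck;
   set sport_data;
   dummy_y = rand('normal');   /* arbitrary response; VIF/TOL/COLLIN use only the regressors */
run;

proc reg data=_vifcheck;
   model dummy_y = q1-q22 / vif tol collin;
run;
quit;
```

A very large VIF (tolerance near zero) for an item would suggest that it is close to a linear combination of the other items.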

ballardw
Super User

Type "negative eigenvalues" into your favorite search engine.

 

There are enough results that context is important. So find something similar to what your data represents.

 

I am not an expert on eigenvalues but a very brief search shows multiple causes and the interpretation of such is dependent on use.

wmespi
Obsidian | Level 7
To clarify, are you asking me to find and post data that is similar to my own? I will look and see what I can find. Is there any other context I can provide that would be helpful to you?
PaigeMiller
Diamond | Level 26

Your favorite search engine will find many discussions of negative eigenvalues, such as https://stats.idre.ucla.edu/sas/output/factor-analysis/

 

They generally happen when your matrix is not full rank.

 

You show us all that macro code, and yet you don't show us the place where you actually call the macro, and so we don't know what value you are using for &METHOD, which I think we would need to know.

--
Paige Miller
wmespi
Obsidian | Level 7
The method in the parallel analysis macro is unspecified for the simulation component (so I believe it will default to the principal component method). For calculating the actual eigenvalues in the parallel analysis and in the efa macro, the method is unweighted least squares (ULS).
PaigeMiller
Diamond | Level 26

Is all of this analysis being done to try to detect and perhaps mitigate multicollinearity in a modeling procedure?

--
Paige Miller
wmespi
Obsidian | Level 7

The reason I am investigating multicollinearity is that I read it is not something you want in an EFA model and that it is a potential cause of negative eigenvalues. I was wondering if SAS has an easy mechanism for checking for multicollinearity in the data, so that I could remove any potentially problematic questionnaire items and see whether my correlation matrix then becomes positive definite.

 

Aside from that, I was just wondering what are the general protocols when dealing with negative eigenvalues in an EFA model. Do they affect the integrity of the model? They are only present in the initial factorization (# of factors = # of items).

 

wmespi_0-1606497575500.png

 

If these negative eigenvalues are not of consequence, then I can proceed to my next question: deciding the appropriate number of factors to retain. I saw that the typical methods are the scree plot and the Kaiser criterion, but these are not considered especially robust. I wanted to incorporate parallel analysis using the macro I had originally posted, but I was not sure if it can be used with ordinal data.

 

Lastly, as a way of supporting the use of factor analysis, I wanted to calculate the KMO measure and conduct Bartlett's test of sphericity. However, with both of these I am again not sure whether having ordinal data violates the assumptions used to calculate them. The KMO I can at least calculate in SAS using my polychoric correlation matrix and the unweighted least squares method, but Bartlett's test (I believe) can only be calculated with the maximum likelihood estimation method, which does not support polychoric correlation matrices.

 

PaigeMiller
Diamond | Level 26

"The reason I am investigating multicollinearity is because I read that is not something you want in an EFA model and is potentially a cause of negative eigenvalues."

If there was no multicollinearity, then you wouldn't need to do Factor Analysis or Principal Components or any similar method. Multicollinearity is almost unavoidable in real data, and so any EFA model is based on data that has multicollinearity.

 

If the explanation I gave earlier is correct, that the matrix is not full rank, then I think your best choice is to work with just a few factors. Beyond that, I'm afraid it's still not clear to me where to go from here, or even what the goals of your analysis are. You state some other methods you might want to consider, but not the end goal of the analysis.

--
Paige Miller
PaigeMiller
Diamond | Level 26

Adding

 

We need to get to the bottom of this "is the matrix full rank?" issue, to know if that is indeed the cause of your negative eigenvalues. I suggest you run PROC PRINCOMP on your data to find out. If there are eigenvalues from PROC PRINCOMP that are within roundoff error of zero, then your matrix is not full rank. If not, then we need to determine another possible cause and interpretation of the negative eigenvalues from EFA.
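
A minimal sketch of that check, under assumptions: the item names q1-q22 are hypothetical, and sport_data and sport_polychoric are the dataset names used in the EFA macro earlier in the thread. The first step looks at the raw item scores (Pearson correlations); the last step looks at the polychoric matrix that PROC FACTOR actually receives.

```sas
/* Sketch only: q1-q22 are hypothetical item names. */

/* Eigenvalues of the Pearson correlation matrix of the raw items */
proc princomp data=sport_data;
   var q1-q22;
run;

/* Polychoric correlations, written to a correlation-type dataset */
proc corr data=sport_data polychoric outplc=sport_polychoric nomiss noprint;
   var q1-q22;
run;

/* Eigenvalues of the polychoric correlation matrix */
proc princomp data=sport_polychoric(type=corr);
   var q1-q22;
run;
```

Eigenvalues within roundoff error of zero in either eigenvalue table would point to a matrix that is not full rank.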

--
Paige Miller
wmespi
Obsidian | Level 7

Hi,

 

Sorry I have been out of the office for the weekend. 

 

There are a few reasons I am doing exploratory factor analysis (as opposed to principal component analysis):

  • I want to generate hypothetical models for the underlying factor structure of the items on the questionnaire
  • I want to determine if there are any problematic items that do not load strongly onto any factors or load strongly onto multiple factors. These items could be subject for review and/or removal.
  • I will then test these hypothetical models with confirmatory factor analysis.
  • A previous study suggests that a 3 factor model is expected, but the scale of data we are working with is much larger, so we want to see if our findings will be in agreement.

 

To address some of the previous responses:

  • That makes intuitive sense that you would want multicollinearity to be present in the data for factor analysis to be of use. I think maybe I was confusing that with one variable being a linear combination of another variable(s).
  • I will take out the MSA option, but I had originally included it so that I could see the measure of sampling adequacy. Why would the inclusion/exclusion of this metric change how my correlation matrix is calculated?
  • I had tried using the PARALLEL option as well as setting NFACTORS=PARALLEL, but I do not think SAS Enterprise Guide 7.1 supports these features. 
  • The eigenvalues are generated from PROC FACTOR. Those are the initial eigenvalue estimates for 22 factors, not the final estimates for the x number of retained factors. The options passed into the EFA macro are sport_data (the name of the raw dataset), columns (the variables I want to analyze using factor analysis), and method (the method of factor analysis, in this case unweighted least squares).

 

After running PROC PRINCOMP on my data I obtain the following results:

[Image: wmespi_0-1606746306650.png, PROC PRINCOMP results for the raw data]

To be perfectly honest, I am not really sure what to make of this or how to use this output to determine whether the matrix is full rank. As an additional note, I passed the raw data into the procedure, not the polychoric correlation matrix. The image below is the result when I run the procedure with the polychoric correlation matrix.

[Image: wmespi_1-1606746455108.png, PROC PRINCOMP results for the polychoric correlation matrix]

 

 

PaigeMiller
Diamond | Level 26

So your PROC PRINCOMP indicates that the matrix is full rank, and that cannot be the cause of the negative eigenvalues in the EFA. It can be that the problem is "ill-conditioned", as stated here, but beyond that, I really don't have further advice as I have not run into negative eigenvalues in EFA.

 

 

--
Paige Miller
wmespi
Obsidian | Level 7
Responded to Rick_SAS and this post because I am unsure if you get a notification if I don't reply to your specific post. ^
PaigeMiller
Diamond | Level 26

I don't think you need a singular value decomposition; I think the eigenvalues from PROC PRINCOMP give you the same information as the SVD. To me, these don't indicate an ill-conditioned matrix: the ratio of the max eigenvalue to the min eigenvalue is approximately 100, or 2 in log10 units, which doesn't seem too terrible. But that still leaves a mystery as to why you get negative eigenvalues from the EFA.

--
Paige Miller
wmespi
Obsidian | Level 7

In the discussion of negative eigenvalues with EFA, I have found the following links. 

 

Stack Exchange (top response):

https://stats.stackexchange.com/questions/97802/how-to-correctly-interpret-a-parallel-analysis-in-ex...

 

Gently Clarifying the Application of Horn’s Parallel Analysis to Principal Component Analysis Versus Factor Analysis:

https://pdxscholar.library.pdx.edu/cgi/viewcontent.cgi?article=1026&context=commhealth_fac

 

In the paper he states that eigenvalues in factor analysis can be negative. I am still mulling over what he is asserting as the main difference between the stopping criteria for parallel analysis with PCA vs FA but I will read over it some more.
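
A sketch of how this could be checked in SAS, assuming (as I understand the PROC FACTOR documentation) that METHOD=ULS starts from squared multiple correlations as prior communality estimates. In that case the preliminary eigenvalues come from the reduced correlation matrix, whose diagonal holds those communality estimates instead of ones, and that matrix does not have to be positive semidefinite. Running the principal method with PRIORS=ONE and then with PRIORS=SMC on the same polychoric matrix shows whether the negative eigenvalues come from the matrix itself or only from the reduction. The dataset name sport_polychoric is the one written by the EFA macro earlier in the thread.

```sas
/* Sketch only: assumes sport_polychoric is the OUTPLC= dataset from PROC CORR. */

/* Unreduced matrix (ones on the diagonal): eigenvalues of the polychoric matrix itself */
proc factor data=sport_polychoric(type=corr) method=principal priors=one;
   title "Eigenvalues of the full polychoric correlation matrix";
run;

/* Reduced matrix (squared multiple correlations on the diagonal) */
proc factor data=sport_polychoric(type=corr) method=principal priors=smc;
   title "Eigenvalues of the reduced polychoric correlation matrix";
run;

title;   /* clear the title */
```

If only the second run shows negative eigenvalues, they reflect the communality estimates rather than a defect in the polychoric matrix.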
