Solved: Re: proc factor with polychoric correlation matrix: yields errors

mszommer · Posted 02-16-2017 11:22 AM

Hello,

I resorted to a polychoric correlation matrix because my variables are all either scale-based (Likert-scaled) or dichotomous. I have 103 variables in total.

I used the OUTPLC= option

        proc corr data=survey.mydata outplc=survey.pchor_dm_new;
        var
        Q1   Q3   Q4A Q4B Q4C Q4D Q4E Q4F Q4G Q4H Q4I Q4J Q5A Q5B
        Q5C Q5D Q5E Q5F Q5G Q5H Q6A Q6B Q6C Q6D Q6E Q6F Q7   Q10A
        Q10B Q10C Q10D Q10E   Q10F Q10G Q11A Q11B Q11C Q11D Q11E Q11F Q11G Q13
        Q15A Q15B Q15C Q15D Q16A Q16B Q16C Q17 Q18A Q18B Q18C Q18D Q19A Q19B
        Q19C Q20A Q20B Q20C   Q21A Q21B Q21C Q21D Q21E Q21F Q21G Q21H Q21I Q21J
        Q22   Q23 Q24A Q24B   Q24C Q24D Q24E Q24F Q24G Q24H Q24I Q25 Q26A Q26B
        Q26C Q26D Q26E Q26F   Q26G Q26H Q26I Q26J Q26K Q27 Q28 Q30 Q31 Q32A
        Q32B Q32C Q32D Q32E   Q32F;
        run;

I then filtered out the rows holding the means, standard deviations, and numbers of observations to obtain a 'true' correlation matrix.

      data survey.corrMatrix2;
      set survey.pchor_dm;
      where _type_='CORR';
      drop _type_ _name_;
      run;

I then attempted to run a Principal Axis Factoring and initially got the errors:

Correlation matrix is Singular

Communality greater than 1.0

title1 'Factor Analysis of Customer Satisfaction Survey 2016';
title2 'PAF method with Polychoric Correlation Coefficients';
ods graphics on;
proc factor data=survey.corrMatrix
method=prinit
priors=smc
plot=scree heywood rotate=promax /* promax can be tried too */
;
var
Q1   Q3   Q4A Q4B Q4C Q4D Q4E Q4F Q4G Q4H Q4I Q4J Q5A Q5B
Q5C Q5D Q5E Q5F Q5G Q5H Q6A Q6B Q6C Q6D Q6E Q6F Q7   Q10A
Q10B Q10C Q10D Q10E   Q10F Q10G Q11A Q11B Q11C Q11D Q11E Q11F Q11G Q13
Q15A Q15B Q15C Q15D Q16A Q16B Q16C Q17 Q18A Q18B Q18C Q18D Q19A Q19B
Q19C Q20A Q20B Q20C   Q21A Q21B Q21C Q21D Q21E Q21F Q21G Q21H Q21I Q21J
Q22   Q23 Q24A Q24B   Q24C Q24D Q24E Q24F Q24G Q24H Q24I Q25 Q26A Q26B
Q26C Q26D Q26E Q26F   Q26G Q26H Q26I Q26J Q26K Q27 Q28 Q30 Q31 Q32A
Q32B Q32C Q32D Q32E   Q32F;
run;
ods graphics off;

I then used the HEYWOOD option and got the following errors:

WARNING: The number of observations is not greater than the number of variables.
ERROR: Correlation matrix is singular.
NOTE: Prior communality estimates will be 1.0.
NOTE: 102 factors will be retained by the MINEIGEN criterion.
WARNING: Too many factors for a unique solution.
NOTE: Convergence criterion satisfied.

Could someone please help me with what I'm doing wrong? I really need to get this done today.

Regards

Mari

Ksharp · Posted 02-21-2017 08:59 PM

No. You have ordinal and nominal variables, not continuous variables. Therefore you cannot use PROC CLUSTER or a principal component analysis directly. The best choice is PROC PRINQUAL.


Rick_SAS · Posted 02-16-2017 12:25 PM

Since you are in a hurry, I will say some things without completely checking them. Otherwise I would not be able to post until tomorrow.

I think the problem is the DATA step in which you are extracting _TYPE_="CORR". You do NOT want to modify the matrix that is produced by PROC CORR. PROC FACTOR expects to receive a TYPE=CORR data set, and it uses the _TYPE_ variable to reconstruct the important statistics that it needs. I think that what is happening is that PROC FACTOR is interpreting the data as raw data rather than as a pre-computed correlation matrix.
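
A minimal sketch of that idea, assuming the OUTPLC= data set is written as a TYPE=CORR data set. The name WORK.PolyCorr is hypothetical, and only three of the Q variables are listed for brevity:

```sas
/* Sketch: pass the PROC CORR output to PROC FACTOR unmodified.        */
/* WORK.PolyCorr is a hypothetical name; list all Q variables on VAR.  */
proc corr data=survey.mydata outplc=work.PolyCorr;
   var Q1 Q3 Q4A;
run;

/* No DATA step in between, so the _TYPE_ and _NAME_ columns stay intact */
proc factor data=work.PolyCorr method=prinit priors=smc rotate=promax;
run;
```

If the log still reports a singular matrix with this setup, the cause is probably in the data rather than in how the matrix is passed along.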

mszommer · Posted 02-16-2017 12:57 PM

Thank you for the prompt response.

I did run PROC FACTOR with the matrix generated by PROC CORR and got the following errors:
ERROR: Correlation matrix is singular.
NOTE: Prior communality estimates will be 1.0.
NOTE: 93 factors will be retained by the PROPORTION criterion.
WARNING: Too many factors for a unique solution.
ERROR: Maximum iterations exceeded

Is the correlation matrix singular because there are too many variables and too few cases (I had 53,000+ observations)? Or is it because I have too many highly correlated items in my matrix?

Could you please help (further)? If I cannot generate sensible results today, then it makes sense to ask for one more day.

Regards

Mari

Rick_SAS · Posted 02-16-2017 01:42 PM

With 53K observations, I wouldn't expect collinearities in 103 ordinal variables. Are you using dummy variables instead of ordinal variables for the data? For example, if a variable X has values 1, 2 and 3, you will get a singular matrix if you replace X with three dummy variables X1=(X=1), X2=(X=2), and X3=(X=3).

Make sure that in the PROC CORR step you are using the original ordinal variables, where each variable corresponds to one question.
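
To see the dummy-variable problem in isolation, here is a small sketch on simulated data. The data set name Dummies, the seed, and the probabilities are all made up for illustration. Because X1 + X2 + X3 = 1 for every row, the three dummies are collinear with the intercept:

```sas
/* Simulated example: three dummy variables built from one 3-level variable */
data Dummies;
   call streaminit(1);
   do i = 1 to 100;
      X  = rand("Table", 0.3, 0.3, 0.4);   /* X takes the values 1, 2, 3 */
      X1 = (X=1);
      X2 = (X=2);
      X3 = (X=3);
      Y  = rand("Normal");                 /* artificial response variable */
      output;
   end;
run;

/* X1 + X2 + X3 is always 1, so PROC REG should flag the model as not full rank */
proc reg data=Dummies plots=none;
   model Y = X1 X2 X3 / collin;
run;
quit;
```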

mszommer · Posted 02-16-2017 02:23 PM

No, I'm not. Each ordinal variable corresponds to one question. I used 1=yes and 0=no for multi-response (nominal) questions. So, if Q5 had 5 options and a respondent could select more than 1 option, I coded it as 5 questions, namely Q5a, Q5b, Q5c, Q5d and Q5e.

Rick_SAS · Posted 02-16-2017 03:36 PM

My guess is that you have a polychoric matrix that is not positive definite. This can happen for various reasons, including the presence of missing values. If you have missing values, you could try adding the NOMISS option to the PROC CORR statement (as discussed in the article) to perform listwise deletion of missing values.
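
As a sketch of the NOMISS suggestion (the output name WORK.PolyCorrNoMiss is hypothetical, and the VAR list is shortened for brevity):

```sas
/* NOMISS drops every observation that has a missing value in any VAR variable, */
/* so all correlations are computed from the same set of observations.          */
proc corr data=survey.mydata nomiss outplc=work.PolyCorrNoMiss;
   var Q1 Q3 Q4A;   /* list all Q variables here */
run;
```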

It could also indicate collinearity. The REG procedure can check for collinearity among the Q variables. You need to "invent" a response variable and then use the COLLIN option on the MODEL statement, as follows:

data Check;
set Have;
Y = rand("Normal");
run;

proc reg data=Check plots=none;
model Y = Q1-Q3 / COLLIN;  /* <= put all "Q" variables here */
run;

Any variable that gets 0 for a parameter estimate is collinear with others. You will also get a NOTE such as

Note:

The following parameters have been set to 0, since the variables are a linear combination of other variables as shown.

Q3 =	q1 - q2

mszommer · Posted 02-17-2017 04:51 AM

Good Morning, Rick

I have no missing observations in my data. I did use PROC REG with the option COLLIN and of the 103 Variables, only 8 Variables did not have 0 as their parameter estimates.

Is it possible that 95 variables are a linear combination of these (mere) 8 variables? Would you recommend continuing only with the 8 variables? Would I not be losing information pertaining to the other (maybe not all the 95) variables?

Thanks & Regards

Mari

Rick_SAS · Posted 02-17-2017 05:38 AM

mszommer wrote:
I have no missing observations in my data. I did use PROC REG with the option COLLIN and of the 103 Variables, only 8 Variables did not have 0 as their parameter estimates.

Is it possible that 95 variables are a linear combination of these (mere) 8 variables? Would you recommend continuing only with the 8 variables? Would I not be losing information pertaining to the other (maybe not all the 95) variables?

You ask whether it is possible that 95 variables are linear combinations of 8. Assuming that you ran PROC REG correctly, that is indeed what your results are saying. At this point, you need to look at the data to find out why there is not much valid information. Are most columns entirely zero? Entirely 1? Use PROC FREQ to determine the distribution of the variables.
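
One way to do that check, as a sketch. The second step lists only three of the Q variables as an example:

```sas
/* Count the distinct levels of every variable; a variable with 1 level is constant */
ods select NLevels;
proc freq data=survey.mydata nlevels;
   tables _all_ / noprint;
run;

/* Full frequency tables for a few variables of interest */
proc freq data=survey.mydata;
   tables Q5A Q5B Q5C / missing;
run;
```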

There's not much else that I can suggest if your data are degenerate. Review the way the data were gathered and consult with a knowledgeable colleague or statistician who can help you discover the problem.

Good luck!

Ksharp · Posted 02-16-2017 09:51 PM

1) Try adding the TYPE=CORR data set option to mark it as a correlation data set. There is no need for a VAR statement.

 data survey.corrMatrix(type=corr);
      set survey.pchor_dm;
      where _type_='CORR';
      drop _type_ _name_;
      run;

proc factor data=survey.corrMatrix;
run;


2) You can compute the nearest correlation matrix and feed that to PROC FACTOR.

http://blogs.sas.com/content/iml/2012/11/28/computing-the-nearest-correlation-matrix.html


3) They are ordinal variables, so I guess you can't use them in PROC FACTOR.
Maybe you could do a PCA for qualitative data and get the CORR matrix.

PROC PRINQUAL COR ;

mszommer · Posted 02-17-2017 04:55 AM

Thank you, KSharp.

Could you help me with the syntax to generate the nearest correlation matrix?

/* symmetric matrix, but not positive definite */
A = {1.0  0.99 0.35, 
     0.99 1.0  0.80, 
     0.35 0.80 1.0} ;
B = NearestCorr(A);
print B;

As per the link that you quoted, my A would be survey.corrMatrix and B would be NearestCorr(survey.corrMatrix), is that right? Sorry, I do not get this part.

Regards

Mari

Ksharp · Posted 02-17-2017 10:38 PM

Rick's blog already offers the IML code to get the nearest correlation matrix. Just follow it step by step.
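
For getting the matrix into IML, a rough sketch, assuming SAS/IML is available and that survey.pchor_dm is the unmodified PROC CORR output (adjust the name if yours differs). NearestCorr is not a built-in function; it is the module defined in the blog post, so that call is left as a comment here:

```sas
proc iml;
   /* Read only the correlation rows of the PROC CORR output */
   use survey.pchor_dm where(_TYPE_="CORR");
   read all var _NUM_ into A[colname=varNames];
   close survey.pchor_dm;

   A = (A + T(A)) / 2;     /* remove tiny asymmetries caused by rounding */
   eval = eigval(A);       /* eigenvalues of the symmetric matrix */
   minEval = min(eval);
   print minEval;          /* a value <= 0 means A is not positive definite */

   /* If the NearestCorr module from the blog post has been defined
      earlier in this PROC IML session, the next step would be:
      B = NearestCorr(A);                                           */
quit;
```

A smallest eigenvalue at or below zero would confirm that the polychoric matrix is not positive definite.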

As I said before, your data are not continuous but ordinal or nominal,
therefore you cannot directly use PROC FACTOR. Your best choice is PROC PRINQUAL,
especially the second example in its documentation.

proc prinqual data=bball out=tbball scores n=1 tstandard=z
plots=transformations;
title2 'Optimal Monotonic Transformation of Ranked Teams';
title3 'with Constrained Estimation of Unranked Teams';
transform untie(CSN -- SportsIllustrated);
id School;
run;
* Perform the Final Principal Component Analysis;
proc factor nfactors=1 plots=scree;
title4 'Principal Component Analysis';
ods select factorpattern screeplot;
var TCSN -- TSportsIllustrated;
run;

Ksharp · Posted 02-19-2017 02:06 AM

Here I quote from Example 2 of the PROC PRINQUAL documentation:


An alternative approach is to use the pairwise deletion option of the CORR procedure to compute the
correlation matrix and then use PROC PRINCOMP or PROC FACTOR to perform the principal component
analysis. This approach has several disadvantages. The correlation matrix might not be positive semidefinite
(PSD), an assumption required for principal component analysis. PROC PRINQUAL always produces a PSD
correlation matrix. Even with pairwise deletion, PROC CORR removes the six observations that have only a
single nonmissing value from this data set. Finally, it is still not possible to calculate scores on the principal
components for those teams that have missing values.

mszommer · Posted 02-20-2017 08:17 AM

Hello KSharp,
Thank you for your replies.

I was attempting to run PROC PRINQUAL, and it ran forever.
I then specified the nominal and ordinal variables as opscore and monotone (without the untie option, as I do not have any missing values):

proc prinqual data=survey.mydata out=survey.prinq scores n=1 tstandard=z
plots=transformations;
transform opscore(Q1   Q3   Q4A Q4B Q4C Q4D Q4E Q4F Q4G
Q4H Q4I Q4J Q5A Q5B Q5C Q5D Q5E Q5F Q5G Q5H Q7 Q13
Q22   Q23 Q30 Q31 Q32A
Q32B Q32C Q32D Q32E   Q32F)
monotone(Q6A Q6B Q6C Q6D Q6E Q6F Q10A
Q10B Q10C Q10D Q10E   Q10F Q10G Q11A Q11B Q11C Q11D Q11E Q11F Q11G Q15A
Q15B Q15C Q15D Q16A Q16B Q16C Q17 Q18A Q18B Q18C Q18D Q19A Q19B
Q19C Q20A Q20B Q20C   Q21A Q21B Q21C Q21D Q21E Q21F Q21G Q21H Q21I Q21J
Q24A Q24B Q24C Q24D Q24E Q24F Q24G Q24H Q24I Q25 Q26A Q26B
Q26C Q26D Q26E Q26F   Q26G Q26H Q26I Q26J Q26K Q27 Q28);
run;

and got the note 'Algorithm converged'. Could you tell me what it means?

Regards
Mari

Ksharp · Posted 02-20-2017 09:34 PM

It means the result looks really good.
Could you try a smaller data set with fewer variables to see whether you get a result?

Ksharp · Posted 02-20-2017 09:44 PM

And try the code as simple as 

ods select none;
proc prinqual data=survey.mydata out=survey.prinq;
transform ...........
run;

ods select all;
proc factor.........