proc factor with polychoric correlation matrix: yi...

02-16-2017 11:22 AM - edited 02-21-2017 10:19 AM

Hello,

I resorted to a polychoric correlation matrix because my variables are all either scale-based (Likert-scaled) or dichotomous. I have 103 variables in total.

I used the OUTPLC= option

proc corr data=survey.mydata outplc=survey.pchor_dm_new;

var

Q1 Q3 Q4A Q4B Q4C Q4D Q4E Q4F Q4G Q4H Q4I Q4J Q5A Q5B

Q5C Q5D Q5E Q5F Q5G Q5H Q6A Q6B Q6C Q6D Q6E Q6F Q7 Q10A

Q10B Q10C Q10D Q10E Q10F Q10G Q11A Q11B Q11C Q11D Q11E Q11F Q11G Q13

Q15A Q15B Q15C Q15D Q16A Q16B Q16C Q17 Q18A Q18B Q18C Q18D Q19A Q19B

Q19C Q20A Q20B Q20C Q21A Q21B Q21C Q21D Q21E Q21F Q21G Q21H Q21I Q21J

Q22 Q23 Q24A Q24B Q24C Q24D Q24E Q24F Q24G Q24H Q24I Q25 Q26A Q26B

Q26C Q26D Q26E Q26F Q26G Q26H Q26I Q26J Q26K Q27 Q28 Q30 Q31 Q32A

Q32B Q32C Q32D Q32E Q32F;

run;

I then filtered out the rows holding the means, standard deviations, and numbers of observations to obtain a 'true' correlation matrix.

data survey.corrMatrix2;

set survey.pchor_dm;

where _type_='CORR';

drop _type_ _name_;

run;

I then attempted to run a Principal Axis Factoring and initially got the errors:

*Correlation matrix is Singular*

*Communality greater than 1.0*

title1 'Factor Analysis of Customer Satisfaction Survey 2016';

title2 'PAF method with Polychoric Correlation Coefficients';

ods graphics on;

proc factor data=survey.corrMatrix

method=prinit

priors=smc

plot=scree heywood rotate=promax /* promax can be tried too */

;

var

Q1 Q3 Q4A Q4B Q4C Q4D Q4E Q4F Q4G Q4H Q4I Q4J Q5A Q5B

Q5C Q5D Q5E Q5F Q5G Q5H Q6A Q6B Q6C Q6D Q6E Q6F Q7 Q10A

Q10B Q10C Q10D Q10E Q10F Q10G Q11A Q11B Q11C Q11D Q11E Q11F Q11G Q13

Q15A Q15B Q15C Q15D Q16A Q16B Q16C Q17 Q18A Q18B Q18C Q18D Q19A Q19B

Q19C Q20A Q20B Q20C Q21A Q21B Q21C Q21D Q21E Q21F Q21G Q21H Q21I Q21J

Q22 Q23 Q24A Q24B Q24C Q24D Q24E Q24F Q24G Q24H Q24I Q25 Q26A Q26B

Q26C Q26D Q26E Q26F Q26G Q26H Q26I Q26J Q26K Q27 Q28 Q30 Q31 Q32A

Q32B Q32C Q32D Q32E Q32F;

run;

ods graphics off;

I then used the HEYWOOD option and got the following messages:

WARNING: The number of observations is not greater than the number of variables.

ERROR: Correlation matrix is singular.

NOTE: Prior communality estimates will be 1.0.

NOTE: 102 factors will be retained by the MINEIGEN criterion.

WARNING: Too many factors for a unique solution.

NOTE: Convergence criterion satisfied.

Could someone please help me with what I'm doing wrong? I really need to get this done today.

Regards

Mari

Accepted Solutions

Solution

02-22-2017
04:48 AM

02-21-2017 08:59 PM

All Replies

02-16-2017 12:25 PM

Since you are in a hurry, I will say some things without completely checking them. Otherwise I would not be able to post until tomorrow.

I think the problem is the DATA step in which you extract the rows where _TYPE_='CORR'. You do NOT want to modify the data set that PROC CORR produces. PROC FACTOR expects to receive a TYPE=CORR data set, and it uses the _TYPE_ variable to reconstruct the important statistics that it needs. I think that PROC FACTOR is interpreting your data as raw data rather than as a precomputed correlation matrix.
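A minimal sketch of this suggestion, assuming the OUTPLC= data set is written as a TYPE=CORR data set, as the reply expects. The name work.pchor is hypothetical, and only three of the Q variables are listed for brevity. PROC CONTENTS reports the data set type, so the assumption can be checked before PROC FACTOR runs.

```sas
/* Compute polychoric correlations; work.pchor is a hypothetical name. */
proc corr data=survey.mydata polychoric outplc=work.pchor noprint;
   var Q1 Q3 Q4A;   /* list all Q variables here */
run;

/* Check the "Data Set Type" attribute in the output (expected: CORR). */
proc contents data=work.pchor;
run;

/* Pass the data set to PROC FACTOR unmodified: no subsetting DATA step. */
proc factor data=work.pchor method=prinit priors=smc;
run;
```

If PROC CONTENTS shows a blank data set type, PROC FACTOR would treat the rows as raw observations, which matches the "number of observations is not greater than the number of variables" warning in the question.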

02-16-2017 12:57 PM

Thank you for the prompt response.

I did run PROC FACTOR with the matrix generated by PROC CORR and got the following errors:

ERROR: Correlation matrix is singular.

NOTE: Prior communality estimates will be 1.0.

NOTE: 93 factors will be retained by the PROPORTION criterion.

WARNING: Too many factors for a unique solution.

ERROR: Maximum iterations exceeded

Is the correlation matrix singular because I have too many variables and too few observations (I have 53,000+ observations)? Or is it because I have too many highly correlated items in my matrix?

Could you please help further? If I cannot generate sensible results today, then it makes sense to ask for one more day.

Regards

Mari

02-16-2017 01:42 PM

With 53K observations, I wouldn't expect collinearities in 103 ordinal variables. Are you using dummy variables instead of ordinal variables for the data? For example, if a variable X has values 1, 2, and 3, you will get a singular matrix if you replace X with three dummy variables X1=(X=1), X2=(X=2), and X3=(X=3).

Make sure that in PROC CORR you are using the original ordinal variables, where each variable corresponds to one question.
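The dummy-variable effect can be illustrated with a small SAS/IML sketch. The data vector is made up, and the example uses ordinary Pearson correlations rather than polychoric ones, so it only demonstrates the linear dependence, not the poster's actual computation.

```sas
proc iml;
   /* Hypothetical toy data: X takes the values 1, 2, and 3. */
   X = {1, 2, 3, 1, 2, 3, 1, 1, 2};

   /* Three dummy columns; every row sums to 1, so they are linearly dependent. */
   D = (X=1) || (X=2) || (X=3);

   C  = corr(D);     /* Pearson correlation matrix of the dummies */
   ev = eigval(C);   /* eigenvalues of the symmetric matrix C */
   print C, ev;      /* the smallest eigenvalue is numerically zero */
quit;
```

Dropping any one of the three dummy columns removes the dependence in this toy case.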

02-16-2017 02:23 PM

No, I'm not. Each ordinal variable corresponds to one question. I used 1=yes and 0=no for multi-response (nominal) questions. So, if Q5 had 5 options and a respondent could select more than one option, I coded it as 5 questions, namely Q5a, Q5b, Q5c, Q5d, and Q5e.

02-16-2017 03:36 PM

My guess is that you have a polychoric matrix that is not positive definite. This can happen for various reasons, including the presence of missing values. If you have missing values, you could try adding the NOMISS option to the PROC CORR statement (as discussed in the article) to perform listwise deletion of missing values.

It could also indicate collinearity. The REG procedure can check for collinearity among the Q variables. You need to "invent" a response variable and then use the COLLIN option on the MODEL statement, as follows:

```
data Check;
set Have;
Y = rand("Normal");
run;
proc reg data=Check plots=none;
model Y = Q1-Q3 / COLLIN; /* <= put all "Q" variables here */
run;
```

Any variable that gets 0 for a parameter estimate is collinear with others. You will also get a NOTE such as

02-17-2017 04:51 AM

Good Morning, Rick

I have no missing observations in my data. I did use PROC REG with the COLLIN option, and of the 103 variables, only 8 did not have 0 as their parameter estimate.

Is it possible that 95 variables are linear combinations of these (mere) 8 variables? Would you recommend continuing with only the 8 variables? Would I not be losing information pertaining to the other (maybe not all of the 95) variables?

Thanks & Regards

Mari

02-17-2017 05:38 AM

mszommer wrote: I have no missing observations in my data. I did use PROC REG with the COLLIN option, and of the 103 variables, only 8 did not have 0 as their parameter estimate.

Is it possible that 95 variables are linear combinations of these (mere) 8 variables? Would you recommend continuing with only the 8 variables? Would I not be losing information pertaining to the other (maybe not all of the 95) variables?

You ask whether it is possible that 95 variables are linear combinations of 8. Assuming that you ran PROC REG correctly, that is indeed what your results are saying. At this point, you need to look at the data to find out why there is not much valid information. Are most columns entirely 0? Entirely 1? Use PROC FREQ to determine the distribution of each variable.
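A short sketch of such a check, assuming the raw responses are in survey.mydata as in the original post. Only a few of the Q variables are listed; the rest would be added to the TABLES statement.

```sas
/* NLEVELS prints the number of distinct values per variable, which     */
/* quickly flags columns that are constant (a single level).            */
proc freq data=survey.mydata nlevels;
   tables Q1 Q3 Q4A Q4B / missing;   /* add the remaining Q variables here */
run;
```

A variable with only one level, or with nearly all responses in one category, is a candidate for the degeneracy described above.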

There's not much else that I can suggest if your data are degenerate. Review the way the data were gathered and consult a knowledgeable colleague or statistician who can help you discover the problem.

Good luck!

02-16-2017 09:51 PM - edited 02-16-2017 10:13 PM

1) Try adding one data set option to mark it as a correlation data set. Then there is no need for a VAR statement.

data survey.corrMatrix(type=corr);
set survey.pchor_dm;
where _type_='CORR';
drop _type_ _name_;
run;
proc factor data=survey.corrMatrix;
run;

2) You can compute the nearest correlation matrix and feed that to PROC FACTOR. http://blogs.sas.com/content/iml/2012/11/28/computing-the-nearest-correlation-matrix.html

3) They are ordinal variables, so I guess you can't use them in PROC FACTOR. Maybe you could do a PCA for qualitative data and get the CORR matrix from it. PROC PRINQUAL COR ;

02-17-2017 04:55 AM

Thank you, KSharp.

Could you help me with the syntax to generate the nearest correlation matrix?

/* symmetric matrix, but not positive definite */
A = {1.0 0.99 0.35,
0.99 1.0 0.80,
0.35 0.80 1.0} ;
B = NearestCorr(A);
print B;

Going by the link that you quoted, would my A be survey.corrMatrix and B be NearestCorr(survey.corrMatrix)? Sorry, I do not understand this part.
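One way to approach this, sketched under assumptions: a SAS/IML matrix is not a data set, so the data set would first be read into a matrix. The sketch assumes survey.corrMatrix contains only the numeric Q columns (with _TYPE_ and _NAME_ already dropped) and that the matrix is symmetric. NearestCorr appears to be a module defined in the linked blog post rather than a built-in function, so its definition would have to be pasted in before it can be called.

```sas
proc iml;
   /* Read every numeric column of the data set into the matrix A. */
   use survey.corrMatrix;
   read all var _NUM_ into A[colname=varNames];
   close survey.corrMatrix;

   /* A negative smallest eigenvalue means A is not positive semidefinite. */
   ev = eigval(A);
   print (min(ev))[label="Smallest eigenvalue"];

   /* After defining the blog's modules above this line, one could call: */
   /* B = NearestCorr(A);                                                */
quit;
```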

Regards

Mari

02-17-2017 10:38 PM - edited 02-17-2017 10:39 PM

Rick's blog already offers the IML code to get the nearest correlation matrix. Just follow it step by step.

As I said before, your data are not continuous but ordinal or nominal, so you cannot use PROC FACTOR directly. Your best choice is PROC PRINQUAL. See especially the second example in its documentation:

proc prinqual data=bball out=tbball scores n=1 tstandard=z
plots=transformations;
title2 'Optimal Monotonic Transformation of Ranked Teams';
title3 'with Constrained Estimation of Unranked Teams';
transform untie(CSN -- SportsIllustrated);
id School;
run;

* Perform the Final Principal Component Analysis;
proc factor nfactors=1 plots=scree;
title4 'Principal Component Analysis';
ods select factorpattern screeplot;
var TCSN -- TSportsIllustrated;
run;

02-19-2017 02:06 AM

Here I quote from Example 2 of the PROC PRINQUAL documentation:

An alternative approach is to use the pairwise deletion option of the CORR procedure to compute the correlation matrix and then use PROC PRINCOMP or PROC FACTOR to perform the principal component analysis. This approach has several disadvantages. The correlation matrix might not be positive semidefinite (PSD), an assumption required for principal component analysis. PROC PRINQUAL always produces a PSD correlation matrix. Even with pairwise deletion, PROC CORR removes the six observations that have only a single nonmissing value from this data set. Finally, it is still not possible to calculate scores on the principal components for those teams that have missing values.

02-20-2017 08:17 AM - edited 02-21-2017 10:21 AM

Hello KSharp,

Thank you for your replies.

I attempted to run PROC PRINQUAL, and it ran forever.

I then specified the nominal variables as OPSCORE and the ordinal variables as MONOTONE (without the UNTIE option, as I do not have any missing values):

proc prinqual data=survey.mydata out=survey.prinq scores n=1 tstandard=z

plots=transformations;

transform opscore(Q1 Q3 Q4A Q4B Q4C Q4D Q4E Q4F Q4G

Q4H Q4I Q4J Q5A Q5B Q5C Q5D Q5E Q5F Q5G Q5H Q7 Q13

Q22 Q23 Q30 Q31 Q32A

Q32B Q32C Q32D Q32E Q32F)

monotone(Q6A Q6B Q6C Q6D Q6E Q6F Q10A

Q10B Q10C Q10D Q10E Q10F Q10G Q11A Q11B Q11C Q11D Q11E Q11F Q11G Q15A

Q15B Q15C Q15D Q16A Q16B Q16C Q17 Q18A Q18B Q18C Q18D Q19A Q19B

Q19C Q20A Q20B Q20C Q21A Q21B Q21C Q21D Q21E Q21F Q21G Q21H Q21I Q21J

Q24A Q24B Q24C Q24D Q24E Q24F Q24G Q24H Q24I Q25 Q26A Q26B

Q26C Q26D Q26E Q26F Q26G Q26H Q26I Q26J Q26K Q27 Q28);

run;

and got a note 'Algorithm converged'. Could you tell me what it means?

Regards

Mari

02-20-2017 09:34 PM

It means the result looks good. Could you try a smaller data set with fewer variables to see whether you get a result?

02-20-2017 09:44 PM

And try code as simple as this:

ods select none;
proc prinqual data=survey.mydata out=survey.prinq;
transform ...........
run;
ods select all;
proc factor.........