mszommer
Obsidian | Level 7

Hello,

 

I resorted to a polychoric correlation matrix because my variables are all either scale-based (Likert-scaled) or dichotomous. I have 103 variables in total.

 

I used the OUTPLC= option

        proc corr data=survey.mydata outplc=survey.pchor_dm_new;
        var
        Q1   Q3   Q4A  Q4B  Q4C  Q4D  Q4E  Q4F  Q4G  Q4H  Q4I  Q4J  Q5A  Q5B
        Q5C  Q5D  Q5E  Q5F  Q5G  Q5H  Q6A  Q6B  Q6C  Q6D  Q6E  Q6F  Q7   Q10A    
        Q10B Q10C Q10D Q10E    Q10F Q10G Q11A Q11B Q11C Q11D Q11E Q11F Q11G Q13  
        Q15A Q15B Q15C Q15D Q16A Q16B Q16C Q17  Q18A Q18B Q18C Q18D Q19A Q19B    
        Q19C Q20A Q20B Q20C    Q21A Q21B Q21C Q21D Q21E Q21F Q21G Q21H Q21I Q21J    
        Q22     Q23  Q24A Q24B    Q24C Q24D Q24E Q24F Q24G Q24H Q24I Q25  Q26A Q26B    
        Q26C Q26D Q26E Q26F    Q26G Q26H Q26I Q26J Q26K Q27  Q28  Q30  Q31  Q32A    
        Q32B Q32C Q32D Q32E    Q32F;
        run;
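For readers unfamiliar with the method: a polychoric correlation treats each ordinal item as a coarsened version of an unobserved standard-normal variable. In the common two-step estimator, the cut points (thresholds) between adjacent categories come from the item's cumulative proportions, and the correlation is then fitted by maximum likelihood given those thresholds. The short Python sketch below (outside SAS; the function name `polychoric_thresholds` is hypothetical) shows only the threshold step and is not a description of how PROC CORR computes it internally.

```python
from statistics import NormalDist


def polychoric_thresholds(values, categories):
    """Estimate the latent-normal cut points for one ordinal item.

    Assumes `categories` lists the item's levels in their natural order
    (e.g. [1, 2, 3, 4, 5] for a Likert item, [0, 1] for a yes/no item) and
    that every level is observed at least once.
    """
    n = len(values)
    cumulative = 0
    thresholds = []
    # One threshold sits between each pair of adjacent categories, at the
    # standard-normal quantile of the cumulative proportion so far.
    for level in categories[:-1]:
        cumulative += sum(1 for v in values if v == level)
        thresholds.append(NormalDist().inv_cdf(cumulative / n))
    return thresholds


# A 5-point item with proportions 10%, 20%, 40%, 20%, 10%
item = [1] * 1 + [2] * 2 + [3] * 4 + [4] * 2 + [5] * 1
print(polychoric_thresholds(item, [1, 2, 3, 4, 5]))
```

A yes/no item split 50/50 gets a single threshold at 0; a heavily skewed item gets thresholds far from 0, and the resulting sparse cells can make polychoric estimates unstable.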

 

This was followed by a DATA step that filters out the mean, standard deviation, and number-of-observations rows to obtain a 'true' correlation matrix:

      data survey.corrMatrix2;
      set survey.pchor_dm;
      where _type_='CORR';
      drop _type_ _name_;
      run;

 

I then attempted to run principal axis factoring with the code below and initially got these errors:

Correlation matrix is Singular

Communality greater than 1.0

title1 'Factor Analysis of Customer Satisfaction Survey 2016';
title2 'PAF method with Polychoric Correlation Coefficients';
ods graphics on;
proc factor data=survey.corrMatrix
method=prinit
priors=smc
plot=scree heywood rotate=promax /* promax can be tried too */
;
var
Q1   Q3   Q4A  Q4B  Q4C  Q4D  Q4E  Q4F  Q4G  Q4H  Q4I  Q4J  Q5A  Q5B
Q5C  Q5D  Q5E  Q5F  Q5G  Q5H  Q6A  Q6B  Q6C  Q6D  Q6E  Q6F  Q7   Q10A    
Q10B Q10C Q10D Q10E    Q10F Q10G Q11A Q11B Q11C Q11D Q11E Q11F Q11G Q13  
Q15A Q15B Q15C Q15D Q16A Q16B Q16C Q17  Q18A Q18B Q18C Q18D Q19A Q19B    
Q19C Q20A Q20B Q20C    Q21A Q21B Q21C Q21D Q21E Q21F Q21G Q21H Q21I Q21J    
Q22     Q23  Q24A Q24B    Q24C Q24D Q24E Q24F Q24G Q24H Q24I Q25  Q26A Q26B    
Q26C Q26D Q26E Q26F    Q26G Q26H Q26I Q26J Q26K Q27  Q28  Q30  Q31  Q32A    
Q32B Q32C Q32D Q32E    Q32F;
run;
ods graphics off;

 

I then added the HEYWOOD option and got the following messages:

WARNING: The number of observations is not greater than the number of variables.
ERROR: Correlation matrix is singular.
NOTE: Prior communality estimates will be 1.0.
NOTE: 102 factors will be retained by the MINEIGEN criterion.
WARNING: Too many factors for a unique solution.
NOTE: Convergence criterion satisfied.

 

Could someone please help me figure out what I'm doing wrong? I really need to get this done today.

 

Regards

Mari

Accepted Solutions
Ksharp
Super User

No. You have ordinal and nominal variables, not continuous variables. Therefore you cannot use PROC CLUSTER or a principal component analysis directly. Your best choice is PROC PRINQUAL.


22 REPLIES
Rick_SAS
SAS Super FREQ

Since you are in a hurry, I will say some things without completely checking them. Otherwise I would not be able to post until tomorrow. 

 

I think the problem is the DATA step in which you are extracting the observations where _TYPE_='CORR'. You do NOT want to modify the matrix that is produced by PROC CORR. PROC FACTOR expects to receive a TYPE=CORR data set, and it uses the _TYPE_ variable to reconstruct the important statistics that it needs. I think that what is happening is that PROC FACTOR is interpreting the data as raw data rather than as a precomputed correlation matrix.
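Rick's diagnosis also explains the first warning. If PROC FACTOR reads the 103 rows of the correlation matrix as 103 raw observations on 103 variables, the centered "data" have at most 102 linearly independent rows (each column's deviations sum to zero), so any covariance or correlation matrix computed from them is singular. That may also be why 102 factors were retained. The Python sketch below demonstrates the rank argument on a made-up 4-variable matrix; `matrix_rank` and `centered` are hypothetical helpers, and the rank test is plain Gaussian elimination, not anything SAS does internally.

```python
def centered(rows):
    """Subtract each column's mean, as any covariance/correlation step does."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def matrix_rank(rows, tol=1e-10):
    """Numerical rank by Gaussian elimination with partial pivoting."""
    m = [[float(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(n_rows):
            if r != rank:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# A made-up 4 x 4 correlation matrix, mistakenly read as 4 observations
R = [[1.0, 0.5, 0.3, 0.2],
     [0.5, 1.0, 0.4, 0.1],
     [0.3, 0.4, 1.0, 0.6],
     [0.2, 0.1, 0.6, 1.0]]
print(matrix_rank(R))            # the matrix itself is full rank
print(matrix_rank(centered(R)))  # as "data", only n - 1 dimensions remain
```

With 103 variables the same argument caps the rank at 102, however the correlations look.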

mszommer
Obsidian | Level 7

Thank you for the prompt response.

 

I ran PROC FACTOR with the matrix generated by PROC CORR and got the following errors:
ERROR: Correlation matrix is singular.
NOTE: Prior communality estimates will be 1.0.
NOTE: 93 factors will be retained by the PROPORTION criterion.
WARNING: Too many factors for a unique solution.
ERROR: Maximum iterations exceeded

 

Is the correlation matrix singular because of too many variables and too few cases (I have 53,000+ observations)? Or is it because I have too many highly correlated items in my matrix?

 

Could you please help further? If I cannot generate sensible results today, it makes sense to ask for one more day.

 

Regards

Mari

Rick_SAS
SAS Super FREQ

With 53K observations, I wouldn't expect collinearities among 103 ordinal variables. Are you using dummy variables instead of the ordinal variables? For example, if a variable X has values 1, 2, and 3, you will get a singular matrix if you replace X with three dummy variables X1=(X=1), X2=(X=2), and X3=(X=3).
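The dummy-variable example can be checked directly: when a single-choice question gets one 0/1 column per level, the columns add up to 1 in every row, so any one of them equals 1 minus the others. A tiny Python sketch (the data and the `full_dummies` name are made up for illustration):

```python
def full_dummies(values, levels):
    """One 0/1 column per level: X1=(X=1), X2=(X=2), X3=(X=3)."""
    return [[1 if v == level else 0 for level in levels] for v in values]


x = [1, 2, 3, 2, 1, 3, 3]          # hypothetical answers to one question
dummies = full_dummies(x, [1, 2, 3])

# Every respondent falls in exactly one level, so each row sums to 1 ...
print({sum(row) for row in dummies})
# ... which means X3 is an exact linear combination of the others.
print(all(x3 == 1 - x1 - x2 for x1, x2, x3 in dummies))
```

In a correlation matrix, the constant sum means the three centered columns add to zero, which is exactly what makes the matrix singular.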

 

Make sure in the PROC CORR that you are using the original ordinal variable, where each variable corresponds to one question.

mszommer
Obsidian | Level 7
No, I'm not. Each ordinal variable corresponds to one question. I used 1=yes and 0=no for multi-response (nominal) questions. So, if Q5 had 5 options and a respondent could select more than one option, I coded it as 5 variables, namely Q5a, Q5b, Q5c, Q5d, and Q5e.
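For "select all that apply" questions coded this way, the indicators are not forced to add up to a constant, so the coding by itself should not make the matrix singular. It would if, in the actual data, every respondent ticked the same number of options (for example, exactly one), because then the group behaves like the full set of dummy variables Rick described. A quick Python check, with a hypothetical function and made-up rows (one per respondent):

```python
def constant_row_sum(rows, columns):
    """Return the common total if the given 0/1 columns add up to the same
    value for every respondent (which makes them collinear); else None."""
    totals = {sum(row[c] for c in columns) for row in rows}
    return totals.pop() if len(totals) == 1 else None


q5 = ["Q5A", "Q5B", "Q5C"]
multi = [  # respondents may tick several options: totals vary
    {"Q5A": 1, "Q5B": 1, "Q5C": 0},
    {"Q5A": 0, "Q5B": 0, "Q5C": 1},
    {"Q5A": 1, "Q5B": 1, "Q5C": 1},
]
single = [  # every respondent ticked exactly one option
    {"Q5A": 1, "Q5B": 0, "Q5C": 0},
    {"Q5A": 0, "Q5B": 1, "Q5C": 0},
    {"Q5A": 0, "Q5B": 0, "Q5C": 1},
]
print(constant_row_sum(multi, q5))   # None: no built-in collinearity
print(constant_row_sum(single, q5))  # 1: the three columns are collinear
```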
Rick_SAS
SAS Super FREQ

My guess is that you have a polychoric matrix that is not positive definite. This can happen for various reasons, including the presence of missing values. If you have missing values, you could try adding the NOMISS option to the PROC CORR statement (as discussed in the article) to perform listwise deletion of missing values.
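Methods that start from squared multiple correlations (PRIORS=SMC) need the inverse of the correlation matrix, so the matrix must be positive definite, not just symmetric. A Cholesky factorization succeeds exactly in that case, which makes it a convenient test. A minimal Python sketch (the function name and the 1e-12 tolerance are assumptions), applied to the non-positive-definite example matrix from the nearest-correlation blog post that Mari quotes later in this thread:

```python
import math


def is_positive_definite(r, tol=1e-12):
    """Try a Cholesky factorization r = L L^T; it only succeeds when the
    symmetric matrix r is (numerically) positive definite."""
    n = len(r)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = r[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= tol:          # non-positive pivot: not PD
                    return False
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True


A = [[1.00, 0.99, 0.35],
     [0.99, 1.00, 0.80],
     [0.35, 0.80, 1.00]]
print(is_positive_definite(A))                          # False
print(is_positive_definite([[1.0, 0.5], [0.5, 1.0]]))  # True
```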

 

It could also indicate collinearity. The REG procedure can check for collinearity among the Q variables. You need to "invent" a response variable and then use the COLLIN option on the MODEL statement, as follows:

 

data Check;
set Have;
Y = rand("Normal");
run;

proc reg data=Check plots=none;
model Y = Q1-Q3 / COLLIN;  /* <= put all "Q" variables here */
run;

Any variable that gets 0 for a parameter estimate is collinear with others. You will also get a NOTE such as

 

Note: The following parameters have been set to 0, since the variables are a linear combination of other variables as shown.

Q3 = q1 - q2
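The NOTE comes from checking each column against the ones that precede it. A rough Python analogue (not necessarily PROC REG's algorithm; names and data are made up) uses modified Gram-Schmidt: a column whose residual, after projecting out the earlier columns, is numerically zero gets flagged, just as PROC REG sets its estimate to 0.

```python
import math


def aliased_columns(columns, names, tol=1e-9):
    """Names of columns that are linear combinations of earlier columns,
    scanning left to right (modified Gram-Schmidt)."""
    basis = []    # orthonormal vectors spanning the columns kept so far
    aliased = []
    for name, col in zip(names, columns):
        v = [float(x) for x in col]
        size = math.sqrt(sum(x * x for x in v))
        for b in basis:
            proj = sum(x * y for x, y in zip(v, b))
            v = [x - proj * y for x, y in zip(v, b)]
        residual = math.sqrt(sum(x * x for x in v))
        if residual <= tol * max(size, 1.0):
            aliased.append(name)      # nothing new: estimate would be set to 0
        else:
            basis.append([x / residual for x in v])
    return aliased


intercept = [1, 1, 1, 1, 1]
q1 = [1, 2, 3, 4, 5]
q2 = [2, 1, 4, 3, 5]
q3 = [a - b for a, b in zip(q1, q2)]   # Q3 = q1 - q2, as in the NOTE
print(aliased_columns([intercept, q1, q2, q3], ["Intercept", "q1", "q2", "q3"]))
```

Because the scan is left to right, which variable gets flagged depends on the order of the MODEL statement; the dependency itself does not.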
mszommer
Obsidian | Level 7

Good Morning, Rick

 

I have no missing observations in my data. I ran PROC REG with the COLLIN option, and of the 103 variables, only 8 did not have 0 as their parameter estimates.

Is it possible that 95 variables are linear combinations of these (mere) 8 variables? Would you recommend continuing with only the 8 variables? Would I not be losing information pertaining to the other (maybe not all 95) variables?

 

Thanks & Regards

Mari

Rick_SAS
SAS Super FREQ

 



You ask whether it is possible that 95 variables are linear combinations of 8. Assuming that you ran PROC REG correctly, that is indeed what your results are saying. At this point, you need to look at the data to find out why there is so little valid information. Are most columns entirely zero? Entirely 1? Use PROC FREQ to determine the distribution of the variables.
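The PROC FREQ check boils down to looking for columns dominated by one value. A column that is all 0 or all 1 is a multiple of the intercept, so it is collinear by construction, and a nearly constant one carries very little information. The Python sketch below (hypothetical function and rows; the 99% cutoff is an arbitrary choice) flags such columns:

```python
from collections import Counter


def degenerate_columns(rows, names, min_share=0.99):
    """Map each column whose most common value covers at least `min_share`
    of the rows to (that value, its share). A share of 1.0 means constant."""
    flagged = {}
    n = len(rows)
    for j, name in enumerate(names):
        value, count = Counter(row[j] for row in rows).most_common(1)[0]
        if count / n >= min_share:
            flagged[name] = (value, count / n)
    return flagged


names = ["Q5A", "Q5B", "Q6A"]
rows = [[0, 1, 3],
        [0, 1, 4],
        [0, 1, 5],
        [0, 0, 3]]
print(degenerate_columns(rows, names))                  # Q5A is constant
print(degenerate_columns(rows, names, min_share=0.75))  # Q5B is 75% ones
```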

 

There's not much else that I can suggest if your data are degenerate. Review the way the data were gathered, and consult with a knowledgeable colleague or statistician who can help you discover the problem.

 

Good luck!

Ksharp
Super User
1) Try adding the TYPE=CORR data set option so that the output is a correlation data set. Then you don't need a VAR statement.

 data survey.corrMatrix(type=corr);
      set survey.pchor_dm;
      where _type_='CORR';
      drop _type_ _name_;
      run;

proc factor data=survey.corrMatrix;
run;


2) You can compute the nearest correlation matrix and feed it to PROC FACTOR.

http://blogs.sas.com/content/iml/2012/11/28/computing-the-nearest-correlation-matrix.html


3) They are ordinal variables, so I guess you can't use them in PROC FACTOR directly.
Maybe you could run a PCA for qualitative data and get the CORR matrix from it:

PROC PRINQUAL COR;
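For option 2, the blog post linked above computes the nearest correlation matrix in SAS/IML with an iterative algorithm (Higham's alternating projections). The Python sketch below is a simpler substitute, not that algorithm: it raises eigenvalues below a small floor, rebuilds the matrix, and rescales it to a unit diagonal. The result is positive definite and usually close to the input, but it is not guaranteed to be the nearest one. The Jacobi eigen-solver, the function names, and the 1e-6 floor are all assumptions of this sketch.

```python
import math


def jacobi_eigen(a, tol=1e-20, max_sweeps=100):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix,
    using cyclic Jacobi rotations."""
    n = len(a)
    a = [[float(x) for x in row] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes a[p][q]
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # a <- a J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # a <- J^T a (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: v <- v J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def clip_to_correlation(r, floor=1e-6):
    """Raise eigenvalues below `floor`, rebuild, and rescale to unit diagonal."""
    n = len(r)
    vals, vecs = jacobi_eigen(r)
    vals = [max(lam, floor) for lam in vals]
    b = [[sum(vecs[i][k] * vals[k] * vecs[j][k] for k in range(n))
          for j in range(n)] for i in range(n)]
    d = [math.sqrt(b[i][i]) for i in range(n)]
    return [[b[i][j] / (d[i] * d[j]) for j in range(n)] for i in range(n)]


A = [[1.00, 0.99, 0.35],
     [0.99, 1.00, 0.80],
     [0.35, 0.80, 1.00]]
B = clip_to_correlation(A)
for row in B:
    print(["%.4f" % x for x in row])
print(min(jacobi_eigen(B)[0]) > 0)  # the repaired matrix is positive definite
```

A matrix that is already positive definite (all eigenvalues above the floor) comes back essentially unchanged, so the repair only moves the problem directions.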
mszommer
Obsidian | Level 7

Thank you, KSharp.

 

Could you help me with the syntax to generate the nearest correlation matrix?

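proc iml;
/* NearestCorr is the SAS/IML module defined in the linked blog post; */
/* its START ... FINISH definition must be run before this call.      */
/* symmetric matrix, but not positive definite */
A = {1.0  0.99 0.35,
     0.99 1.0  0.80,
     0.35 0.80 1.0} ;
B = NearestCorr(A);
print B;
quit;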

Based on the link that you posted, would my A be survey.corrMatrix and B be NearestCorr(survey.corrMatrix)? Sorry, I do not get this part.

 

Regards

Mari

Ksharp
Super User
Rick's blog already offers the IML code to compute the nearest correlation matrix. Just follow it step by step.

As I said before, your data are not continuous but ordinal or nominal, so you cannot directly use PROC FACTOR. Your best choice is PROC PRINQUAL, especially the second example in its documentation:

proc prinqual data=bball out=tbball scores n=1 tstandard=z
plots=transformations;
title2 'Optimal Monotonic Transformation of Ranked Teams';
title3 'with Constrained Estimation of Unranked Teams';
transform untie(CSN -- SportsIllustrated);
id School;
run;
* Perform the Final Principal Component Analysis;
proc factor nfactors=1 plots=scree;
title4 'Principal Component Analysis';
ods select factorpattern screeplot;
var TCSN -- TSportsIllustrated;
run;



Ksharp
Super User
Here is a quote from Example 2 of the PROC PRINQUAL documentation:


An alternative approach is to use the pairwise deletion option of the CORR procedure to compute the
correlation matrix and then use PROC PRINCOMP or PROC FACTOR to perform the principal component
analysis. This approach has several disadvantages. The correlation matrix might not be positive semidefinite
(PSD), an assumption required for principal component analysis. PROC PRINQUAL always produces a PSD
correlation matrix. Even with pairwise deletion, PROC CORR removes the six observations that have only a
single nonmissing value from this data set. Finally, it is still not possible to calculate scores on the principal
components for those teams that have missing values.


mszommer
Obsidian | Level 7

Hello KSharp,
Thank you for your replies.

I attempted to run PROC PRINQUAL, and it ran forever.
I then specified the nominal and ordinal variables with OPSCORE and MONOTONE transformations (without the UNTIE option, as I do not have any missing values):

 

proc prinqual data=survey.mydata out=survey.prinq scores n=1 tstandard=z
plots=transformations;
transform opscore(Q1   Q3   Q4A  Q4B  Q4C  Q4D  Q4E  Q4F  Q4G  
Q4H  Q4I  Q4J  Q5A  Q5B Q5C  Q5D  Q5E  Q5F  Q5G  Q5H  Q7  Q13  
Q22     Q23  Q30  Q31  Q32A    
Q32B Q32C Q32D Q32E    Q32F)
monotone(Q6A  Q6B  Q6C  Q6D  Q6E  Q6F  Q10A    
Q10B Q10C Q10D Q10E    Q10F Q10G Q11A Q11B Q11C Q11D Q11E Q11F Q11G Q15A
Q15B Q15C Q15D Q16A Q16B Q16C Q17  Q18A Q18B Q18C Q18D Q19A Q19B    
Q19C Q20A Q20B Q20C    Q21A Q21B Q21C Q21D Q21E Q21F Q21G Q21H Q21I Q21J
Q24A Q24B Q24C Q24D Q24E Q24F Q24G Q24H Q24I Q25  Q26A Q26B    
Q26C Q26D Q26E Q26F    Q26G Q26H Q26I Q26J Q26K Q27  Q28);
run;
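For intuition about what OPSCORE and MONOTONE do: as I understand the PROC PRINQUAL/TRANSREG documentation, at each iteration the procedure gives every variable a "target" (its current principal-component approximation) and rescores the variable's categories to fit that target in the least-squares sense. OPSCORE gives each category the mean target value of its observations; MONOTONE does the same but forces the scores to keep the original category order, which is a weighted monotone (isotonic) regression. The Python sketch below shows only that single rescoring step with made-up target values; it is not PRINQUAL's full algorithm, and the function names are hypothetical.

```python
def _means_and_counts(x, target):
    """Mean target value and row count for each category of x."""
    sums, counts = {}, {}
    for xi, ti in zip(x, target):
        sums[xi] = sums.get(xi, 0.0) + ti
        counts[xi] = counts.get(xi, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}, counts


def opscore_scores(x, target):
    """OPSCORE-style optimal scoring: each category gets its mean target."""
    means, _ = _means_and_counts(x, target)
    return means


def monotone_scores(x, target):
    """MONOTONE-style scoring: least-squares scores that never decrease with
    the category order (pool-adjacent-violators on the category means,
    weighted by category counts; tied rows keep a common score)."""
    means, counts = _means_and_counts(x, target)
    blocks = []  # each block: [weight, pooled mean, categories in the block]
    for c in sorted(means):
        blocks.append([counts[c], means[c], [c]])
        # Pool with the previous block while the order constraint is violated.
        while len(blocks) > 1 and blocks[-2][1] > blocks[-1][1]:
            w2, m2, c2 = blocks.pop()
            w1, m1, c1 = blocks.pop()
            blocks.append([w1 + w2, (w1 * m1 + w2 * m2) / (w1 + w2), c1 + c2])
    return {c: m for _, m, cats in blocks for c in cats}


x = [1, 1, 2, 2, 3, 3]                   # an ordinal item, levels 1 < 2 < 3
target = [0.0, 0.2, 0.6, 0.8, 0.3, 0.5]  # made-up target scores
print(opscore_scores(x, target))   # category 3 scores below category 2
print(monotone_scores(x, target))  # categories 2 and 3 are pooled
```

Sorting the category codes assumes their numeric order is the intended order, which holds for Likert codes such as 1 to 5.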

 

and got a note 'Algorithm converged'. Could you tell me what it means?


Regards
Mari

Ksharp
Super User
It means the iterative algorithm met its convergence criterion, so the result looks good.
Could you try a smaller data set with fewer variables to see whether you get a result?

Ksharp
Super User
And try keeping the code as simple as this:

ods select none;
proc prinqual data=survey.mydata out=survey.prinq;
transform ...........
run;

ods select all;
proc factor.........


