Hello,
I resorted to a polychoric correlation matrix because my variables are all either scale-based (Likert-scaled) or dichotomous. I have 103 variables in total.
I used the OUTPLC= option
proc corr data=survey.mydata outplc=survey.pchor_dm_new;
var
Q1 Q3 Q4A Q4B Q4C Q4D Q4E Q4F Q4G Q4H Q4I Q4J Q5A Q5B
Q5C Q5D Q5E Q5F Q5G Q5H Q6A Q6B Q6C Q6D Q6E Q6F Q7 Q10A
Q10B Q10C Q10D Q10E Q10F Q10G Q11A Q11B Q11C Q11D Q11E Q11F Q11G Q13
Q15A Q15B Q15C Q15D Q16A Q16B Q16C Q17 Q18A Q18B Q18C Q18D Q19A Q19B
Q19C Q20A Q20B Q20C Q21A Q21B Q21C Q21D Q21E Q21F Q21G Q21H Q21I Q21J
Q22 Q23 Q24A Q24B Q24C Q24D Q24E Q24F Q24G Q24H Q24I Q25 Q26A Q26B
Q26C Q26D Q26E Q26F Q26G Q26H Q26I Q26J Q26K Q27 Q28 Q30 Q31 Q32A
Q32B Q32C Q32D Q32E Q32F;
run;
I then filtered out the means, standard deviations and numbers of observations to obtain a 'true' correlation matrix:
data survey.corrMatrix2;
set survey.pchor_dm;
where _type_='CORR';
drop _type_ _name_;
run;
I then attempted to run a Principal Axis Factoring and initially got the errors:
Correlation matrix is Singular
Communality greater than 1.0
title1 'Factor Analysis of Customer Satisfaction Survey 2016';
title2 'PAF method with Polychoric Correlation Coefficients';
ods graphics on;
proc factor data=survey.corrMatrix
method=prinit
priors=smc
plot=scree heywood rotate=promax /* promax can be tried too */
;
var
Q1 Q3 Q4A Q4B Q4C Q4D Q4E Q4F Q4G Q4H Q4I Q4J Q5A Q5B
Q5C Q5D Q5E Q5F Q5G Q5H Q6A Q6B Q6C Q6D Q6E Q6F Q7 Q10A
Q10B Q10C Q10D Q10E Q10F Q10G Q11A Q11B Q11C Q11D Q11E Q11F Q11G Q13
Q15A Q15B Q15C Q15D Q16A Q16B Q16C Q17 Q18A Q18B Q18C Q18D Q19A Q19B
Q19C Q20A Q20B Q20C Q21A Q21B Q21C Q21D Q21E Q21F Q21G Q21H Q21I Q21J
Q22 Q23 Q24A Q24B Q24C Q24D Q24E Q24F Q24G Q24H Q24I Q25 Q26A Q26B
Q26C Q26D Q26E Q26F Q26G Q26H Q26I Q26J Q26K Q27 Q28 Q30 Q31 Q32A
Q32B Q32C Q32D Q32E Q32F;
run;
ods graphics off;
I then used the HEYWOOD option and got the following errors:
WARNING: The number of observations is not greater than the number of variables.
ERROR: Correlation matrix is singular.
NOTE: Prior communality estimates will be 1.0.
NOTE: 102 factors will be retained by the MINEIGEN criterion.
WARNING: Too many factors for a unique solution.
NOTE: Convergence criterion satisfied.
Could someone please help me with what I'm doing wrong? I really need to get this done today.
Regards
Mari
No. You have ordinal and nominal variables, not continuous variables. Therefore you cannot use PROC CLUSTER or a principal component analysis directly. The best choice is PROC PRINQUAL.
Since you are in a hurry, I will say some things without completely checking them. Otherwise I would not be able to post until tomorrow.
I think the problem is the DATA step in which you are extracting _TYPE_="CORR". You do NOT want to modify the matrix that is produced by PROC CORR. PROC FACTOR expects to receive a TYPE=CORR data set, and it uses the _TYPE_ variable to reconstruct the important statistics that it needs. I think that what is happening is that PROC FACTOR is interpreting the data as raw data, rather than as a pre-computed correlation matrix.
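To illustrate the point above, here is a minimal sketch of passing the PROC CORR output to PROC FACTOR unchanged. The data set name work.pchor and the shortened VAR list are placeholders chosen for illustration, not names from the original post; the POLYCHORIC option is written out explicitly alongside OUTPLC=.

```sas
/* Sketch only: work.pchor and the short VAR list are placeholders. */
proc corr data=survey.mydata polychoric outplc=work.pchor noprint;
   var Q1 Q3 Q4A Q4B;   /* put all "Q" variables here */
run;

/* The "Data Set Type" field in the output should read CORR. */
proc contents data=work.pchor;
run;

/* Feed the data set to PROC FACTOR as-is, keeping _TYPE_ and _NAME_. */
proc factor data=work.pchor method=prinit priors=smc;
run;
```

Because the data set keeps its TYPE=CORR attribute and its _TYPE_ and _NAME_ variables, PROC FACTOR should be able to read the correlations and the number of observations from it rather than treating the rows as raw data.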
Thank you for the prompt response.
I did run PROC FACTOR with the matrix generated by PROC CORR and got the following errors:
ERROR: Correlation matrix is singular.
NOTE: Prior communality estimates will be 1.0.
NOTE: 93 factors will be retained by the PROPORTION criterion.
WARNING: Too many factors for a unique solution.
ERROR: Maximum iterations exceeded
Is the correlation matrix singular because of too many variables and too few cases of data (I had 53000+ observations)? Or is it because I have too many highly correlated items in my matrix?
Could you please help (further)? If I cannot generate sensible results today, then it makes sense to ask for one more day.
Regards
Mari
With 53K observations, I wouldn't expect collinearities in 103 ordinal variables. Are you using dummy variables instead of ordinal variables for the data? For example, if a variable X has values 1, 2 and 3, you will get a singular matrix if you replace X with three dummy variables X1=(X=1), X2=(X=2), and X3=(X=3).
Make sure in the PROC CORR that you are using the original ordinal variable, where each variable corresponds to one question.
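The dummy-variable example above can be reproduced on simulated data. The following sketch uses made-up data (the data set name Dummies, the seed and the category probabilities are all assumptions for illustration) and PROC PRINCOMP to show the eigenvalues of the correlation matrix of the three dummy variables.

```sas
/* Simulated example: X takes the values 1, 2, 3; X1-X3 are its dummies. */
data Dummies;
   call streaminit(1);
   do i = 1 to 1000;
      X  = rand("Table", 0.3, 0.3, 0.4);
      X1 = (X=1);
      X2 = (X=2);
      X3 = (X=3);
      output;
   end;
   drop i;
run;

/* X1 + X2 + X3 = 1 in every row, so the correlation matrix is singular. */
proc princomp data=Dummies;
   var X1 X2 X3;
run;
```

Since the three dummies always sum to 1, one of them is an exact linear combination of the other two, and the smallest eigenvalue reported by PROC PRINCOMP should be zero up to rounding error.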
My guess is that you have a polychoric matrix that is not positive definite. This can happen for various reasons, including the presence of missing values. If you have missing values, you could try adding the NOMISS option to the PROC CORR statement (as discussed in the article) to perform listwise deletion of missing values.
It could also indicate collinearity. The REG procedure can check for collinearity among the Q variables. You need to "invent" a response variable and then use the COLLIN option on the MODEL statement, as follows:
data Check;
set Have;
Y = rand("Normal");
run;
proc reg data=Check plots=none;
model Y = Q1-Q3 / COLLIN; /* <= put all "Q" variables here */
run;
Any variable that gets 0 for a parameter estimate is collinear with others. You will also get a NOTE such as
Good Morning, Rick
I have no missing observations in my data. I did use PROC REG with the option COLLIN and of the 103 Variables, only 8 Variables did not have 0 as their parameter estimates.
Is it possible that 95 variables are a linear combination of these (mere) 8 variables? Would you recommend continuing only with the 8 variables? Would I not be losing information pertaining to the other (maybe not all the 95) variables?
Thanks & Regards
Mari
mszommer wrote: I have no missing observations in my data. I did use PROC REG with the option COLLIN and of the 103 variables, only 8 variables did not have 0 as their parameter estimates.
Is it possible that 95 variables are a linear combination of these (mere) 8 variables? Would you recommend continuing only with the 8 variables? Would I not be losing information pertaining to the other (maybe not all the 95) variables?
You ask whether it is possible that 95 variables are linear combinations of 8. Assuming that you ran PROC REG correctly, that is indeed what your results are saying. At this point, you need to look at the data to find out why there is not much valid information. Are most columns entirely zero? Entirely 1? Use PROC FREQ to run an analysis to determine the distribution of the variables.
There's not much else that I can suggest if your data are degenerate. Review the way the data were gathered and consult a knowledgeable colleague or statistician who can help you discover the problem.
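As a rough sketch of the PROC FREQ check suggested above: the NLEVELS option reports how many distinct values each variable takes, which quickly exposes constant columns. The variable-prefix list Q: assumes that all survey items start with the letter Q, as they do in the VAR statements earlier in the thread.

```sas
/* Count the distinct levels of every variable whose name starts with Q. */
proc freq data=survey.mydata nlevels;
   tables Q: / noprint;   /* suppress the individual frequency tables */
run;

/* Then look at the full distribution of a few suspicious items. */
proc freq data=survey.mydata;
   tables Q1 Q3 Q4A / missing;
run;
```

A variable with only one level carries no information, and a block of variables with identical distributions may be worth cross-tabulating to see whether they are copies of each other.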
Good luck!
1) Try adding one option to mark the data set as a correlation data set; then no VAR statement is needed.
data survey.corrMatrix(type=corr);
set survey.pchor_dm;
where _type_='CORR';
drop _type_ _name_;
run;
proc factor data=survey.corrMatrix;
run;
2) You can compute the nearest correlation matrix and feed it to PROC FACTOR: http://blogs.sas.com/content/iml/2012/11/28/computing-the-nearest-correlation-matrix.html
3) They are ordinal variables, so I guess you can't use them in PROC FACTOR. Maybe you could do a PCA for qualitative data and get the correlation matrix from it:
PROC PRINQUAL COR ;
Thank you, KSharp.
Could you help me with the syntax to generate the nearest correlation matrix?
/* symmetric matrix, but not positive definite */
A = {1.0 0.99 0.35,
0.99 1.0 0.80,
0.35 0.80 1.0};
B = NearestCorr(A);
print B;
As per the link that you quoted, my A would be survey.corrMatrix and B the NearestCorr(survey.corrMatrix), is it? Sorry, that I do not get this part.
Regards
Mari
Rick's blog already offers the IML code to get the nearest correlation matrix. Just follow it step by step. As I said before, your data are not continuous but ordinal or nominal, so you cannot use PROC FACTOR directly. Your best choice is PROC PRINQUAL, especially the second example in its documentation:
proc prinqual data=bball out=tbball scores n=1 tstandard=z
plots=transformations;
title2 'Optimal Monotonic Transformation of Ranked Teams';
title3 'with Constrained Estimation of Unranked Teams';
transform untie(CSN -- SportsIllustrated);
id School;
run;
* Perform the Final Principal Component Analysis;
proc factor nfactors=1 plots=scree;
title4 'Principal Component Analysis';
ods select factorpattern screeplot;
var TCSN -- TSportsIllustrated;
run;
Here I quote from Example 2 of the PROC PRINQUAL documentation:
An alternative approach is to use the pairwise deletion option of the CORR procedure to compute the correlation matrix and then use PROC PRINCOMP or PROC FACTOR to perform the principal component analysis. This approach has several disadvantages. The correlation matrix might not be positive semidefinite (PSD), an assumption required for principal component analysis. PROC PRINQUAL always produces a PSD correlation matrix. Even with pairwise deletion, PROC CORR removes the six observations that have only a single nonmissing value from this data set. Finally, it is still not possible to calculate scores on the principal components for those teams that have missing values.
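Returning to the question of how the blog's matrix A relates to the survey data: A would be the polychoric correlations read into a SAS/IML matrix, not the data set name itself. The following is a hedged sketch that reads the matrix and checks its smallest eigenvalue. It assumes the PROC CORR output is stored in survey.pchor_dm with its _TYPE_ variable intact, and it leaves the NearestCorr call commented out because that module is not built in; its definition has to be copied from the blog post first.

```sas
proc iml;
   /* Read only the correlation rows; _NUM_ selects the numeric Q variables. */
   use survey.pchor_dm where(_TYPE_="CORR");
   read all var _NUM_ into A[colname=varNames];
   close survey.pchor_dm;

   A = (A + A`) / 2;        /* guard against tiny asymmetries */
   ev = eigval(A);          /* eigenvalues of the symmetric matrix */
   print (min(ev))[label="Smallest eigenvalue"];

   /* After pasting the module definitions from the blog post: */
   /* B = NearestCorr(A); */
quit;
```

A negative smallest eigenvalue would indicate that the polychoric matrix is not positive semidefinite, which is the situation the nearest-correlation approach is meant to address.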
Hello KSharp,
Thank you for your replies.
I attempted to run PROC PRINQUAL, but it ran forever.
I then specified the nominal and ordinal variables as opscore and monotone respectively (without the untie option, as I do not have any missing values):
proc prinqual data=survey.mydata out=survey.prinq scores n=1 tstandard=z
plots=transformations;
transform opscore(Q1 Q3 Q4A Q4B Q4C Q4D Q4E Q4F Q4G
Q4H Q4I Q4J Q5A Q5B Q5C Q5D Q5E Q5F Q5G Q5H Q7 Q13
Q22 Q23 Q30 Q31 Q32A
Q32B Q32C Q32D Q32E Q32F)
monotone(Q6A Q6B Q6C Q6D Q6E Q6F Q10A
Q10B Q10C Q10D Q10E Q10F Q10G Q11A Q11B Q11C Q11D Q11E Q11F Q11G Q15A
Q15B Q15C Q15D Q16A Q16B Q16C Q17 Q18A Q18B Q18C Q18D Q19A Q19B
Q19C Q20A Q20B Q20C Q21A Q21B Q21C Q21D Q21E Q21F Q21G Q21H Q21I Q21J
Q24A Q24B Q24C Q24D Q24E Q24F Q24G Q24H Q24I Q25 Q26A Q26B
Q26C Q26D Q26E Q26F Q26G Q26H Q26I Q26J Q26K Q27 Q28);
run;
and got a note 'Algorithm converged'. Could you tell me what it means?
Regards
Mari
It means the result looks good. Could you try a smaller subset of the data and variables to see whether you get a result?
And try code as simple as this:
ods select none;
proc prinqual data=survey.mydata out=survey.prinq;
transform ...........
run;
ods select all;
proc factor.........