Solved: Factor analysis using maximum likelihood estimation - Page 2

stuti · Posted 12-25-2016 01:38 AM

I have already used close statement but in log it gives the message " Cannot close WORK.SMAP_ML; it is not open."

Rick_SAS · Posted 12-25-2016 06:46 AM

If the error is coming from the SUBMIT statement in which you call PROC FACTOR, then you need to close the data set before the SUBMIT statement. PROC FACTOR can't write to a data set that is already in use.

stuti · Posted 12-25-2016 08:00 AM

yes sir may be error was because of submit statement. I have used close statement earlier to submit but for repetition still without any error message in log, I am not getting answers of all calculations.

For your reference I am coping the entire program here, may be little lengthy but if in first line I use rep=1 then I am getting all results but if I change it even to 2 then I don't get the results of Accuracy, bias and all other parameters. So how to do this repetitions?

Thank you so much for your help. I have learned sas programming just because of this and still trying to learn because of the help provided by you.

SAS program:

%let rep=1; * N of samples from each population;

proc iml;

**********************************************************************;

*************** Begin Specification of Data Conditions ***************;

**********************************************************************;

p = 10; *20; *40; *60;

k = 2; *2; *4; *8;

d_frac = 1; *.05; *.25; *.50; *.75; *.95;

Commun_type = 1;

N_pops = 1; * N of populations to generate;

nn1 = 100000;

means=j(1,p,0);

variance = j(1,p,1);

********************************************************************;

start Make_PopR(nvars,nfactors,commun_type,A1,R);

* program to generate the population covariance matrices

* Inputs are nvars = number of variables in the matrix

nfactors = number of factors

commun = Type of communality

Output is R = population correlation matrix;

bp1=j(nvars,nvars,0);

bp2=uniform(bp1);

if commun_type=3 then do;

bp3=(bp2*.2999999)+.55;

b1square=round(diag(bp3),.1);

end;

if commun_type=1 then do;

bp3=(bp2*.2999999)+.15;

b1square=round(diag(bp3),.1);

end;

if commun_type=2 then do;

bp3=(bp2*.6999999)+.15;

b1square=round(diag(bp3),.1);

end;

B1=b1square##.5;

b3square=I(nvars)-b1square;

B3=b3square##.5;

A1tilde1=j(nvars,nfactors,0);

A1tilde2=round((uniform(A1tilde1)*(nfactors-.00000001))-.5);

A1tilde=A1tilde2;

do j=2 to nfactors;

do i=1 to nvars;

if j<nfactors then do;

A1tilde[i,j]=round(((nfactors-.00000001-sum(A1tilde[i,1:j-1]))

*uniform(0))-.5);

end;

if j=nfactors then do;

A1tilde[i,nfactors]=nfactors-sum(A1tilde[i,1:nfactors-1])-1;

end;

x=normal(A1tilde);

x2=x##2;

d=j(nvars,nfactors,0);

do j=1 to nfactors;

do i=1 to nvars;

d[i,j]=(sum(x2[i,1:nfactors]))##-.5;

end;

cvec=j(1,nfactors,0);

do j=1 to nfactors;

cvec[1,j]=round((uniform(0)*.2999999)+.65,.1);

end;

c=j(nvars,1,1)*cvec;

c2=c##2;

ones=j(nvars,nfactors,1);

y=A1tilde#c + d#x#((ones-c2)##.5);

k=.2;

z=j(nvars,nfactors,0);

do j=1 to nfactors;

do i=1 to nvars;

z[i,j]=((1+k)*y[i,j]*(y[i,j]+abs(y[i,j])+k))/((2+k)*(abs(y[i,j])+k));

end;

z2=z##2;

g=j(nvars,nfactors,0);

do j=1 to nfactors;

do i=1 to nvars;

g[i,j]=(sum(z2[i,1:nfactors]))##-.5;

end;

A1star=g#z;

A1=B1*A1star;

A3star=I(nvars);

A3=B3*A3star;

R=A1*A1`+A3*A3`;

Finish;

start gendata2a(NN1,seed1,variance,bb,cc,dd,mu,r_matrix,YY,p,d_frac);

L = eigval(r_matrix);

neg_eigval = 0;

do r = 1 to nrow(L);

if L[r,1] < 0 then neg_eigval = 1;

end;

if neg_eigval = 0 then do; * matrix is positive definite, so use the Cholesky root approach;

COLS = NCOL(r_matrix);

G = ROOT(r_matrix);

YY=rannor(repeat(seed1,nn1,COLS));

YY = YY*G;

do r = 1 to NN1;

do c = 1 to COLS;

YY[r,c] = (-1*cc) + (bb*YY[r,c]) + (cc*YY[r,c]##2) + (dd*YY[r,c]##3);

YY[r,c] = (YY[r,c] * SQRT(variance[1,c])) + mu[1,c];

end;

if neg_eigval = 1 then do; * matrix is not positive definite, so use the PCA approach;

COLS = NCOL(r_matrix);

V = eigvec(r_matrix);

do i = 1 to nrow(L);

do j = 1 to ncol(V);

if L[i,1] > 0 then V[j,i] = V[j,i] # sqrt(L[i,1]);

if L[i,1] <= 0 then V[j,i] = V[j,i] # sqrt(.000000001);

end;

YY=rannor(repeat(seed1,nn1,COLS));

YY = V*YY`;

YY = YY`;

do r = 1 to NN1;

do c = 1 to COLS;

YY[r,c] = (-1*cc) + (bb*YY[r,c]) + (cc*YY[r,c]##2) + (dd*YY[r,c]##3);

YY[r,c] = (YY[r,c] * SQRT(variance[1,c])) + mu[1,c];

end;

do r = 1 to nn1;

do c = 1 to (p*d_frac);

if yy[r,c] < 0 then yy[r,c] = 0;

else if yy[r,c] = 0 then yy[r,c] = 1;

else if yy[r,c] > 0 then yy[r,c] = 1;

end;

finish;

start gendata2b(NN2,seed1,variance,bb,cc,dd,mu,r_matrix,YY,p,d_frac);

L = eigval(r_matrix);

neg_eigval = 0;

do r = 1 to nrow(L);

if L[r,1] < 0 then neg_eigval = 1;

end;

if neg_eigval = 0 then do; * matrix is positive definite, so use the Cholesky root approach;

COLS = NCOL(r_matrix);

G = ROOT(r_matrix);

YY=rannor(repeat(seed1,nn2,COLS));

YY = YY*G;

do r = 1 to NN2;

do c = 1 to COLS;

YY[r,c] = (-1*cc) + (bb*YY[r,c]) + (cc*YY[r,c]##2) + (dd*YY[r,c]##3);

YY[r,c] = (YY[r,c] * SQRT(variance[1,c])) + mu[1,c];

end;

if neg_eigval = 1 then do;

COLS = NCOL(r_matrix);

V = eigvec(r_matrix);

do i = 1 to nrow(L);

do j = 1 to ncol(V);

if L[i,1] > 0 then V[j,i] = V[j,i] # sqrt(L[i,1]);

if L[i,1] <= 0 then V[j,i] = V[j,i] # sqrt(.000000001);

end;

YY=rannor(repeat(seed1,nn2,COLS));

YY = V*YY`;

YY = YY`;

do r = 1 to NN2;

do c = 1 to COLS;

YY[r,c] = (-1*cc) + (bb*YY[r,c]) + (cc*YY[r,c]##2) + (dd*YY[r,c]##3);

YY[r,c] = (YY[r,c] * SQRT(variance[1,c])) + mu[1,c];

end;

do r = 1 to nn2;

do c = 1 to (p*d_frac);

if yy[r,c] < 0 then yy[r,c] = 0;

else if yy[r,c] = 0 then yy[r,c] = 1;

else if yy[r,c] > 0 then yy[r,c] = 1;

end;

finish;

* +----------------------------------------------------------------+

Subroutine computes bias, MSE and accuracy of factor loadings

Inputs: L = matrix of sample pattern coefficients

Lambda = matrix of population pattern coefficients

* +-----------------------------------------------------------------+;

START ACCURACY(L,Lambda,OK_Load,OK_NoLoad,bias_loadings,MSE_loadings,

pattern_accuracy,perfect_accuracy,

pattern_accuracy30,perfect_accuracy30);

p = nrow(L);

k = ncol(L);

pop_loaded = J(p,1,0);

do variable = 1 to p;

do factor = 1 to k;

if abs(Lambda[variable,factor]) >= .30 then pop_loaded[variable,1] = 1;

end;

sample_loaded = J(p,1,0);

do variable = 1 to p;

do factor = 1 to k;

if abs(L[variable,factor]) >= .30 then sample_loaded[variable,1] = 1;

end;

OK_Load = 0;

OK_NoLoad = 0;

do variable = 1 to p;

if sample_loaded[variable,1] = 1 & pop_loaded[variable,1] = 1 then

OK_Load = OK_Load + 1;

if sample_loaded[variable,1] = 0 & pop_loaded[variable,1] = 0 then

OK_NoLoad = OK_NoLoad + 1;

end;

OK_Load = OK_Load / sum(pop_loaded); * proportion of variables with loading agreement;

if sum(pop_loaded) = p then OK_NoLoad = .;

if sum(pop_loaded) < p then OK_NoLoad = OK_NoLoad/(p-sum(pop_loaded));

* +------------------------------------------------+

Compute bias and MS Error in the factor loadings

* +------------------------------------------------+;

bias_loadings=j(p,k,0);

MSE_loadings=j(p,k,0);

do row =1 to p;

bias_loadings[row,]=L[row,]-lambda[row,];

MSE_loadings[row,]=(L[row,]-lambda[row,])##2;

end;

* +--------------------------------------------+

Compute pattern accuracy using the .30 thumb

* +--------------------------------------------+;

pattern_accuracy = j(p,k,0);

perfect_accuracy = 0;

pattern_accuracy30 = j(p,1,0);

perfect_accuracy30 = 0;

do variable = 1 to p;

NoLoad = 0;

do factor = 1 to k;

if (abs(Lambda[variable,factor]) >= .30 &

abs(L[variable,factor]) >= .30)

then do;

pattern_accuracy[variable,factor] = 1;

pattern_accuracy30[variable,1] = 1;

end;

if (abs(Lambda[variable,factor]) < .30 &

abs(L[variable,factor]) < .30) then do;

pattern_accuracy[variable,factor] = 1;

NoLoad = NoLoad + 1;

end;

if NoLoad = k then pattern_accuracy30[variable,1] = 1;

end;

if mean(pattern_accuracy) = 1 then perfect_accuracy = 1;

if mean(pattern_accuracy30) = 1 then perfect_accuracy30 = 1;

finish;

*+_________________________________________________________________+

Subroutine to calculate coefficient of congruence - mean phi

+_________________________________________________________________+;

start get_phi(lambda,L,meanphi);

k = ncol(lambda);

num = vecdiag(lambda`*L);

den1 = vecdiag(lambda`*lambda);

den2 = vecdiag(L`*L);

phi_k = J(k,1,0);

meanphi = 0;

n_terms = 0;

do m = 1 to k;

if (den1[m,1]>0 & den2[m,1]>0) then do;

phi_k[m,1] = num[m,1] / sqrt(den1[m,1]#den2[m,1]);

meanphi = meanphi + phi_k[m,1];

n_terms = n_terms + 1;

end;

if (den1[m,1] = 0 | den2[m,1] = 0) then do;

phi_k[m,1] = 999;

end;

meanphi = meanphi / n_terms;

finish;

start factor_scores(Lambda,L,sampdat,R_fscores,Non_zero);

*+________________________________________________________________________+

Compute Factor Score Estimates

+_______________________________________________________________________+;

p = nrow(L);

k = ncol(L);

SC =J(p,k,0);

SCP =J(p,k,0);

Non_zero = J(k,1,0);

do j = 1 to p;

do kk = 1 to k;

if L [j,kk] >= .3 then SC[j,kk]=1;

if L [j,kk] <= -.3 then SC[j,kk]=-1;

if Lambda [j,kk] >= .3 then do;

SCP[j,kk]=1;

Non_zero[kk,1] = 1;

end;

if Lambda [j,kk] <= -.3 then do;

SCP[j,kk]=-1;

Non_zero[kk,1] = 1;

end;

FSE = sampdat*SC;

FS = sampdat*SCP;

FS_FSE = FSE||FS;

R_matrix = corr(FS_FSE);

R_matrix2 = R_matrix[k+1:2#k,1:k];

R_fscores = vecdiag(R_matrix2);

finish;

Do pop_num = 1 to N_pops; * Loop for 10 populations;

run Make_PopR(p,k,commun_type,A1,R_pop);

Lambda = A1;

numr = r_pop[+,+] - p;

deno = r_pop[+,+];

ratio = numr/deno;

f2_pop = (p/(p-1))*ratio;

r2_pop = f2_pop/(1+f2_pop);

corr = r_pop;

seed1=round(1000000*rannor(0));

*nn = 100000;

chg = 1;

cycle = 0;

corr_tmp = corr;

do until (chg = 0);

run gendata2a(NN1,seed1,variance,1,0,0,means,corr_tmp,sim_data,p,d_frac);

sim_corr = corr(sim_data);

resid_m = sim_corr - corr;

tot_res = sum(abs(resid_m));

if cycle = 0 then do;

best_corr = corr_tmp;

best_res = tot_res;

end;

if cycle > 0 then do;

if tot_res < best_res then do;

best_corr = corr_tmp;

best_res = tot_res;

end;

if tot_res < (.005#(((p-1)#p)/2)) then CHG = 0; * Convergence!;

if cycle > 30 then do;

if tot_res < (.01#(((p-1)#p)/2)) then CHG = 0; * Convergence!;

end;

if cycle > 200 then CHG = 0;

if CHG = 1 then corr_tmp = corr_tmp - resid_m; * adjust template and simulate another

large sample;

cycle = cycle + 1;

if CHG = 0 then do;

end;

do S_Size =1 to 1; * Loop for sample sizes;

if S_Size = 1 then Sampsize2=3*P;

if S_Size = 2 then Sampsize2=2*p;

if S_Size = 3 then Sampsize2=p;

if S_Size = 4 then Sampsize2=4*p;

if S_Size=5 then Sampsize2=5*p;

* Loop for 1000 Samples;

**********************************************************:

* Begin data generation *;

**********************************************************;

seed1=round(1000000*ranuni(0));

nn2 = sampsize2;

corr_tmp = best_corr;

run gendata2b(NN2,seed1,variance,1,0,0,means,corr_tmp,sim_data,p,d_frac);

sampdat = sim_data;

r_samp=corr(sampdat);

print sampdat;

print A1;

CREATE SAMP FROM SAMPDAT ;APPEND FROM SAMPDAT; CLOSE SAMP;

close work.smap_ml;

SUBMIT;

ods output OrthRotFactPat=samp_ml;

proc factor data=WORK.samp method=ml priors=smc rotate=varimax noint nfactors=2

norm=kaiser heywood ;

var col1 col2 col3 col4 col5 col6 col7 col8 col9 col10;

run;

proc print data=samp_ml;

run;

ENDSUBMIT;

use work.samp_ml; /* open the data set */

read all var {"Factor1" "Factor2" }into L;

close work.samp_ml;

run ACCURACY(L,Lambda,OK_Load,OK_NoLoad,bias_loadings,MSE_loadings,

pattern_accuracy,perfect_accuracy,

pattern_accuracy30,perfect_accuracy30);

run get_phi(lambda,L,meanphi);

run factor_scores(Lambda,L,sampdat,R_fscores,Non_zero);

if &rep = 1 then do;

_ok_load_m = ok_load;

_ok_Noload_m = ok_Noload;

_Bias_Loadings_m = Bias_Loadings;

_MSE_Loadings_m = MSE_Loadings;

_Pattern_Accuracy_m = Pattern_Accuracy;

_Perfect_Accuracy_m = Perfect_Accuracy;

_Pattern_Accuracy30_m = Pattern_Accuracy30;

_Perfect_Accuracy30_m = Perfect_Accuracy30;

_meanphi_m = meanphi;

_R_Fscores_m = R_Fscores;

_Non_zero_m = Non_zero;

if &rep > 1 then do;

_ok_load_m = _ok_load_m + ok_load;

_ok_Noload_m = _ok_Noload_m + ok_Noload;

_Bias_Loadings_m = _Bias_Loadings_m + Bias_Loadings;

_MSE_Loadings_m = _MSE_Loadings_m + MSE_Loadings;

_Pattern_Accuracy_m = _Pattern_Accuracy_m + Pattern_Accuracy;

_Perfect_Accuracy_m = _Perfect_Accuracy_m + Perfect_Accuracy;

_Pattern_Accuracy30_m = _Pattern_Accuracy30_m + Pattern_Accuracy30;

_Perfect_Accuracy30_m = _Perfect_Accuracy30_m + Perfect_Accuracy30;

_meanphi_m = _meanphi_m + meanphi;

_R_Fscores_m = _R_Fscores_m + R_Fscores;

_Non_zero_m = _Non_zero_m + Non_zero;

end;

if &rep = 1 then nsamples = 1;

if &rep > 1 then nsamples = nsamples +1;

*End Replications Loop*;

_ok_load_m = _ok_load_m/NSamples;

_ok_Noload_m = _ok_Noload_m/NSamples;

_Bias_Loadings_m = _BIas_Loadings_m/NSamples;

_MSE_Loadings_m = _MSE_Loadings_m/NSamples;

_Pattern_Accuracy_m = _Pattern_Accuracy_m/NSamples;

_Perfect_Accuracy_m = _Perfect_Accuracy_m/NSamples;

_Pattern_Accuracy30_m = _Pattern_Accuracy30_m/NSamples;

_Perfect_Accuracy30_m = _Perfect_Accuracy30_m/NSamples;

_meanphi_m = _meanphi_m/NSamples;

_R_Fscores_m = _R_Fscores_m/NSamples;

_Non_zero_m = _Non_zero_m/NSamples;

print Sampsize2 commun_type d_frac k p;

print _OK_Load_m _OK_NoLoad_m

_Perfect_Accuracy30_m _Perfect_Accuracy_m _Meanphi_m;

print _Pattern_Accuracy30_m _R_Fscores_m _Non_zero_m;

print _Bias_Loadings_m;

print _MSE_Loadings_m;

print _Pattern_Accuracy_m;

end; end; end;

*End Sample Size Loop*;

*End loop for 10 populations*;

quit;

stuti · Posted 12-25-2016 09:58 AM

Able to modify my program.. now getting results for even repetation...

Thank you so much sir for all the guidence given to me...

still I have to do some modifications in program but may be now I am able to do it.

May be repetation is time consuming but still will be trying to improve it.

If in future would have any problem will come back once again

thank you sir.

Rick_SAS · Posted 12-25-2016 07:08 PM

I am glad that you fixed the problem.

I notice that your program has many double loops in which you are iterating over rows and columns of a matrix. When you have time, you might want to learn how to vectorize those loops. Some articles to read are:

- Vectorizing the construction of a structured matrix

- How to vectorize computations in a matrix language

As shown in the article "Timing performance improvements due to vectorization," vectorized code is much faster than looping over rows and columns.

stuti · Posted 12-28-2016 05:35 AM

Hello sir

will definitely see try to vectorized the loops .

One more thing that I want to ask is that Suppose I am generating 1000 samples and on that I am applying PROC factor using following codes.

CREATE SAMP FROM SAMPDAT ;/* generated data*/;

APPEND FROM SAMPDAT; CLOSE SAMP;

close work.samp_ml;

SUBMIT;

ods output OrthRotFactPat=samp_ml;

proc factor data=WORK.samp method=ml priors=smc rotate=varimax noint nfactors=2 norm=kaiser heywood ;

var col1 col2 col3 col4 col5 col6 col7 col8 col9 col10;

run;

proc print data=samp_ml;

run;

ENDSUBMIT;

use work.samp_ml;

read all var {"Factor1" "Factor2" }into L;

close work.samp_ml;

run

Here in results I am getting all the outputs produced by PROC factor. I just need RotFactPattern in further calculations which I am using as L. So is there any way out that these results are suppressed?

Rick_SAS · Posted 12-28-2016 08:26 AM

If you are generating 100 samples, you should write all 100 into a single data set, call PROC FACTOR one time and use the BY statement to conduct the 100 analyses. See the article "Simulation in SAS: The Slow way or the BY way"

For a working example, see "Simulate many samples from a logistic regression model"

To suppress ODS output, see "Turn off ODS when running simulations in SAS"

If you are going to be doing simulations in SAS, it sounds like you might profit by obtaining a copy of Simulating Data with SAS. The book has gotten excellent reviews and is a best seller. It contains all these tips and more.

stuti · Posted 12-29-2016 05:05 AM

sir,

I am generating 1000 different samples and on all 1000 different data I am applying proc factor.

the ouput says" The results required more than 4 MB memory and so it is not printed.

because in results it tries to prints for all 1000 sample say initial, roteted and all other results. I dont want any of such results. Just I have created ods output which is OrthFactPattern which is I am using to compare with population data. and after that I am taking average for all 1000 samples. In result, I just want to print that averages without any proc factor result. Is there any way out to do so?

Rick_SAS · Posted 12-29-2016 06:55 AM

The articles that I linked to show you how to suppress the ODS output. I strongly encourage you to get the program working perfectly for 5 samples before you attempt 1000.

stuti · Posted 01-11-2017 05:36 AM

Hello sir

First I am just trying to repeat PROC FACTOR on 2 generated samples only. The issue is that in RESULT Viewer I am getting all the factor outputs eg., initial data, prior communality estimates, inital communality estimates, all iterations of communality etc.

I have tried ODS EXCLUDE ALL but still the results were there.I have even tried %ODSOff but still no effect. So how to supress all that result?

Rick_SAS · Posted 01-11-2017 11:00 AM

The %ODSOff macro in the blog post (and book) includes the statement:

ods noresults;

which causes the ODS results viewer to not add entries. If you are running an old version of SAS/IML, put the ODS statements INSIDE the SUBMIT/ENDSUBMIT block and before the PROC FACTOR call.

stuti · Posted 01-11-2017 11:45 PM

Thank you so much sir,

%ODSOff inside the SUBMIT really worked properly and even it reduced the calculation time greatly.

even now I am able to run my program for any replications.

Thank you so much.

Because of all the help provided by you I am able to develop my own program.

still if i have any problem will come back again.

Rick_SAS · Posted 01-12-2017 05:59 AM

This might be a good time to mark this problem "Solved". If you have another problem, open a new thread.

stuti · Posted 12-12-2016 01:35 AM

I am also attaching excel file as the sample data.

On that i ran my program and sas factor analysis using mle.

but the answers are different.

Re: Factor analysis using maximum likelihood estimation

Re: Factor analysis using maximum likelihood estimation

Re: Factor analysis using maximum likelihood estimation

Re: Factor analysis using maximum likelihood estimation

Re: Factor analysis using maximum likelihood estimation

Re: Factor analysis using maximum likelihood estimation

Re: Factor analysis using maximum likelihood estimation

Re: Factor analysis using maximum likelihood estimation

Re: Factor analysis using maximum likelihood estimation

Re: Factor analysis using maximum likelihood estimation

Re: Factor analysis using maximum likelihood estimation

Re: Factor analysis using maximum likelihood estimation

Re: Factor analysis using maximum likelihood estimation

Re: Factor analysis using maximum likelihood estimation

Ready to join fellow brilliant minds for the SAS Hackathon?