BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
stuti
Obsidian | Level 7

I have already used close statement but in log it gives the message " Cannot close WORK.SMAP_ML; it is not open.

 

Rick_SAS
SAS Super FREQ

If the error is coming from the SUBMIT statement in which you call PROC FACTOR, then you need to close the data set before the SUBMIT statement.  PROC FACTOR can't write to a data set that is already in use.

stuti
Obsidian | Level 7

yes sir may be error was because of submit statement. I have used close statement earlier to submit but for repetition still without any error message in log, I am not getting answers of all calculations.

 

For your reference I am coping the entire program here, may be little lengthy but if in first line I use rep=1 then I am getting all results but if I change it even to 2 then I don't get the results of Accuracy, bias and all other parameters. So how to do this repetitions?

Thank you so much for your help. I have learned sas programming just because of this and still trying to learn because of the help provided by you.

 

 

SAS program:

 

%let rep=1; * N of samples from each population;

proc iml;

**********************************************************************;

*************** Begin Specification of Data Conditions ***************;

**********************************************************************;

p = 10; *20; *40; *60;

k = 2; *2; *4; *8;

d_frac = 1; *.05; *.25; *.50; *.75; *.95;

Commun_type = 1;

N_pops = 1; * N of populations to generate;

 

nn1 = 100000;

means=j(1,p,0);

variance = j(1,p,1);

********************************************************************;

start Make_PopR(nvars,nfactors,commun_type,A1,R);

* program to generate the population covariance matrices

* Inputs are nvars = number of variables in the matrix

nfactors = number of factors

commun = Type of communality

Output is R = population correlation matrix;

bp1=j(nvars,nvars,0);

bp2=uniform(bp1);

if commun_type=3 then do;

bp3=(bp2*.2999999)+.55;

b1square=round(diag(bp3),.1);

end;

if commun_type=1 then do;

bp3=(bp2*.2999999)+.15;

b1square=round(diag(bp3),.1);

end;

if commun_type=2 then do;

bp3=(bp2*.6999999)+.15;

b1square=round(diag(bp3),.1);

end;

B1=b1square##.5;

b3square=I(nvars)-b1square;

B3=b3square##.5;

A1tilde1=j(nvars,nfactors,0);

A1tilde2=round((uniform(A1tilde1)*(nfactors-.00000001))-.5);

A1tilde=A1tilde2;

do j=2 to nfactors;

do i=1 to nvars;

if j<nfactors then do;

A1tilde[i,j]=round(((nfactors-.00000001-sum(A1tilde[i,1:j-1]))

*uniform(0))-.5);

end;

if j=nfactors then do;

A1tilde[i,nfactors]=nfactors-sum(A1tilde[i,1:nfactors-1])-1;

end;

end;

end;

x=normal(A1tilde);

x2=x##2;

d=j(nvars,nfactors,0);

do j=1 to nfactors;

do i=1 to nvars;

d[i,j]=(sum(x2[i,1:nfactors]))##-.5;

end;

end;

cvec=j(1,nfactors,0);

do j=1 to nfactors;

cvec[1,j]=round((uniform(0)*.2999999)+.65,.1);

end;

c=j(nvars,1,1)*cvec;

c2=c##2;

ones=j(nvars,nfactors,1);

y=A1tilde#c + d#x#((ones-c2)##.5);

k=.2;

z=j(nvars,nfactors,0);

do j=1 to nfactors;

do i=1 to nvars;

z[i,j]=((1+k)*y[i,j]*(y[i,j]+abs(y[i,j])+k))/((2+k)*(abs(y[i,j])+k));

end;

end;

z2=z##2;

g=j(nvars,nfactors,0);

do j=1 to nfactors;

do i=1 to nvars;

g[i,j]=(sum(z2[i,1:nfactors]))##-.5;

end;

end;

A1star=g#z;

A1=B1*A1star;

A3star=I(nvars);

A3=B3*A3star;

R=A1*A1`+A3*A3`;

Finish;

start gendata2a(NN1,seed1,variance,bb,cc,dd,mu,r_matrix,YY,p,d_frac);

L = eigval(r_matrix);

neg_eigval = 0;

do r = 1 to nrow(L);

if L[r,1] < 0 then neg_eigval = 1;

end;

if neg_eigval = 0 then do; * matrix is positive definite, so use the Cholesky root approach;

COLS = NCOL(r_matrix);

G = ROOT(r_matrix);

YY=rannor(repeat(seed1,nn1,COLS));

YY = YY*G;

do r = 1 to NN1;

do c = 1 to COLS;

YY[r,c] = (-1*cc) + (bb*YY[r,c]) + (cc*YY[r,c]##2) + (dd*YY[r,c]##3);

YY[r,c] = (YY[r,c] * SQRT(variance[1,c])) + mu[1,c];

end;

end;

end;

if neg_eigval = 1 then do; * matrix is not positive definite, so use the PCA approach;

COLS = NCOL(r_matrix);

V = eigvec(r_matrix);

do i = 1 to nrow(L);

do j = 1 to ncol(V);

if L[i,1] > 0 then V[j,i] = V[j,i] # sqrt(L[i,1]);

if L[i,1] <= 0 then V[j,i] = V[j,i] # sqrt(.000000001);

end;

end;

YY=rannor(repeat(seed1,nn1,COLS));

YY = V*YY`;

YY = YY`;

do r = 1 to NN1;

do c = 1 to COLS;

YY[r,c] = (-1*cc) + (bb*YY[r,c]) + (cc*YY[r,c]##2) + (dd*YY[r,c]##3);

YY[r,c] = (YY[r,c] * SQRT(variance[1,c])) + mu[1,c];

end;

end;

end;

do r = 1 to nn1;

do c = 1 to (p*d_frac);

if yy[r,c] < 0 then yy[r,c] = 0;

else if yy[r,c] = 0 then yy[r,c] = 1;

else if yy[r,c] > 0 then yy[r,c] = 1;

end;

end;

finish;

start gendata2b(NN2,seed1,variance,bb,cc,dd,mu,r_matrix,YY,p,d_frac);

L = eigval(r_matrix);

neg_eigval = 0;

do r = 1 to nrow(L);

if L[r,1] < 0 then neg_eigval = 1;

end;

if neg_eigval = 0 then do; * matrix is positive definite, so use the Cholesky root approach;

COLS = NCOL(r_matrix);

G = ROOT(r_matrix);

YY=rannor(repeat(seed1,nn2,COLS));

YY = YY*G;

do r = 1 to NN2;

do c = 1 to COLS;

YY[r,c] = (-1*cc) + (bb*YY[r,c]) + (cc*YY[r,c]##2) + (dd*YY[r,c]##3);

YY[r,c] = (YY[r,c] * SQRT(variance[1,c])) + mu[1,c];

end;

end;

end;

if neg_eigval = 1 then do;

COLS = NCOL(r_matrix);

V = eigvec(r_matrix);

do i = 1 to nrow(L);

do j = 1 to ncol(V);

if L[i,1] > 0 then V[j,i] = V[j,i] # sqrt(L[i,1]);

if L[i,1] <= 0 then V[j,i] = V[j,i] # sqrt(.000000001);

end;

end;

YY=rannor(repeat(seed1,nn2,COLS));

YY = V*YY`;

YY = YY`;

do r = 1 to NN2;

do c = 1 to COLS;

YY[r,c] = (-1*cc) + (bb*YY[r,c]) + (cc*YY[r,c]##2) + (dd*YY[r,c]##3);

YY[r,c] = (YY[r,c] * SQRT(variance[1,c])) + mu[1,c];

end;

end;

end;

do r = 1 to nn2;

do c = 1 to (p*d_frac);

if yy[r,c] < 0 then yy[r,c] = 0;

else if yy[r,c] = 0 then yy[r,c] = 1;

else if yy[r,c] > 0 then yy[r,c] = 1;

end;

end;

finish;

* +----------------------------------------------------------------+

Subroutine computes bias, MSE and accuracy of factor loadings

Inputs: L = matrix of sample pattern coefficients

Lambda = matrix of population pattern coefficients

* +-----------------------------------------------------------------+;

START ACCURACY(L,Lambda,OK_Load,OK_NoLoad,bias_loadings,MSE_loadings,

pattern_accuracy,perfect_accuracy,

pattern_accuracy30,perfect_accuracy30);

p = nrow(L);

k = ncol(L);

pop_loaded = J(p,1,0);

do variable = 1 to p;

do factor = 1 to k;

if abs(Lambda[variable,factor]) >= .30 then pop_loaded[variable,1] = 1;

end;

end;

sample_loaded = J(p,1,0);

do variable = 1 to p;

do factor = 1 to k;

if abs(L[variable,factor]) >= .30 then sample_loaded[variable,1] = 1;

end;

end;

OK_Load = 0;

OK_NoLoad = 0;

do variable = 1 to p;

if sample_loaded[variable,1] = 1 & pop_loaded[variable,1] = 1 then

OK_Load = OK_Load + 1;

if sample_loaded[variable,1] = 0 & pop_loaded[variable,1] = 0 then

OK_NoLoad = OK_NoLoad + 1;

end;

OK_Load = OK_Load / sum(pop_loaded); * proportion of variables with loading agreement;

if sum(pop_loaded) = p then OK_NoLoad = .;

if sum(pop_loaded) < p then OK_NoLoad = OK_NoLoad/(p-sum(pop_loaded));

* +------------------------------------------------+

Compute bias and MS Error in the factor loadings

* +------------------------------------------------+;

bias_loadings=j(p,k,0);

MSE_loadings=j(p,k,0);

do row =1 to p;

bias_loadings[row,]=L[row,]-lambda[row,];

MSE_loadings[row,]=(L[row,]-lambda[row,])##2;

end;

* +--------------------------------------------+

Compute pattern accuracy using the .30 thumb

* +--------------------------------------------+;

pattern_accuracy = j(p,k,0);

perfect_accuracy = 0;

pattern_accuracy30 = j(p,1,0);

perfect_accuracy30 = 0;

do variable = 1 to p;

NoLoad = 0;

do factor = 1 to k;

if (abs(Lambda[variable,factor]) >= .30 &

abs(L[variable,factor]) >= .30)

then do;

pattern_accuracy[variable,factor] = 1;

pattern_accuracy30[variable,1] = 1;

end;

if (abs(Lambda[variable,factor]) < .30 &

abs(L[variable,factor]) < .30) then do;

pattern_accuracy[variable,factor] = 1;

NoLoad = NoLoad + 1;

end;

end;

if NoLoad = k then pattern_accuracy30[variable,1] = 1;

end;

if mean(pattern_accuracy) = 1 then perfect_accuracy = 1;

if mean(pattern_accuracy30) = 1 then perfect_accuracy30 = 1;

finish;

*+_________________________________________________________________+

Subroutine to calculate coefficient of congruence - mean phi

+_________________________________________________________________+;

start get_phi(lambda,L,meanphi);

k = ncol(lambda);

num = vecdiag(lambda`*L);

den1 = vecdiag(lambda`*lambda);

den2 = vecdiag(L`*L);

phi_k = J(k,1,0);

meanphi = 0;

n_terms = 0;

do m = 1 to k;

if (den1[m,1]>0 & den2[m,1]>0) then do;

phi_k[m,1] = num[m,1] / sqrt(den1[m,1]#den2[m,1]);

meanphi = meanphi + phi_k[m,1];

n_terms = n_terms + 1;

end;

if (den1[m,1] = 0 | den2[m,1] = 0) then do;

phi_k[m,1] = 999;

end;

end;

meanphi = meanphi / n_terms;

finish;

start factor_scores(Lambda,L,sampdat,R_fscores,Non_zero);

*+________________________________________________________________________+

Compute Factor Score Estimates

+_______________________________________________________________________+;

p = nrow(L);

k = ncol(L);

SC =J(p,k,0);

SCP =J(p,k,0);

Non_zero = J(k,1,0);

do j = 1 to p;

do kk = 1 to k;

if L [j,kk] >= .3 then SC[j,kk]=1;

if L [j,kk] <= -.3 then SC[j,kk]=-1;

if Lambda [j,kk] >= .3 then do;

SCP[j,kk]=1;

Non_zero[kk,1] = 1;

end;

if Lambda [j,kk] <= -.3 then do;

SCP[j,kk]=-1;

Non_zero[kk,1] = 1;

end;

end;

end;

FSE = sampdat*SC;

FS = sampdat*SCP;

FS_FSE = FSE||FS;

R_matrix = corr(FS_FSE);

R_matrix2 = R_matrix[k+1:2#k,1:k];

R_fscores = vecdiag(R_matrix2);

finish;

Do pop_num = 1 to N_pops; * Loop for 10 populations;

run Make_PopR(p,k,commun_type,A1,R_pop);

Lambda = A1;

numr = r_pop[+,+] - p;

deno = r_pop[+,+];

ratio = numr/deno;

f2_pop = (p/(p-1))*ratio;

r2_pop = f2_pop/(1+f2_pop);

corr = r_pop;

seed1=round(1000000*rannor(0));

*nn = 100000;

chg = 1;

cycle = 0;

corr_tmp = corr;

do until (chg = 0);

run gendata2a(NN1,seed1,variance,1,0,0,means,corr_tmp,sim_data,p,d_frac);

sim_corr = corr(sim_data);

resid_m = sim_corr - corr;

tot_res = sum(abs(resid_m));

if cycle = 0 then do;

best_corr = corr_tmp;

best_res = tot_res;

end;

if cycle > 0 then do;

if tot_res < best_res then do;

best_corr = corr_tmp;

best_res = tot_res;

end;

end;

if tot_res < (.005#(((p-1)#p)/2)) then CHG = 0; * Convergence!;

if cycle > 30 then do;

if tot_res < (.01#(((p-1)#p)/2)) then CHG = 0; * Convergence!;

end;

if cycle > 200 then CHG = 0;

if CHG = 1 then corr_tmp = corr_tmp - resid_m; * adjust template and simulate another

large sample;

cycle = cycle + 1;

if CHG = 0 then do;

end;

end;

do S_Size =1 to 1; * Loop for sample sizes;

if S_Size = 1 then Sampsize2=3*P;

if S_Size = 2 then Sampsize2=2*p;

if S_Size = 3 then Sampsize2=p;

if S_Size = 4 then Sampsize2=4*p;

if S_Size=5 then Sampsize2=5*p;

                                                                * Loop for 1000 Samples;

**********************************************************:

* Begin data generation *;

**********************************************************;

seed1=round(1000000*ranuni(0));

nn2 = sampsize2;

corr_tmp = best_corr;

run gendata2b(NN2,seed1,variance,1,0,0,means,corr_tmp,sim_data,p,d_frac);

sampdat = sim_data;

r_samp=corr(sampdat);

 

print sampdat;

print A1;

 

CREATE SAMP FROM SAMPDAT ;APPEND FROM SAMPDAT; CLOSE SAMP;

 

close work.smap_ml;

SUBMIT;

                ods output OrthRotFactPat=samp_ml;

                proc factor data=WORK.samp method=ml priors=smc  rotate=varimax noint nfactors=2

                                norm=kaiser heywood  ;

                var col1 col2 col3 col4 col5 col6 col7 col8 col9 col10;

                run;

                proc print data=samp_ml;

                run;

ENDSUBMIT;

             

use work.samp_ml;                     /* open the data set */

read all var {"Factor1" "Factor2" }into L;           

close work.samp_ml;

run ACCURACY(L,Lambda,OK_Load,OK_NoLoad,bias_loadings,MSE_loadings,

pattern_accuracy,perfect_accuracy,

pattern_accuracy30,perfect_accuracy30);

 

run get_phi(lambda,L,meanphi);

run factor_scores(Lambda,L,sampdat,R_fscores,Non_zero);

if &rep = 1 then do;

 

_ok_load_m = ok_load;

_ok_Noload_m = ok_Noload;

_Bias_Loadings_m = Bias_Loadings;

_MSE_Loadings_m = MSE_Loadings;

_Pattern_Accuracy_m = Pattern_Accuracy;

_Perfect_Accuracy_m = Perfect_Accuracy;

_Pattern_Accuracy30_m = Pattern_Accuracy30;

_Perfect_Accuracy30_m = Perfect_Accuracy30;

_meanphi_m = meanphi;

_R_Fscores_m = R_Fscores;

_Non_zero_m = Non_zero;

 

if &rep > 1 then do;

 

_ok_load_m = _ok_load_m + ok_load;

_ok_Noload_m = _ok_Noload_m + ok_Noload;

_Bias_Loadings_m = _Bias_Loadings_m + Bias_Loadings;

_MSE_Loadings_m = _MSE_Loadings_m + MSE_Loadings;

_Pattern_Accuracy_m = _Pattern_Accuracy_m + Pattern_Accuracy;

_Perfect_Accuracy_m = _Perfect_Accuracy_m + Perfect_Accuracy;

_Pattern_Accuracy30_m = _Pattern_Accuracy30_m + Pattern_Accuracy30;

_Perfect_Accuracy30_m = _Perfect_Accuracy30_m + Perfect_Accuracy30;

_meanphi_m = _meanphi_m + meanphi;

_R_Fscores_m = _R_Fscores_m + R_Fscores;

_Non_zero_m = _Non_zero_m + Non_zero;

end;

if &rep = 1 then nsamples = 1;

if &rep > 1 then nsamples = nsamples +1;

                                                 *End Replications Loop*;

 

_ok_load_m = _ok_load_m/NSamples;

_ok_Noload_m = _ok_Noload_m/NSamples;

_Bias_Loadings_m = _BIas_Loadings_m/NSamples;

_MSE_Loadings_m = _MSE_Loadings_m/NSamples;

_Pattern_Accuracy_m = _Pattern_Accuracy_m/NSamples;

_Perfect_Accuracy_m = _Perfect_Accuracy_m/NSamples;

_Pattern_Accuracy30_m = _Pattern_Accuracy30_m/NSamples;

_Perfect_Accuracy30_m = _Perfect_Accuracy30_m/NSamples;

_meanphi_m = _meanphi_m/NSamples;

_R_Fscores_m = _R_Fscores_m/NSamples;

_Non_zero_m = _Non_zero_m/NSamples;

 

print Sampsize2 commun_type d_frac k p;

print  _OK_Load_m  _OK_NoLoad_m

 _Perfect_Accuracy30_m _Perfect_Accuracy_m _Meanphi_m;

print  _Pattern_Accuracy30_m  _R_Fscores_m  _Non_zero_m;

print _Bias_Loadings_m;

print  _MSE_Loadings_m;

print _Pattern_Accuracy_m;

end; end; end;

                                                 *End Sample Size Loop*;

                                 *End loop for 10 populations*;

quit;

 

stuti
Obsidian | Level 7

Able to modify my program.. now getting results for even repetation...

 

Thank you so much sir for all the guidence given to me...

 

still I have to do some modifications in program but may be now I am able to do it.

May be repetation is time consuming but still will be trying to improve it.

 

If in future would have any problem will come back once again

 

thank you sir.

Rick_SAS
SAS Super FREQ

I am glad that you fixed the problem. 

 

I notice that your program has many double loops in which you are iterating over rows and columns of a matrix.  When you have time, you might want to learn how to vectorize those loops. Some articles to read are:

- Vectorizing the construction of a structured matrix

- How to vectorize computations in a matrix language

 

As shown in the article "Timing performance improvements due to vectorization," vectorized code is much faster than looping over rows and columns.

stuti
Obsidian | Level 7

Hello sir

 

will definitely see try to vectorized the loops .

One more thing that I want to ask is that Suppose I am generating 1000 samples and on that I am applying PROC factor using following codes.

CREATE SAMP FROM SAMPDAT ;/* generated data*/;

APPEND FROM SAMPDAT; CLOSE SAMP;

 

close work.samp_ml;

                                                               

 SUBMIT;

                                                               

ods output OrthRotFactPat=samp_ml;

                                                          

proc factor data=WORK.samp method=ml priors=smc  rotate=varimax noint nfactors=2 norm=kaiser heywood  ;

var col1 col2 col3 col4 col5 col6 col7 col8 col9 col10;

run;

proc print data=samp_ml;

run;

                                                                          

ENDSUBMIT;

                                                               

                                                                             

use work.samp_ml;                     

read all var {"Factor1" "Factor2" }into L;           

close work.samp_ml;

run

 

Here in results I am getting all the outputs produced by PROC factor. I just need RotFactPattern  in further calculations which I am using as L. So is there any way out that these results are suppressed?

 

 

Rick_SAS
SAS Super FREQ

If you are generating 100 samples, you should write all 100 into a single data set, call PROC FACTOR one time and use the BY statement to conduct the 100 analyses. See the article "Simulation in SAS: The Slow way or the BY way"

 

For a working example, see "Simulate many samples from a logistic regression model"

 

To suppress  ODS output, see "Turn off ODS when running simulations in SAS"

 

If you are going to be doing simulations in SAS, it sounds like you might profit by obtaining a copy of Simulating Data with SAS. The book has gotten excellent reviews and is a best seller. It contains all these tips and more.

stuti
Obsidian | Level 7

sir,

I am generating 1000 different samples and on all 1000 different data I am applying proc factor.

the ouput says" The results required more than 4 MB memory and so it is not printed.

because in results it tries to prints for all 1000 sample say initial, roteted and all other results. I dont want any of such results. Just I have created ods output which is OrthFactPattern which is I am using to compare with population data. and after that I am taking average for all 1000 samples. In result, I just want to print that averages without any proc factor result. Is there any way out to do so?

Rick_SAS
SAS Super FREQ

The articles that I linked to show you how to suppress the ODS output. I strongly encourage you to get the program working perfectly for 5 samples before you attempt 1000.

stuti
Obsidian | Level 7

Hello sir

 

First I am just trying to repeat PROC FACTOR on  2 generated samples only. The issue is that in RESULT Viewer I am getting all the factor outputs eg., initial data, prior communality estimates, inital communality estimates, all iterations of communality etc. 

I have tried ODS EXCLUDE ALL but still the results were there.I have even tried %ODSOff but still no effect. So how to supress all that result? 

Rick_SAS
SAS Super FREQ

The %ODSOff macro in the blog post (and book) includes the statement:

ods noresults;

which causes the ODS results viewer to not add entries. If you are running an old version of SAS/IML, put the ODS statements INSIDE the SUBMIT/ENDSUBMIT block and before the PROC FACTOR call.

stuti
Obsidian | Level 7

Thank you so much sir,

%ODSOff inside the SUBMIT really worked properly and even it reduced the calculation time greatly.

even now I am able to run my program for any replications.

 

Thank you so much.

Because of all the help provided by you I am able to develop my own program.

still if i have any problem will come back again.

Rick_SAS
SAS Super FREQ

This might be a good time to mark this problem "Solved". If you have another problem, open a new thread.

stuti
Obsidian | Level 7

I am also attaching excel file as the sample data.

 

On that i ran my program and sas factor analysis using mle.

but the answers are different.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

From The DO Loop
Want more? Visit our blog for more articles like these.
Discussion stats
  • 28 replies
  • 3052 views
  • 4 likes
  • 2 in conversation