Re: Can I iterate the whole code within IML ?

anthonywang · Posted 04-11-2019 01:09 AM

Hi SAS users:

I am really new to SAS macro word.

I have a code that is written using IML.

It does 3 thing:

1. simulate datasets;

2. fit the MIXED procedure, and save the parameters estimates to 3 datasets;

3. calculate a statistic using these three datasets.

So I tried to do a loop within the IML, but this does not really work.

I also tried to wrap up the whole code into a function, and try to call the function repeatedly, and it also does not work.

They all have the same error message as shown below.

So my question is what strategy should I use such that I can repeatedly call this whole code and simply store the statistic call "CvM" in from each iteration into a vector?

It seems like in proc MIXED, the output datasets are in use after the initial call, is there a way to empty these datasets each time after calling the procedure MIXED that is "submitted"?

Any comment and suggestion are so appreciated

proc iml;


/*1. simulate data*/

n=6;  SampSize = 100;  target = 7;  nfe = 4;  nre = 3;


/*TIMEPOINT = 1:4;*/
TIMEPOINT = 4; 




/*binary x_i*/
x_EDG = randfun(SampSize, "Bernoulli", 0.5); 
print x_EDG;

EDG = colvec(repeat(x_EDG,1,n));
print EDG;




/*fixed effect*/
beta = {21.4,1.92,-3.97,0.350}; 

time_i = {0,0,1,2,3,4};
timesqr_i = time_i##2; 
print timesqr_i;

time = shape(repeat(time_i,SampSize),SampSize*n,1);
timesqr = shape(repeat(timesqr_i,SampSize),SampSize*n,1) ;

/*design matrix X*/
intcp = j(SampSize*n,1,1);  
X = intcp || EDG || time || timesqr; 
X_design = X;
print X;

/*design matrix Z*/
Zi = repeat(1,n)||time_i||timesqr_i;
/*Zi = repeat(1,n)||time_i; */
Z = I(SampSize)@Zi;
Z_design = Z;

print Z;



	/*error term*/
	vc = 10*I(n);
	/*print vc;*/

	Mean= repeat(0,n);
	eps = RandNormal(SampSize, Mean, vc);
	eps = shape(eps,SampSize*n,1);
	/*print eps;*/



	mu = {0  0  0};
	cov = {10.4  0.279  -0.341,0.279  13.06  -2.466, -0.341  -2.466  0.581};  
/*	rnd = RANDNORMAL(SampSize, mu, cov);*/
	rnd = RandMVT(SampSize, 3, mu, cov );

	rndA = shape(rnd, SampSize*3,1);

 
 

	Y = X*beta+Z*rndA+eps;
	/*print Y;*/





/*Create ID vectors*/
ID = colvec(repeat(T(1:SampSize),1,n)) ;
print ID;


dataset = ID||Y||EDG||time||timesqr;
print dataset;



create MyData from dataset [colname={"id", "Y", "EDG", "time"}];
append from dataset;
close MyData;

/********************************************************/

/*2. fit mixed*/
submit MyData;
ods listing close;
ods output SolutionF = fixed  SolutionR = rnds  CovParms=CovP;
proc mixed data=MyData method=ml COVTEST;
	  class id;
      model Y = EDG time time*time / solution cl covb;
	  random intercept time time*time / SUB=id TYPE=un SOLUTION G GCORR V;
run;
ods listing;
endsubmit;
/********************************************************/



/*3. calculate CvM statistic*/

use rnds;
    read all into rnds[colname=varNames];


intercept_idx= do(1,SampSize*nre,3); 
print intercept_idx;
time_idx     = do(2,SampSize*nre,3); 
print time_idx;
timesqr_idx  = do(3,SampSize*nre,3); 
print timesqr_idx;

EB_intercept= rnds[intercept_idx,"Estimate"];
EB_time     = rnds[time_idx,"Estimate"];
EB_timesqr  = rnds[timesqr_idx,"Estimate"];

print EB_intercept,EB_time,EB_timesqr;

use Fixed; 
	read all var {Estimate} into EB_FIXED;

use CovP;  
	read all var {Estimate} into CovPars;
 






resid_var = CovPars[nrow(CovPars)];

print resid_var;


EB_RND = EB_intercept||EB_time||EB_timesqr;

shape_EB_RND = shape(EB_RND,SampSize*nre,1);

mu = X_design*EB_FIXED + Z_design*shape_EB_RND;

print mu;

Benefit = probnorm( (target - (X_design*EB_FIXED + Z_design*shape_EB_RND)) / sqrt(resid_var) ) ; 

Benefit_all = id || EDG || time || Benefit;

print Benefit_all;

/**********************************************************************************************************************/
/**********************************GROUP THE BENEFITS by x_i and at TIMEPOINT******************************************/

edg_rows_t   = loc((Benefit_all[,2]=1) & (Benefit_all[,3]=TIMEPOINT));
noedg_rows_t = loc((Benefit_all[,2]=0) & (Benefit_all[,3]=TIMEPOINT));


e_ct   = countn(edg_rows_t) ;  
NOe_ct = countn(noedg_rows_t) ; 
print e_ct, NOe_ct;
 
/*obtain EB benefits at TIMEPOINT*/
edg_t   = Benefit_all[edg_rows_t,ncol(Benefit_all)];
noedg_t = Benefit_all[noedg_rows_t,ncol(Benefit_all)];
print edg_t;
print noedg_t;
  
/****************************************exclude benefits 0 and 1**********************************************/

smallCutoff = 1e-6;

LargeCutoff = 1 - smallCutoff;

idx_e =  loc((edg_t < smallCutoff ) | (edg_t > LargeCutoff));
idx_ne = loc((noedg_t < smallCutoff ) | (noedg_t > LargeCutoff));

print idx_e,idx_ne ;


if ncol(idx_e)>0 then do;
   edg_t = edg_t[setdif(1:nrow(edg_t), idx_e)];
   e_ct  = countn(edg_t) ;  
end;
else print "There are no positive values";


if ncol(idx_ne)>0 then do;
   noedg_t = noedg_t[setdif(1:nrow(noedg_t),idx_ne)];
   NOe_ct  = countn(noedg_t) ;
end;
else print "There are no positive values";


 
print e_ct, NOe_ct;
print edg_t, noedg_t;


/**********************************************************************************************************************/



/*mean function of time and binary covariates*/
start Mut (LB, x_EDG, t, Fixed_Efft);

	tsqr = t*t;
	X_LB = 1 || x_EDG || t || tsqr;
	LB = X_LB*Fixed_Efft;

finish Mut; 


 



/*variance function of time and binary covariates*/
start Vart (LB_VAR, t, Varparms);

	tsqr = t*t;
	tcub = t*t*t;
	tfor = t*t*t*t;
	Xvar_LB = 1 || 2*t || tsqr || 2*tsqr || 2*tcub || tfor;
	LB_VAR = Xvar_LB*Varparms[1:(nrow(Varparms)-1)];

finish Vart;





/*CDF*/
start CDF_b (F_z_cdf,  x_EDG,Fixed_Efft,Varparms, t, z, target);


	call Mut (LB, x_EDG, t, Fixed_Efft);
	sigma_sqr = Varparms[nrow(Varparms)];
	mean_t = (target - LB)/sqrt(sigma_sqr);
/*	print mean_t;*/

	call Vart (LB_VAR, t,Varparms);
/*	print LB_VAR;*/

	gamma_sqr = LB_VAR/sigma_sqr;
/*	print gamma_sqr;*/

	F_z = quantile('NORMAL',z);
	F_z_t = (F_z - mean_t)/sqrt(gamma_sqr);
	F_z_cdf = cdf('NORMAL',F_z_t);
/*	print F_z, F_z_t, F_z_cdf;*/

finish CDF_b;






start CRM_stat (CRM, dataset);

/*obtain order statistics by sorting the CDFs*/
	call sort(dataset, 1);

	cons = j(nrow(dataset),1,0);
	do i = 1 to nrow(dataset);
		cons[i] = (2*i-1)/ (2*nrow(dataset));
	end;
/*	cons1 = dataset - cons;*/
/*	cons2 = (dataset - cons)##2;*/
/*	cons3 = sum((dataset - cons)##2);*/

	CRM = 1/(12*nrow(dataset)*nrow(dataset)) + 1/(nrow(dataset)) * sum((dataset - cons)##2);
/*	print cons, cons1, cons2, cons3, CRM;*/

finish CRM_stat;




/**********************************************************************************************************************/
 



edg_t_cdf = j(nrow(edg_t), 1, 0);
do i = 1 to nrow(edg_t);
	call CDF_b (F_z_cdf,  1,EB_FIXED,CovPars, TIMEPOINT, edg_t[i],target);
	edg_t_cdf[i] = F_z_cdf;
end;


noedg_t_cdf = j(nrow(noedg_t), 1, 0);
do i = 1 to nrow(noedg_t);
	call CDF_b (F_z_cdf,  0,EB_FIXED,CovPars, TIMEPOINT, noedg_t[i],target);
	noedg_t_cdf[i] = F_z_cdf;
end;


print edg_t_cdf, noedg_t_cdf;
 


/**********************************************************************************************************************/
/*calculate the statistic*/

call CRM_stat (edg_CRM, edg_t_cdf);
call CRM_stat (noedg_CRM, noedg_t_cdf);
print edg_CRM, noedg_CRM;


CvM = (e_ct*edg_CRM + NOe_ct*noedg_CRM)/(e_ct+NOe_ct);
print CvM;

Rick_SAS · Posted 04-11-2019 05:21 AM

The error messages are telling you that you have opened the data sets in IML but did not close them. Therefore when you try to open the data sets in the SUBMIT block, you get the errors.

In general, it is good programming practice to close the data sets when you are done.

In your case, just do the following:

use rnds;
    read all into rnds[colname=varNames];
CLOSE;

...

use Fixed; 
	read all var {Estimate} into EB_FIXED;
CLOSE;

use CovP;  
	read all var {Estimate} into CovPars;
CLOSE;

The bigger question you ask is "how to loop." I advise that you do not use a macro loop to perform a simulation. Instead, use BY-group processing. The basic idea is to generate the data for each of the B simulation sample and write it to on big data set that has an ID variable with B unique values. Then run PROC MIXED one time and use the BY statement to get B sets of statistics. Then analyze the statistcs.

You can find many examples on my blog, papers, and the book Simulating Data with SAS. For example, see Tips 6-9 in Wicklin (2015), "Ten Tips for Simulating Data with SAS." See also "Simulate many samples from a logistic regression model", which is close to your PROC MIXED example.

anthonywang · Posted 04-11-2019 01:51 PM

These suggestions are very helpful.
That you Rick!

Can I iterate the whole code within IML ?

Re: Can I iterate the whole code within IML ?

Re: Can I iterate the whole code within IML ?

Registration is open