BookmarkSubscribeRSS Feed
anthonywang
Obsidian | Level 7

Hi SAS users:

 

I am really new to SAS macro word.

I have a code that is written using IML.

It does 3 thing:

1. simulate datasets;

2. fit the MIXED procedure, and save the parameters estimates to 3 datasets;

3. calculate a statistic using these three datasets.

 

So I tried to do a loop within the IML, but this does not really work.

I also tried to wrap up the whole code into a function, and try to call the function repeatedly, and it also does not work.

 

They all have the same error message as shown below.

So my question is what strategy should I use such that I can repeatedly call this whole code and simply store the statistic call "CvM"  in from each iteration into a vector?

It seems like in proc MIXED, the output datasets are in use after the initial call, is there a way to empty these datasets each time after calling the procedure  MIXED that is "submitted"?

 

Any comment and suggestion are so appreciated

proc iml;


/*1. simulate data*/

n=6;  SampSize = 100;  target = 7;  nfe = 4;  nre = 3;


/*TIMEPOINT = 1:4;*/
TIMEPOINT = 4; 




/*binary x_i*/
x_EDG = randfun(SampSize, "Bernoulli", 0.5); 
print x_EDG;

EDG = colvec(repeat(x_EDG,1,n));
print EDG;




/*fixed effect*/
beta = {21.4,1.92,-3.97,0.350}; 

time_i = {0,0,1,2,3,4};
timesqr_i = time_i##2; 
print timesqr_i;

time = shape(repeat(time_i,SampSize),SampSize*n,1);
timesqr = shape(repeat(timesqr_i,SampSize),SampSize*n,1) ;

/*design matrix X*/
intcp = j(SampSize*n,1,1);  
X = intcp || EDG || time || timesqr; 
X_design = X;
print X;

/*design matrix Z*/
Zi = repeat(1,n)||time_i||timesqr_i;
/*Zi = repeat(1,n)||time_i; */
Z = I(SampSize)@Zi;
Z_design = Z;

print Z;



	/*error term*/
	vc = 10*I(n);
	/*print vc;*/

	Mean= repeat(0,n);
	eps = RandNormal(SampSize, Mean, vc);
	eps = shape(eps,SampSize*n,1);
	/*print eps;*/



	mu = {0  0  0};
	cov = {10.4  0.279  -0.341,0.279  13.06  -2.466, -0.341  -2.466  0.581};  
/*	rnd = RANDNORMAL(SampSize, mu, cov);*/
	rnd = RandMVT(SampSize, 3, mu, cov );

	rndA = shape(rnd, SampSize*3,1);

 
 

	Y = X*beta+Z*rndA+eps;
	/*print Y;*/





/*Create ID vectors*/
ID = colvec(repeat(T(1:SampSize),1,n)) ;
print ID;


dataset = ID||Y||EDG||time||timesqr;
print dataset;



create MyData from dataset [colname={"id", "Y", "EDG", "time"}];
append from dataset;
close MyData;

/********************************************************/

/*2. fit mixed*/
submit MyData;
ods listing close;
ods output SolutionF = fixed  SolutionR = rnds  CovParms=CovP;
proc mixed data=MyData method=ml COVTEST;
	  class id;
      model Y = EDG time time*time / solution cl covb;
	  random intercept time time*time / SUB=id TYPE=un SOLUTION G GCORR V;
run;
ods listing;
endsubmit;
/********************************************************/



/*3. calculate CvM statistic*/

use rnds;
    read all into rnds[colname=varNames];


intercept_idx= do(1,SampSize*nre,3); 
print intercept_idx;
time_idx     = do(2,SampSize*nre,3); 
print time_idx;
timesqr_idx  = do(3,SampSize*nre,3); 
print timesqr_idx;

EB_intercept= rnds[intercept_idx,"Estimate"];
EB_time     = rnds[time_idx,"Estimate"];
EB_timesqr  = rnds[timesqr_idx,"Estimate"];

print EB_intercept,EB_time,EB_timesqr;

use Fixed; 
	read all var {Estimate} into EB_FIXED;

use CovP;  
	read all var {Estimate} into CovPars;
 






resid_var = CovPars[nrow(CovPars)];

print resid_var;


EB_RND = EB_intercept||EB_time||EB_timesqr;

shape_EB_RND = shape(EB_RND,SampSize*nre,1);

mu = X_design*EB_FIXED + Z_design*shape_EB_RND;

print mu;

Benefit = probnorm( (target - (X_design*EB_FIXED + Z_design*shape_EB_RND)) / sqrt(resid_var) ) ; 

Benefit_all = id || EDG || time || Benefit;

print Benefit_all;

/**********************************************************************************************************************/
/**********************************GROUP THE BENEFITS by x_i and at TIMEPOINT******************************************/

edg_rows_t   = loc((Benefit_all[,2]=1) & (Benefit_all[,3]=TIMEPOINT));
noedg_rows_t = loc((Benefit_all[,2]=0) & (Benefit_all[,3]=TIMEPOINT));


e_ct   = countn(edg_rows_t) ;  
NOe_ct = countn(noedg_rows_t) ; 
print e_ct, NOe_ct;
 
/*obtain EB benefits at TIMEPOINT*/
edg_t   = Benefit_all[edg_rows_t,ncol(Benefit_all)];
noedg_t = Benefit_all[noedg_rows_t,ncol(Benefit_all)];
print edg_t;
print noedg_t;
  
/****************************************exclude benefits 0 and 1**********************************************/

smallCutoff = 1e-6;

LargeCutoff = 1 - smallCutoff;

idx_e =  loc((edg_t < smallCutoff ) | (edg_t > LargeCutoff));
idx_ne = loc((noedg_t < smallCutoff ) | (noedg_t > LargeCutoff));

print idx_e,idx_ne ;


if ncol(idx_e)>0 then do;
   edg_t = edg_t[setdif(1:nrow(edg_t), idx_e)];
   e_ct  = countn(edg_t) ;  
end;
else print "There are no positive values";


if ncol(idx_ne)>0 then do;
   noedg_t = noedg_t[setdif(1:nrow(noedg_t),idx_ne)];
   NOe_ct  = countn(noedg_t) ;
end;
else print "There are no positive values";


 
print e_ct, NOe_ct;
print edg_t, noedg_t;


/**********************************************************************************************************************/



/*mean function of time and binary covariates*/
start Mut (LB, x_EDG, t, Fixed_Efft);

	tsqr = t*t;
	X_LB = 1 || x_EDG || t || tsqr;
	LB = X_LB*Fixed_Efft;

finish Mut; 


 



/*variance function of time and binary covariates*/
start Vart (LB_VAR, t, Varparms);

	tsqr = t*t;
	tcub = t*t*t;
	tfor = t*t*t*t;
	Xvar_LB = 1 || 2*t || tsqr || 2*tsqr || 2*tcub || tfor;
	LB_VAR = Xvar_LB*Varparms[1:(nrow(Varparms)-1)];

finish Vart;





/*CDF*/
start CDF_b (F_z_cdf,  x_EDG,Fixed_Efft,Varparms, t, z, target);


	call Mut (LB, x_EDG, t, Fixed_Efft);
	sigma_sqr = Varparms[nrow(Varparms)];
	mean_t = (target - LB)/sqrt(sigma_sqr);
/*	print mean_t;*/

	call Vart (LB_VAR, t,Varparms);
/*	print LB_VAR;*/

	gamma_sqr = LB_VAR/sigma_sqr;
/*	print gamma_sqr;*/

	F_z = quantile('NORMAL',z);
	F_z_t = (F_z - mean_t)/sqrt(gamma_sqr);
	F_z_cdf = cdf('NORMAL',F_z_t);
/*	print F_z, F_z_t, F_z_cdf;*/

finish CDF_b;






start CRM_stat (CRM, dataset);

/*obtain order statistics by sorting the CDFs*/
	call sort(dataset, 1);

	cons = j(nrow(dataset),1,0);
	do i = 1 to nrow(dataset);
		cons[i] = (2*i-1)/ (2*nrow(dataset));
	end;
/*	cons1 = dataset - cons;*/
/*	cons2 = (dataset - cons)##2;*/
/*	cons3 = sum((dataset - cons)##2);*/

	CRM = 1/(12*nrow(dataset)*nrow(dataset)) + 1/(nrow(dataset)) * sum((dataset - cons)##2);
/*	print cons, cons1, cons2, cons3, CRM;*/

finish CRM_stat;




/**********************************************************************************************************************/
 



edg_t_cdf = j(nrow(edg_t), 1, 0);
do i = 1 to nrow(edg_t);
	call CDF_b (F_z_cdf,  1,EB_FIXED,CovPars, TIMEPOINT, edg_t[i],target);
	edg_t_cdf[i] = F_z_cdf;
end;


noedg_t_cdf = j(nrow(noedg_t), 1, 0);
do i = 1 to nrow(noedg_t);
	call CDF_b (F_z_cdf,  0,EB_FIXED,CovPars, TIMEPOINT, noedg_t[i],target);
	noedg_t_cdf[i] = F_z_cdf;
end;


print edg_t_cdf, noedg_t_cdf;
 


/**********************************************************************************************************************/
/*calculate the statistic*/

call CRM_stat (edg_CRM, edg_t_cdf);
call CRM_stat (noedg_CRM, noedg_t_cdf);
print edg_CRM, noedg_CRM;


CvM = (e_ct*edg_CRM + NOe_ct*noedg_CRM)/(e_ct+NOe_ct);
print CvM;




Capture.PNG

2 REPLIES 2
Rick_SAS
SAS Super FREQ

The error messages are telling you that you have opened the data sets in IML but did not close them. Therefore when you try to open the data sets in the SUBMIT block, you get the errors.

 

In general, it is good programming practice to close the data sets when you are done.

In your case, just do the following:

 

use rnds;
    read all into rnds[colname=varNames];
CLOSE;

...

use Fixed; 
	read all var {Estimate} into EB_FIXED;
CLOSE;

use CovP;  
	read all var {Estimate} into CovPars;
CLOSE;

The bigger question you ask is "how to loop." I advise that you do not use a macro loop to perform a simulation. Instead, use BY-group processing. The basic idea is to generate the data for each of the B simulation sample and write it to on big data set that has an ID variable with B unique values. Then run PROC MIXED one time and use the BY statement to get B sets of statistics. Then analyze the statistcs.

 

You can find many examples on my blog, papers, and the book Simulating Data with SAS. For example, see Tips 6-9 in Wicklin (2015), "Ten Tips for Simulating Data with SAS." See also "Simulate many samples from a logistic regression model", which is close to your PROC MIXED example.

anthonywang
Obsidian | Level 7
These suggestions are very helpful.
That you Rick!

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

From The DO Loop
Want more? Visit our blog for more articles like these.
Discussion stats
  • 2 replies
  • 739 views
  • 0 likes
  • 2 in conversation