I used PROC IML to generate data, x from N(0,1) and y from a logistic regression model, 1000 times with a sample size of 25 for each replication, where beta_0=1 and beta_1=3.
I used the generated data to calculate the MLE of the unknown coefficients of a logistic regression that includes only one predictor (x) by minimizing the negative log-likelihood function. I used the initial values {0 1} for beta_0 and beta_1 in the optimization procedure. The following is the code:
proc iml;
/*Module for calculating the log-likelihood function for logistic regression and returns negative log likelihood*/
start SUMLOGLHF(beta) global(n,x,y);
z = x*t(beta);
z = max(-700, min(700, z));/*Prevents exp() overflow/underflow*/
prob=exp(z)/(1+exp(z));
llikelihood=sum(y#log(prob+1e-12) + (1-y)#log(1-prob+1e-12));/*Added 1e-12 to avoid log (0)*/
return (-llikelihood);/*Return negative log likelihood function*/
finish;
* Module to calculate gradient;
start driv(beta) global(n,x,y);
z = x*t(beta);
z = max(-700, min(700, z));/*Prevents exp() overflow/underflow*/
prob=exp(z)/(1+exp(z));
g=t(-t(x)*(y-prob));/* g is 1 by 2 vector*/
return(g);
finish;
/*Module to calculate Hessian Matrix*/
start HessLogLik(beta) global(n,x,y);
z = x*t(beta);
z = max(-700, min(700, z));/*Prevents exp() overflow/underflow*/
prob=exp(z)/(1+exp(z));
weight = prob#(1-prob);/*Var(Y)*/
/*Hessian*/
h = (t(x)*(diag(weight))*x); /* Simplified Hessian calculation */
h = h + I(2)*1e-4; /*Prevents the inversion of a singular matrix when the Hessian is not positive definite*/
return(h);
finish;
replicate=1000;/*1000 replications*/
n=25;/*Sample size for each replicate is 25*/
Est=j(replicate,2,0);/*Matrix that includes the estimates of beta_0 and beta_1*/
colMean=j(2,1,0);/*Vector that includes the mean values of the estimates of beta_0 and beta_1*/
colStanddev=j(2,1,0);/*Vector that includes the standard deviation values of the estimates of beta_0 and beta_1*/
/*Do loop for 1000 times*/
do k=1 to replicate;
x1=j(1,n);
x1t=t(x1);
call randseed(3205+k);/*Change seed each iteration by adding k to avoid getting the same result 1000 times*/
Call randgen(x1t, "NORMAL", 0, 1); /*Generates N(0,1)*/
x2=j(n,1,1);
x=x2||x1t;/*x is the design matrix*/
y=j(n,1);/*y is the vector that includes the generated y data*/
TruePar={1 3};/*Row vector that includes the true values for beta_0 and beta_1*/
/*Calculate ptrue, the inverse logit (logistic CDF) of the linear predictor, and use it to generate the y data */
ptrue=exp(x*t(TruePar))/(1+exp(x*t(TruePar)));/* ptrue is based on Logistic (Sigmoid) distribution */
Call randgen(y,"Bernoulli",ptrue);/*Generates Bernoulli distribution for y with probability of success =ptrue*/
sumy=sum(y);
/*Initial values for beta_0 and beta_1, represented as a 1x2 row vector*/
initial_param={0 1};
opt={0};/*To get the estimates that minimize the negative log likelihood function introduced in the module*/
/* Call Optimizer */
call nlpnra(rc, res, "SUMLOGLHF", initial_param,opt, , , , ,"driv","HessLogLik");/*Use Newton Raphson's algorithm and the nlpnra routine that minimizes negative log likelihood function*/
Est[k,]=res;/*Matrix that includes 1000 estimates of beta_0_hat and beta_1_hat*/
end;
print Est;
colMean=t(mean(Est));
print colMean;
colStanddev=t(std(Est));
print colStanddev;
quit;
The means of beta_0_hat and beta_1_hat are not close to the true values of beta_0 and beta_1, which are 1 and 3, respectively.
I got the following errors and notes in the log window:
ERROR: NEWRAP Optimization cannot be completed.
ERROR: NEWRAP needs more than 200 iterations or 500 function calls.
NOTE: FCONV convergence criterion satisfied.
NOTE: At least one element of the (projected) gradient is greater than 1e-3.
How can I fix the code to get rid of the errors and get the MLEs of beta_0 and beta_1 in a correct way?
Thank you!
There's a board dedicated to SAS/IML.
I will move this topic thread to that board.
Koen
The statement
z = max(-700, min(700, z));/*Prevents exp() overflow/underflow*/
is wrong.
In SAS/IML, the MIN and MAX functions return a scalar value: the minimum (or maximum) over ALL elements of ALL their arguments. So instead of z being a vector, it becomes a scalar.
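You can verify this behavior with a small example (a minimal sketch; the input values are arbitrary):

```sas
proc iml;
z = {-900 0 900};
/* MIN/MAX reduce over all elements of all arguments and return a scalar */
s = max(-700, min(700, z));   /* min(700,z) = -900, so s = max(-700,-900) = -700, a 1x1 scalar */
print s;
/* the elementwise operators >< and <> clip each element separately */
w = -700 <> (z >< 700);       /* w = {-700 0 700}, still a 1x3 vector */
print w;
quit;
```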
The correct way to clip a vector of values into an interval is shown in the article
"Clip values into an interval"
The article contains this little IML helper function:
/* Use the elementwise minimum (><) and elementwise maximum (<>) operators
   to truncate the elements of a vector into an interval. See
   https://blogs.sas.com/content/iml/2026/02/04/clip-values.html */
start Clip(x, ab);
   a = ab[1];  b = ab[2];
   return( a <> (x >< b) );
finish;
Use that function instead of your nested MAX-MIN statement. For example:
start SUMLOGLHF(beta) global(n,x,y);
z = x*t(beta);
z = Clip(z, {-700 700}); /*Prevents exp() overflow/underflow*/
...
There might be other issues, but fix this issue first.
Thank you for your response.
I did what you recommended, and the following is the code:
proc iml;
/* Define Clip module */
start Clip(x, ab);
a = ab[1]; b = ab[2];
return( a <> (x >< b) );
finish;
/*Module for calculating the log-likelihood function for logistic regression and returns negative log likelihood*/
start SUMLOGLHF(beta) global(n,x,y);
z = x*t(beta);
z = Clip(z, {-700 700}); /*Prevents exp() overflow/underflow*/
prob=exp(z)/(1+exp(z));
llikelihood=sum(y#log(prob+1e-12) + (1-y)#log(1-prob+1e-12));/*Added 1e-12 to avoid log (0)*/
return (-llikelihood);/*Return negative log likelihood function*/
finish;
* Module to calculate gradient;
start driv(beta) global(n,x,y);
z = x*t(beta);
z = Clip(z, {-700 700}); /*Prevents exp() overflow/underflow*/
prob=exp(z)/(1+exp(z));
g=t(-t(x)*(y-prob));/* g is 1 by 2 vector*/
return(g);
finish;
/*Module to calculate Hessian Matrix*/
start HessLogLik(beta) global(n,x,y);
z = x*t(beta);
z = Clip(z, {-700 700}); /*Prevents exp() overflow/underflow*/
prob=exp(z)/(1+exp(z));
weight = prob#(1-prob);/*Var(Y)*/
/*Hessian matrix (second derivative matrix)*/
h = (t(x)*(diag(weight))*x); /* Simplified Hessian calculation */
h = h + I(2)*1e-4; /*Prevents the inversion of a singular matrix when the Hessian is not positive definite*/
return(h);
finish;
replicate=1000;/*1000 replications*/
n=25;/*Sample size for each replicate is 25*/
Est=j(replicate,2,0);/*Matrix that includes the estimates of beta_0 and beta_1*/
colMean=j(2,1,0);/*Vector that includes the mean values of the estimates of beta_0 and beta_1*/
colStanddev=j(2,1,0);/*Vector that includes the standard deviation values of the estimates of beta_0 and beta_1*/
/*Do loop for 1000 times*/
do k=1 to replicate;
x1=j(1,n);
x1t=t(x1);
call randseed(3205+k);/*Change seed each iteration by adding k to avoid getting the same result 1000 times*/
Call randgen(x1t, "NORMAL", 0, 1); /*Generates N(0,1)*/
x2=j(n,1,1);
x=x2||x1t;/*x is the design matrix*/
y=j(n,1);/*y is the vector that includes the generated y data*/
TruePar={1 3};/*Row vector that includes the true values for beta_0 and beta_1*/
/*Calculate ptrue, the inverse logit (logistic CDF) of the linear predictor, and use it to generate the y data */
ptrue=exp(x*t(TruePar))/(1+exp(x*t(TruePar)));/* ptrue is based on Logistic distribution */
Call randgen(y,"Bernoulli",ptrue);/*Generates Bernoulli distribution for y with probability of success =ptrue*/
/*Initial values for beta_0 and beta_1, represented as a 1x2 row vector*/
initial_param={0 1};
/* Set max iterations to 500 and max function calls to 1000*/
opt = j(1, 10, .);
opt[1] = 0; /* Minimize */
opt[2] = 1; /* Change to 1 or 3 to print messages to log if needed */
opt[5] = 500; /* Increased Max Iterations */
opt[6] = 1000; /* Increased Max Function Calls */
/* Call Optimizer */
call nlpnra(rc, res, "SUMLOGLHF", initial_param,opt, , , , ,"driv","HessLogLik");/*Use Newton Raphson's algorithm and the nlpnra routine that minimizes negative log likelihood function*/
Est[k,]=res;/*Matrix that includes 1000 estimates of beta_0_hat and beta_1_hat*/
end;
colMean=t(mean(Est));
print colMean;
colStanddev=t(std(Est));
print colStanddev;
quit;
I got the following error 3 times in the log window when n=25:
ERROR: NEWRAP Optimization cannot be completed.
ERROR: NEWRAP needs more than 200 iterations or 500 function calls.
I defined opt as follows to get rid of this error:
opt = j(1, 10, .);
opt[1] = 0; /* Minimize */
opt[2] = 1; /* Change to 1 or 3 to print messages to log if needed */
opt[5] = 500; /* Increased Max Iterations */
opt[6] = 1000; /* Increased Max Function Calls */
However, I am still getting the same error. How can I fix this?
> I got the following error 3 times in the log window when n=25:
ERROR: NEWRAP Optimization cannot be completed.
ERROR: NEWRAP needs more than 200 iterations or 500 function calls.
How can I fix this?
This error message appears for 3 out of 1000 simulation trials. Errors like this appear in simulation studies when a random sample contains extreme values or is otherwise not representative. For those samples, the MLE step does not converge. This does not indicate a problem in your code, but rather a hazard when you are simulating small data sets (in your example, n=25). For details and a discussion, see Monitor convergence during simulation studies in SAS - The DO Loop
One way to address your concern is to record the return code (rc) from each optimization and base the Monte Carlo estimates only on the trials that converged:
rc_status = j(replicate,1);
do k=1 to replicate;
...
call nlpnra(rc, res, "SUMLOGLHF", initial_param, opt) grd="driv" hes="HessLogLik";
Est[k,]=res;
rc_status[k] = rc;
end;
/* find trials that converged */
good_idx = loc(rc_status > 0);
Est = Est[good_idx, ];
nSim = nrow(Est);
colMean=t(mean(Est));
colStanddev=t(std(Est));
print nSim colMean colStanddev;
Thank you for the response.
I have the following three questions.
First, I made a mistake. The statement that subsets the good indices should be
Est = Est[good_idx, ]; /* Note the comma! */
To answer your questions:
> 1. the mean of estimates of betas are far away from the true values. Is it acceptable?
I don't know what "acceptable" means. What is your goal? Your estimates will be far away from the true mean regardless of which optimization method you use.
If your goal is to show that the regression estimates for this model are highly variable and should not be trusted for samples of size n=25, then, yes, I think this simulation demonstrates that fact very well. If you plot your parameter estimates, there are many extreme values, which will, of course, strongly influence the Monte Carlo mean and standard deviation. A more robust Monte Carlo estimate would be the median of the columns. You should always graph the parameter estimates to visualize their variability.
colMedian=t(median(Est));
print colMedian;
title "Graph of parameter estimates from NLPNRA";
call scatter(Est[,1], Est[,2]);
The graph might cause you to ask whether the computations in the simulation are wrong. You can answer that question by performing the same computation for larger sample sizes. Many of your "problems" with nonconvergence and highly variable estimates will go away if you increase the sample size, to, say, n=100.
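For example, a sketch of that check: change only the sample-size assignment, re-initialize the estimate matrix, and rerun the same DO loop unchanged.

```sas
replicate = 1000;
n = 100;                      /* was n = 25 */
Est = j(replicate, 2, 0);     /* re-initialize the estimates before rerunning the loop */
```

With the larger sample size, you should see far fewer NEWRAP errors, and the column means of Est should be much closer to the true values {1 3}.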
2. Yes, the Monte Carlo estimate is based on the 997 estimates that converged
3. Yes, you can increase the value of the 'replicate' variable to obtain additional Monte Carlo samples. But the math doesn't care if you use 997 or 1000. The math says that the mean of the estimates (however many you have) is the Monte Carlo estimate.
In summary, your "problems" are caused because the sample size in the simulation is so small. You can "fix" your problems by using a larger sample size.
Thank you, Rick!