I am trying to estimate a mixture model with 2 components, one of which is normal and the other is lognormal, using PROC NLMIXED. When I simulate data from the model and try to estimate the model, NLMIXED gives an "Execution error for observation 1" error message and stops processing. I've tried troubleshooting by looking for numerical issues using a data step, but I don't get any errors in the corresponding data step. Any thoughts on how to approach troubleshooting this would be appreciated. Thanks.
*Simulate data;
%let pi0 = 0.8; *Mixing probabilities;
%let pi1 = 1-&pi0;
%let mu0 = 0; *Mean of normal component 0;
%let sigma0 = 0.005; *Standard deviation of normal component 0;
%let m1 = 0.02; *The expected value of lognormal component 1;
%let s1 = 0.0025; *The standard deviation of lognormal component 1;
%let mu1 = log((&m1**2)/sqrt(&s1**2 + &m1**2));
%let sigma1 = sqrt(log(1 + (&s1**2)/(&m1**2)));
data simdata;
call streaminit(123);
do i = 1 to 1000;
component = rand('bernoulli',&pi1);
if component=0 then x = rand('normal',&mu0,&sigma0);
else if component=1 then x = exp(rand('normal',&mu1,&sigma1));
output;
end;
run;
*Estimate the model;
proc nlmixed data=simdata fd=central technique=congra update=pb hessian;
parms a1 = -3.91, pi0=0.8, sigma0=0.005, sigma1=0.1245;
bounds 0 <= pi0 <= 1, 0 < sigma0, 0 < sigma1;
temp1= pi0*pdf('normal',x, 0, sqrt(sigma0**2));
temp2=(1-pi0)*pdf('lognormal',x, a1, sqrt(sigma1**2));
temp = temp1 + temp2;
if temp = 0 then temp = 1E-300;
loglik = log(temp);
pi1 = 1-pi0;
ln_mean = exp(a1+0.5*sigma1**2);
ln_sd = sqrt((exp(sigma1**2) - 1)*exp(2*a1 + sigma1**2));
model x~general(loglik);
estimate 'alpha1' a1;
estimate 'pi0' pi0;
estimate 'sigma0' sigma0;
estimate 'sigma1' sigma1;
estimate 'pi1' pi1;
estimate 'ln_mean' ln_mean;
estimate 'ln_sd' ln_sd;
run;
*Check log-likelihood for potential errors;
data llcheck;
set simdata;
a1 = -3.91; pi0=0.8; sigma0=0.005; sigma1=0.1245;
temp1= pi0*pdf('normal',x, 0, sqrt(sigma0**2));
temp2=(1-pi0)*pdf('lognormal',x, a1, sqrt(sigma1**2));
temp = temp1 + temp2;
if temp = 0 then temp = 1E-300;
loglik = log(temp);
pi1 = 1-pi0;
ln_mean = exp(a1+0.5*sigma1**2);
ln_sd = sqrt((exp(sigma1**2) - 1)*exp(2*a1 + sigma1**2));
run;
from experience id say it can be very sensitive to the initial estimates in the parms statement. But check some basic stats on temp variable eg max, min
Thanks, Paul. I've had the same experience, but what was strange is that it wouldn't evaluate the functions within NLMIXED but would within a data step, even for the same values passed in the parms statement.
The problem had something to do with the negative values that come from the normal component of the mixture. The estimation will run if I do not call the pdf('lognormal'...) function for negative data points and just replace this with zero (actually 1E-300) to avoid problems with the log function. I'm still not sure why the original code doesn't work, but this work-around should be fine.
*Estimate the model;
proc nlmixed data=simdata fd=central technique=congra update=pb hessian;
parms a1 = -3.91, pi0=0.8, sigma0=0.005, sigma1=0.1245;
bounds 0 <= pi0 <= 1, 0 < sigma0, 0 < sigma1;
temp1= pi0*pdf('normal',x, 0, sqrt(sigma0**2));
temp2 = 0;
if x > 0 then temp2=(1-pi0)*pdf('lognormal',x, a1, sqrt(sigma1**2));
temp = temp1 + temp2;
if temp = 0 then temp = 1E-300;
loglik = log(temp);
pi1 = 1-pi0;
ln_mean = exp(a1+0.5*sigma1**2);
ln_sd = sqrt((exp(sigma1**2) - 1)*exp(2*a1 + sigma1**2));
model x~general(loglik);
estimate 'alpha1' a1;
estimate 'pi0' pi0;
estimate 'sigma0' sigma0;
estimate 'sigma1' sigma1;
estimate 'pi1' pi1;
estimate 'ln_mean' ln_mean;
estimate 'ln_sd' ln_sd;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.