BookmarkSubscribeRSS Feed
kpcrotty
Calcite | Level 5

I am trying to estimate a mixture model with 2 components, one of which is normal and the other is lognormal, using PROC NLMIXED.  When I simulate data from the model and try to estimate the model, NLMIXED gives an "Execution error for observation 1" error message and stops processing.  I've tried troubleshooting by looking for numerical issues using a data step, but I don't get any errors in the corresponding data step.  Any thoughts on how to approach troubleshooting this would be appreciated.  Thanks.

 

 

*Simulate data;
%let pi0 	= 0.8;			*Mixing probabilities;
%let pi1 	= 1-&pi0;
%let mu0 	= 0;			*Mean of normal component 0;
%let sigma0     = 0.005;		*Standard deviation of normal component 0;
%let m1 	= 0.02;			*The expected value of lognormal component 1;
%let s1 	= 0.0025;		*The standard deviation of lognormal component 1;

%let mu1 	= log((&m1**2)/sqrt(&s1**2 + &m1**2));
%let sigma1     = sqrt(log(1 + (&s1**2)/(&m1**2)));

data simdata;
	call streaminit(123);
	do i = 1 to 1000;
		component = rand('bernoulli',&pi1);
		if component=0 		then x = rand('normal',&mu0,&sigma0);
		else if component=1 then x = exp(rand('normal',&mu1,&sigma1));	
		output;
	end;
	run;

*Estimate the model;
proc nlmixed data=simdata	fd=central technique=congra update=pb hessian;
	parms a1 = -3.91, pi0=0.8, sigma0=0.005, sigma1=0.1245;
	bounds 0 <= pi0 <= 1, 0 < sigma0, 0 < sigma1; 
	temp1= pi0*pdf('normal',x, 0, sqrt(sigma0**2));
	temp2=(1-pi0)*pdf('lognormal',x, a1, sqrt(sigma1**2));
	temp = temp1 + temp2;
	if temp = 0 then temp = 1E-300;
	loglik = log(temp);

	pi1 = 1-pi0;
	ln_mean = exp(a1+0.5*sigma1**2);
	ln_sd   = sqrt((exp(sigma1**2) - 1)*exp(2*a1 + sigma1**2));

	model x~general(loglik);

	estimate 'alpha1' a1;
	estimate 'pi0' pi0;
	estimate 'sigma0' sigma0;
	estimate 'sigma1' sigma1;
	estimate 'pi1' pi1;
	estimate 'ln_mean' ln_mean;
	estimate 'ln_sd' ln_sd;
run;

*Check log-likelihood for potential errors;
data llcheck;
	set simdata;
	a1 = -3.91; pi0=0.8; sigma0=0.005; sigma1=0.1245;
	temp1= pi0*pdf('normal',x, 0, sqrt(sigma0**2));
	temp2=(1-pi0)*pdf('lognormal',x, a1, sqrt(sigma1**2));
	temp = temp1 + temp2;
	if temp = 0 then temp = 1E-300;
	loglik = log(temp);

	pi1 = 1-pi0;
	ln_mean = exp(a1+0.5*sigma1**2);
	ln_sd   = sqrt((exp(sigma1**2) - 1)*exp(2*a1 + sigma1**2));	
	run;

 

2 REPLIES 2
pau13rown
Lapis Lazuli | Level 10

from experience id say it can be very sensitive to the initial estimates in the parms statement. But check some basic stats on temp variable eg max, min

kpcrotty
Calcite | Level 5

Thanks, Paul.  I've had the same experience, but what was strange is that it wouldn't evaluate the functions within NLMIXED but would within a data step, even for the same values passed in the parms statement. 

 

The problem had something to do with the negative values that come from the normal component of the mixture.  The estimation will run if I do not call the pdf('lognormal'...) function for negative data points and just replace this with zero (actually 1E-300) to avoid problems with the log function.  I'm still not sure why the original code doesn't work, but this work-around should be fine.

 

*Estimate the model;
proc nlmixed data=simdata fd=central technique=congra update=pb hessian;
	parms a1 = -3.91, pi0=0.8, sigma0=0.005, sigma1=0.1245;
	bounds 0 <= pi0 <= 1, 0 < sigma0, 0 < sigma1; 
	temp1= pi0*pdf('normal',x, 0, sqrt(sigma0**2));
	temp2 = 0;
	if x > 0 then temp2=(1-pi0)*pdf('lognormal',x, a1, sqrt(sigma1**2));
	temp = temp1 + temp2;
	if temp = 0 then temp = 1E-300;
	loglik = log(temp);

	pi1 = 1-pi0;
	ln_mean = exp(a1+0.5*sigma1**2);
	ln_sd   = sqrt((exp(sigma1**2) - 1)*exp(2*a1 + sigma1**2));

	model x~general(loglik);

	estimate 'alpha1' a1;
	estimate 'pi0' pi0;
	estimate 'sigma0' sigma0;
	estimate 'sigma1' sigma1;
	estimate 'pi1' pi1;
	estimate 'ln_mean' ln_mean;
	estimate 'ln_sd' ln_sd;
run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 1576 views
  • 0 likes
  • 2 in conversation