Hi @Rick_SAS ! Thank you so much for responding. I ran your example, but I guess it would help me to see the equations, because when I tried to replicate some of the log-likelihood values by hand, I got very different results. First of all, is this the normal log likelihood formula? -(N * (0.5) * ln(2 * pi)) -(N * ln(sigma)) -((0.5 * sum((x[j] - mu)**2) / (sigma**2)) where x[j] are the data values, N is the number of values, and mu and sigma are the parameters to determine. Then, if mu = (sum(x[j]) / N) and sigma_ml = sqrt(sum((x[j] - mu)**2) / N), when you substitute back into the log likelihood you get: -(N * (0.5) * ln(2 * pi)) -(N * ln(sigma_ml)) -(0.5 * N) In this case, for each lambda, the x[j] values are the lambda transformed values of the original set of x values. I tried the logpdf function in SAS, and got numbers corresponding to the formula I gave above. But that isn't what came out of PROC TRANSREG for your example. For your original set of Y values, I get ll = -17.4666119. But PROC TRANSREG says ll = -3.80403 for lambda = +1. For your Ym2, I got ll = 51.408472367 and for your Y2, I got ll = -40.72112937. As you remark, the BC formula without the geometric mean term puts the data on different scales for different lambdas. But if I use the geometric mean term in the BC formula for your example data, for lambda = -2 I get ll = -17.38343536 and for lambda = +2 I get ll = -17.79049346, which are in the same ballpark. For lambda = -2 PROC TRANSREG says ll = -3.72085 and for lambda = +2, PROC TRANSREG says ll = -4.12791. My code is below. When you look at the log likelihood formula with the maximum likelihood parameter value substitutions, it seems clear that the lambda that minimizes the value of sigma_ml will maximize the log likelihood. So, what am I getting wrong? data handy ;
set have end = last ;
keep n_obs n_fact y_gmn lln
y_mean y_std y_stdn y_ll y_lln
ym_mean ym_std ym_stdn ym_ll ym_lln
yp_mean yp_std yp_stdn yp_ll yp_lln
yl_mean yl_std yl_stdn yl_ll yl_lln
yml_mean yml_std yml_stdn yml_ll yml_lln
ypl_mean ypl_std ypl_stdn ypl_ll ypl_lln
;
array ty{1:10} _temporary_ ;
array tym{1:10} _temporary_ ;
array typ{1:10} _temporary_ ;
array tyl{1:10} _temporary_ ;
array tyml{1:10} _temporary_ ;
array typl{1:10} _temporary_ ;
ty{_N_} = Y ;
tym{_N_} = Ym2 ;
typ{_N_} = Y2 ;
if last then do ;
n_obs = _N_ ;
n_fact = sqrt((n_obs-1)/n_obs) ;
y_gmn = geomean(of ty{*}) ;
lln = n_obs * log(2 * constant("pi")) / 2 ;
do ix = 1 to n_obs ;
tyl{ix} = ty{ix} - 1 ;
tyml{ix} = ((ty{ix})**(-2) - 1) / (-2 * (y_gmn)**(-3)) ;
typl{ix} = ((ty{ix})**(2) - 1) / (2 * (y_gmn)**(+1)) ;
end ;
y_mean = mean(of ty{*}) ;
y_std = std(of ty{*}) ;
y_stdn = y_std * n_fact ;
ym_mean = mean(of tym{*}) ;
ym_std = std(of tym{*}) ;
ym_stdn = ym_std * n_fact ;
yp_mean = mean(of typ{*}) ;
yp_std = std(of typ{*}) ;
yp_stdn = yp_std * n_fact ;
yl_mean = mean(of tyl{*}) ;
yl_std = std(of tyl{*}) ;
yl_stdn = yl_std * n_fact ;
yml_mean = mean(of tyml{*}) ;
yml_std = std(of tyml{*}) ;
yml_stdn = yml_std * n_fact ;
ypl_mean = mean(of typl{*}) ;
ypl_std = std(of typl{*}) ;
ypl_stdn = ypl_std * n_fact ;
y_ll = 0 ;
y_lln = 0 ;
ym_ll = 0 ;
ym_lln = 0 ;
yp_ll = 0 ;
yp_lln = 0 ;
yl_ll = 0 ;
yl_lln = 0 ;
yml_ll = 0 ;
yml_lln = 0 ;
ypl_ll = 0 ;
ypl_lln = 0 ;
do ix = 1 to n_obs ;
y_ll + logpdf('NORMAL',ty{ix},y_mean,y_std) ;
y_lln + logpdf('NORMAL',ty{ix},y_mean,y_stdn) ;
ym_ll + logpdf('NORMAL',tym{ix},ym_mean,ym_std) ;
ym_lln + logpdf('NORMAL',tym{ix},ym_mean,ym_stdn) ;
yp_ll + logpdf('NORMAL',typ{ix},yp_mean,yp_std) ;
yp_lln + logpdf('NORMAL',typ{ix},yp_mean,yp_stdn) ;
yl_ll + logpdf('NORMAL',tyl{ix},yl_mean,yl_std) ;
yl_lln + logpdf('NORMAL',tyl{ix},yl_mean,yl_stdn) ;
yml_ll + logpdf('NORMAL',tyml{ix},yml_mean,yml_std) ;
yml_lln + logpdf('NORMAL',tyml{ix},yml_mean,yml_stdn) ;
ypl_ll + logpdf('NORMAL',typl{ix},ypl_mean,ypl_std) ;
ypl_lln + logpdf('NORMAL',typl{ix},ypl_mean,ypl_stdn) ;
end ;
yl_ll_h = -lln -(0.5*(n_obs-1)) - (n_obs*log(yl_std)) ;
yl_lln_h = -lln -(0.5*n_obs) - (n_obs*log(yl_stdn)) ;
yml_ll_h = -lln -(0.5*(n_obs-1)) - (n_obs*log(yml_std)) ;
yml_lln_h = -lln -(0.5*n_obs) - (n_obs*log(yml_stdn)) ;
ypl_ll_h = -lln -(0.5*(n_obs-1)) - (n_obs*log(ypl_std)) ;
ypl_lln_h = -lln -(0.5*n_obs) - (n_obs*log(ypl_stdn)) ;
put n_obs= n_fact= y_gmn= lln= ;
put y_mean= y_std= y_stdn= y_ll= y_lln= ;
put ym_mean= ym_std= ym_stdn= ym_ll= ym_lln= ;
put yp_mean= yp_std= yp_stdn= yp_ll= yp_lln= ;
put yl_mean= yl_std= yl_stdn= yl_ll= yl_lln= ;
put yml_mean= yml_std= yml_stdn= yml_ll= yml_lln= ;
put ypl_mean= ypl_std= ypl_stdn= ypl_ll= ypl_lln= ;
put yl_ll_h= yl_lln_h= ;
put yml_ll_h= yml_lln_h= ;
put ypl_ll_h= ypl_lln_h= ;
output ;
end ;
run ;
title2 "proc print data = &syslast." ;
proc print data = &syslast. ;
run ;
title2 ;
... View more