I am looking to translate coefficients obtained from a log-linear regression model into a "per unit incremental change" instead of the usual "percent change" interpretation so they can be used to make forecasts using inputs in a meaningful manner.
Concretely, I am seeking to estimate the average annual cost of adding one more mile of student transportation, using budgetary data from school districts statewide as the outcome variable and the total student mileage (i.e., total number of transported students times the average distance from home to school) as the predictor:
Total_Trans_Cost = alpha + beta_1(Total_Trans_Miles)
In this linear regression model the coefficient for total student mileage (beta_1) would reflect the cost of adding one more transportation mile, and thus if I modelled last year's school budget data using last year's total mileage, I could make a prediction of this year's costs using this year's estimated total mileage using the coefficient as a multiplier.
My issue is, given the large disparity between total transportation costs in the millions of dollars versus total mileage in the thousands, I have to transform the data so as to yield a robust linear regression with normally distributed studentized residuals. After vetting a series of models, I found that taking the natural log of both the outcome and predictor yielded very satisfactory results from a statistical point of view, but now I do not know how to make a "per unit" cost estimate from the resulting coefficients. As is well known, a log-linear model yields the equivalent of elasticities in the coefficients, which in my case can readily be interpreted as "a one-percent increase in the total transportation miles is associated with a (1.01^beta_1 - 1) * 100 percent change in total transportation costs." The problem is that I do not know how I can use this "one percent change" with new mileage estimates as they are new tallies and not changes to the prior year tally. My ultimate goal is to be able to just multiply by the "per mile cost" to get an estimate of how much it would cost to transport X number of students.
To spell it out more clearly, my log-linear model is ln(TOT_COST) = 6.928 + 0.886 ln(TOT_MI). The 0.886 means that a one percent change in total miles is associated with about a 0.885 percent change in total transportation cost. How do I make this into "X number of total miles results in Y change in total costs"?
Thanks in advance for any advice or suggestions!
Peter
Thanks for introducing me to the NLMEANS and NLEST macros. I must confess I do not make routine use of macros or SQL in my programming; I am very much a vanilla SAS user. I followed the links and attempted to implement one or the other NL macro, but I believe I've hit a roadblock. From what I read, these macros rely on output data from ESTIMATE, LSMEANS, etc. statements, but you can only invoke these statements for class variables, not continuous variables such as TOTAL_MILES. So I'm afraid I'm still at a loss as to how I can translate the coefficients in my log-linear regression as desired.
Yeah. You are right.
Here is what I got.
Maybe @StatDave could give you an answer.
ln(TOT_COST1) = 6.928 + 0.886 ln(TOT_MI1)
ln(TOT_COST0) = 6.928 + 0.886 ln(TOT_MI0)
--> ln(TOT_COST1) - ln(TOT_COST0) = 0.886*(ln(TOT_MI1) - ln(TOT_MI0))
--> ln(TOT_COST1/TOT_COST0) = 0.886*ln(TOT_MI1/TOT_MI0)
--> TOT_COST1/TOT_COST0 = (TOT_MI1/TOT_MI0)^0.886
--> So when TOT_MI1 = TOT_MI0 + 1, then TOT_COST1/TOT_COST0 = ((TOT_MI0+1)/TOT_MI0)^0.886 = (1 + 1/TOT_MI0)^0.886
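To make the last line of that algebra concrete, here is a minimal Python sketch (not SAS, and not part of the original reply) that evaluates (1 + 1/TOT_MI0)^0.886 at a few starting mileages. The mileage values are hypothetical, chosen only to show that the ratio depends on where you start; the function name is likewise just for illustration.

```python
BETA_1 = 0.886  # slope from the fitted model quoted in this thread


def cost_ratio_for_one_more_mile(tot_mi0: float, beta_1: float = BETA_1) -> float:
    """Return TOT_COST1 / TOT_COST0 when TOT_MI goes from tot_mi0 to tot_mi0 + 1."""
    return (1.0 + 1.0 / tot_mi0) ** beta_1


# Hypothetical starting mileages (assumed values, not from the poster's data).
for tot_mi0 in (1_000, 10_000, 100_000):
    ratio = cost_ratio_for_one_more_mile(tot_mi0)
    print(f"TOT_MI0 = {tot_mi0:>7}: cost ratio for one more mile = {ratio:.8f}")
```

The ratio shrinks toward 1 as the starting mileage grows, which is why no single "per mile" multiplier falls out of the log-log model.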
I follow your algebra, and it resonates with something similar that I played with as I struggled to pursue my "unit change" goal. To set this up, I will reference the explanation from the Cornell Statistical Consulting Unit (https://cscu.cornell.edu/wp-content/uploads/logv.pdf) which I have found to be the most concise yet accurate description of interpreting the coefficients in log transformed regression equations (pardon the screen snip, but I wanted to preserve the Greek letters and superscripts):
What I draw from this is to take unity plus the desired fraction raised to the coefficient of the logged variable to yield the percent increment. As shown above, this is usually calculated and reported as a one percent change in X results in a BETA_X percent change in Y (i.e. unity plus the desired percentage change, thus 1.00+0.01=1.01). So far, so simple. However, I then reasoned that if 1.01 represents a one percent change, then 1.10 must represent a ten percent change, 1.50 a fifty percent change, and 2.00 a one hundred percent change (i.e., doubling)?! You see how, to my eyes, this ties in with your final equation which has (1+1/TOT_MI0)^BETA. If we apply this reasoning to the results of my log-linear equation, we will raise two to BETA_X, thus 2^0.886 ~= 1.848, which translates to an increase of about 84.8% in the predicted total transportation costs attributable to a doubling of total transportation miles. Now, it is probably too big a stretch to say that total miles transported accounts for ~85% of total transportation costs, and even if this is true (a mighty big if) then it still does not get me any closer to my goal of estimating the marginal costs associated with a given number of total transportation miles for a particular district.
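For anyone who wants to check the "unity plus the fraction, raised to the coefficient" arithmetic, a small Python sketch follows. It uses the rounded slope 0.886 from this thread, so results may differ slightly from figures computed with the unrounded coefficient; the function name is only illustrative.

```python
BETA_1 = 0.886  # rounded slope from the model quoted in this thread


def pct_change_in_cost(mileage_multiplier: float, beta_1: float = BETA_1) -> float:
    """Percent change in predicted cost when total miles are multiplied by mileage_multiplier."""
    return (mileage_multiplier ** beta_1 - 1.0) * 100.0


# 1%, 10%, 50% and 100% increases in total miles.
for multiplier in (1.01, 1.10, 1.50, 2.00):
    print(f"miles x {multiplier:.2f} -> cost changes by {pct_change_in_cost(multiplier):.2f}%")
```

Note that these are percent changes in predicted cost relative to the starting point, not shares of total cost explained by mileage.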
Again, I feel trapped in "percent world" when I want to be in "unit cost world," with no straightforward way to bridge the two realms. Regretfully, I may have to give up my nice, homoscedastic log-linear model for one where I can interpret the coefficients as marginal unit change as opposed to percent change. I will explore non-linear approaches, such as PROC NLIN, in the hopes this will yield unbiased estimates without logarithmic transformations.
Thanks for your efforts to assist me, I really appreciate it!
So it is a non-linear transformation. You cannot say that a 1-unit change in TOT_MI changes TOT_COST by 0.886. Instead, the change depends on the starting value. For example (made-up numbers): when TOT_MI=1, a 1-unit change in TOT_MI might change TOT_COST by 2, while when TOT_MI=2, a 1-unit change in TOT_MI might change TOT_COST by 10.
Or calling @Rick_SAS
> my log-linear model is ln(TOT_COST) = 6.928 + 0.886 ln(TOT_MI). The 0.886 means that a one percent change in total miles is associated with about a 0.885 percent change in total transportation cost. How do I make this into "X number of total miles results in Y change in total costs"?
Yes, you can do that, but the log terms require that you report "X number of total miles results in Y change in total costs when X=X0." The nonlinear transformation of X and Y means that there isn't one universal number to report. Instead, the number depends on the value of X at which you want to consider the change. This is in contrast to the percentage method, for which the beta parameter estimate gives the percentage change, as shown in the Cornell notes.
The formula is just calculus, but you need to apply the chain rule because of the log terms.
Start with the regression equation for the predicted response:
log(Y) = beta_0 + beta_1 * log(X)
We seek dY/dX, so take the derivative wrt X of both sides of the equation and apply the chain rule:
1/Y * dY/dX = beta_1 / X
Solve this equation for dY/dX:
dY/dX = beta_1/X * Y, where Y = exp( beta_0 + beta_1*log(X) ) by solving the regression eqn for Y.
The discrete version of this equation enables you to estimate the change in Y (call it dY) when X changes from X0 to X0 + dX:
dY = dX * beta_1/X0 * exp( beta_0 + beta_1*log(X0) )
So, if last year the district busses ran X0 = 1 million miles and you want to know the cost increase for going 1.01 million miles this year, you would set
X0 = 1 million miles,
Y0 = exp( beta_0 + beta_1*log(X0) )
dX = 0.01 million miles
and compute
dY = estimated change in cost = dX * beta_1/X0 * Y0
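As a rough illustration of the recipe above, here is a Python sketch (not SAS) that plugs in the coefficients quoted earlier in the thread, 6.928 and 0.886. It assumes TOT_MI enters the model in plain miles, which the thread does not confirm, and the function names are hypothetical. It also compares the derivative-based approximation with the exact difference between the two predictions.

```python
import math

BETA_0 = 6.928  # intercept quoted in the thread
BETA_1 = 0.886  # slope quoted in the thread


def predicted_cost(x: float) -> float:
    """Back-transform the fitted line: Y = exp(beta_0 + beta_1 * log(X))."""
    return math.exp(BETA_0 + BETA_1 * math.log(x))


def approx_cost_change(x0: float, dx: float) -> float:
    """First-order estimate: dY = dX * beta_1 / X0 * Y0."""
    return dx * BETA_1 / x0 * predicted_cost(x0)


x0 = 1_000_000.0  # last year's miles (assumed to be in the model's units)
dx = 10_000.0     # 0.01 million additional miles

approx = approx_cost_change(x0, dx)
exact = predicted_cost(x0 + dx) - predicted_cost(x0)
print(f"approximate dY = {approx:,.2f}")
print(f"exact dY       = {exact:,.2f}")
```

Because the slope is below 1, the fitted curve is concave, so the first-order estimate slightly overstates the exact difference; for a 1% change in miles the gap is small.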
Thank you, @Rick_SAS, for your detailed response, it is appreciated!
I fear I misled you with my statement 'How do I make this into "X number of total miles results in Y change in total costs."' I am not looking to calculate the change in the outcome value based on the year-to-year change in the predictor, but rather to make new estimates based on new values of the predictor using the prior year's model. From what I can see in your explanation, you use the same logic (i.e., using the difference Xo to Xi to model the incremental cost) that I found wanting when trying to interpret the coefficients for the logged variables. That is why I specified in my original post:
"Concretely, I am seeking to estimate the average annual cost of adding one more mile of student transportation, using budgetary data from school districts statewide as the outcome variable and the total student mileage (i.e., total number of transported students times the average distance from home to school) as the predictor:
Total_Trans_Cost = alpha + beta_1(Total_Trans_Miles)
In this linear regression model the coefficient for total student mileage (beta_1) would reflect the cost of adding one more transportation mile, and thus if I modelled last year's school budget data using last year's total mileage, I could make a prediction of this year's costs using this year's estimated total mileage using the coefficient as a multiplier."
I then went into a lengthy discourse as to why my preferred log-linear model did not suit my needs given the usual interpretation of the coefficients as percentage change, ending with:
"The problem is that I do not know how I can use this 'one percent change' with new mileage estimates as they are new tallies and not changes to the prior year tally. My ultimate goal is to be able to just multiply by the 'per mile cost' to get an estimate of how much it would cost to transport X number of students."
To reiterate, I am interested in the marginal cost of adding a transported student mile simply as a way to calculate a new estimate using the coefficient for student miles based on last year's budgeted amount but plugging in current year's student miles. Implicit in my question is the concept that one can calculate an "average" per student mile cost statewide and multiply that prior year average by the new estimated total transportation miles for each school district to get a new predicted cost for the current year. From what you outlined (and what @Ksharp also explained) the logged variables do not allow for a "straight" linear translation, but one that is conditional on where you are on the logarithmic curve. I was hoping that I could just take the natural log of the new mileage amount, multiply by the mileage coefficient, and then exponentiate the product, but I am beginning to see that it is not that easy.
I hasten to add that I am focused on per student transportation cost as a function of mileage (i.e., student count by average distance), and thus am most interested in using the coefficient for b1 to make the new cost estimate, excluding the intercept (i.e., calculating the marginal cost as opposed to the total cost), as including the intercept would incorporate the effect of omitted factors that contribute to the overall cost (e.g., transportation staff compensation and benefits, etc.) which is not my focus at this time.
I hope I am clearer with my request, but if not, please let me know.
Thanks again!
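For readers following along, a minimal Python sketch of what "plugging in new mileage" looks like with the model quoted in this thread. It is a naive back-transform of ln(TOT_COST) = 6.928 + 0.886 ln(TOT_MI) with no retransformation correction, the mileage values are hypothetical, and the function name is only illustrative. In a log-log model the intercept becomes the multiplicative factor exp(beta_0), so leaving it out rescales every prediction rather than removing a fixed cost.

```python
import math

BETA_0 = 6.928  # intercept quoted in the thread
BETA_1 = 0.886  # slope quoted in the thread


def predict_total_cost(tot_mi: float) -> float:
    """Naive back-transform: TOT_COST = exp(beta_0) * TOT_MI ** beta_1."""
    return math.exp(BETA_0) * tot_mi ** BETA_1


# Hypothetical district mileage totals (assumed values, not from the poster's data).
for tot_mi in (5_000, 50_000, 500_000):
    cost = predict_total_cost(tot_mi)
    print(f"TOT_MI = {tot_mi:>7,}: predicted cost = {cost:>14,.0f}, "
          f"average cost per mile = {cost / tot_mi:,.2f}")
```

The implied average cost per mile falls as mileage rises, so a single statewide "per mile" multiplier would not reproduce these predictions.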