I am looking to translate coefficients obtained from a log-linear regression model into a "per unit incremental change" instead of the usual "percent change" interpretation, so that they can be used meaningfully to make forecasts from new inputs.
Concretely, I am seeking to estimate the average annual cost of adding one more mile of student transportation, using statewide school district budget data (total transportation cost) as the outcome variable and total student mileage (i.e., the total number of transported students times the average distance from home to school) as the predictor:
Total_Trans_Cost = alpha + beta_1 * Total_Trans_Miles
In this linear regression model the coefficient for total student mileage (beta_1) would reflect the cost of adding one more transportation mile, and thus if I modelled last year's school budget data using last year's total mileage, I could make a prediction of this year's costs using this year's estimated total mileage using the coefficient as a multiplier.
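A minimal Python sketch of that linear workflow, assuming a plain closed-form OLS fit (the function names `fit_simple_ols` and `predict_cost` are hypothetical; in SAS the fit itself would normally come from PROC REG). It fits alpha and beta_1 on last year's district data, then plugs this year's mileage into the fitted line.

```python
def fit_simple_ols(miles, costs):
    """Closed-form OLS for cost = alpha + beta_1 * miles."""
    n = len(miles)
    mean_x = sum(miles) / n
    mean_y = sum(costs) / n
    # Sums of squares and cross-products around the means
    sxx = sum((x - mean_x) ** 2 for x in miles)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(miles, costs))
    beta_1 = sxy / sxx
    alpha = mean_y - beta_1 * mean_x
    return alpha, beta_1


def predict_cost(alpha, beta_1, miles):
    """Predicted total cost; beta_1 acts as a constant cost per mile."""
    return alpha + beta_1 * miles
```

In this linear form beta_1 is a single, constant per-mile cost, which is exactly the property that goes away once both variables are logged.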
My issue is that, given the large disparity between total transportation costs (in the millions of dollars) and total mileage (in the thousands), I have to transform the data to obtain a robust linear regression with normally distributed studentized residuals. After vetting a series of models, I found that taking the natural log of both the outcome and the predictor yielded very satisfactory results from a statistical point of view, but now I do not know how to make a "per unit" cost estimate from the resulting coefficients. As is well known, when both variables are logged the slope coefficient is an elasticity, which in my case can be interpreted as "a one-percent increase in total transportation miles is associated with a (1.01^beta_1 - 1) * 100 percent change in total transportation costs," or approximately beta_1 percent. The problem is that I do not know how to use this "one percent change" with new mileage estimates, because they are new tallies rather than changes to the prior year's tally. My ultimate goal is to simply multiply by a "per mile cost" to estimate how much it would cost to transport X number of students.
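For reference, a small Python sketch of the exact percent change implied by a model of the form ln(cost) = a + beta_1 * ln(miles) (the function name `pct_change_in_cost` is hypothetical). Scaling miles by (1 + p/100) scales cost by (1 + p/100)^beta_1, so for small p the result is close to beta_1 * p.

```python
def pct_change_in_cost(beta_1, pct_change_in_miles=1.0):
    """Exact % change in cost implied by ln(cost) = a + beta_1 * ln(miles)."""
    ratio = 1.0 + pct_change_in_miles / 100.0
    # Cost scales by ratio ** beta_1 when miles scale by ratio
    return (ratio ** beta_1 - 1.0) * 100.0
```

With beta_1 = 0.886 and a 1 percent change in miles this gives roughly 0.885 percent, very close to beta_1 itself.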
To spell it out more clearly, my log-linear model is ln(TOT_COST) = 6.928 + 0.886 ln(TOT_MI). The coefficient 0.886 means that a one percent increase in total miles is associated with about a 0.885 percent increase in total transportation cost. How do I turn this into "X total miles results in a Y change in total costs"?
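One way to get a dollars-per-mile figure is to back-transform the fitted model to TOT_COST = exp(6.928) * TOT_MI^0.886 and differentiate: d(TOT_COST)/d(TOT_MI) = 0.886 * TOT_COST / TOT_MI. A minimal Python sketch under that assumption (the helper names are hypothetical). Note that the naive back-transform exp(predicted log cost) targets a median-type prediction rather than the mean; Duan's smearing estimator, which multiplies by the mean of exp(residuals), is one common correction and is not applied here.

```python
import math

ALPHA = 6.928   # intercept on the log scale, from the fitted model
BETA_1 = 0.886  # elasticity of total cost with respect to total miles


def predicted_cost(miles, alpha=ALPHA, beta_1=BETA_1):
    """Back-transformed prediction: exp(alpha) * miles ** beta_1."""
    return math.exp(alpha) * miles ** beta_1


def marginal_cost_per_mile(miles, alpha=ALPHA, beta_1=BETA_1):
    """Derivative d(cost)/d(miles) = beta_1 * cost / miles; depends on miles."""
    return beta_1 * predicted_cost(miles, alpha, beta_1) / miles


def incremental_cost(miles_old, miles_new, alpha=ALPHA, beta_1=BETA_1):
    """Exact change in predicted cost when mileage moves from miles_old to miles_new."""
    return (predicted_cost(miles_new, alpha, beta_1)
            - predicted_cost(miles_old, alpha, beta_1))
```

Because 0.886 < 1, the per-mile cost shrinks as total mileage grows, so there is no single multiplier; for new tallies, `incremental_cost(last_year_miles, this_year_miles)` gives the predicted change in dollars directly.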
Thanks in advance for any advice or suggestions!
Peter
Thanks for introducing me to the NLMEANS and NLEST macros. I must confess I do not make routine use of macros or SQL in my programming; I am very much a vanilla SAS user. I followed the links and attempted to implement one or the other NL macro, but I believe I've hit a roadblock. From what I read, these macros rely on output data from ESTIMATE, LSMEANS, etc. statements, but you can only invoke these statements for class variables, not continuous variables such as TOTAL_MILES. So I'm afraid I'm still at a loss as to how I can translate the coefficients in my log-linear regression as desired.
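If a standard error for the per-mile cost is also wanted, one option that does not depend on class variables is the delta method applied to g = beta_1 * exp(alpha) * TOT_MI^(beta_1 - 1). This Python sketch assumes you already have the 2x2 covariance matrix of (alpha, beta_1), for example from the COVB option on the MODEL statement in PROC REG; the function name and any covariance values you pass in are hypothetical.

```python
import math


def marginal_cost_and_se(miles, alpha, beta_1, cov):
    """Delta-method SE for g = beta_1 * exp(alpha) * miles ** (beta_1 - 1).

    cov is the 2x2 covariance matrix of (alpha, beta_1) as nested lists.
    """
    base = math.exp(alpha) * miles ** (beta_1 - 1.0)
    g = beta_1 * base
    # Partial derivatives of g with respect to alpha and beta_1
    d_alpha = g
    d_beta = base * (1.0 + beta_1 * math.log(miles))
    grad = (d_alpha, d_beta)
    # Quadratic form grad' * cov * grad
    var = sum(grad[i] * cov[i][j] * grad[j] for i in range(2) for j in range(2))
    return g, math.sqrt(var)
```

The returned pair is the estimated per-mile cost at the given mileage and its approximate standard error.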
Yeah. You are right.
Here is what I got.
Maybe @StatDave could give you an answer.
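$$
\begin{aligned}
\ln(\mathrm{TOT\_COST}_1) &= 6.928 + 0.886\,\ln(\mathrm{TOT\_MI}_1)\\
\ln(\mathrm{TOT\_COST}_0) &= 6.928 + 0.886\,\ln(\mathrm{TOT\_MI}_0)\\
\Rightarrow\quad \ln(\mathrm{TOT\_COST}_1) - \ln(\mathrm{TOT\_COST}_0) &= 0.886\,\bigl[\ln(\mathrm{TOT\_MI}_1) - \ln(\mathrm{TOT\_MI}_0)\bigr]\\
\Rightarrow\quad \ln\!\left(\frac{\mathrm{TOT\_COST}_1}{\mathrm{TOT\_COST}_0}\right) &= 0.886\,\ln\!\left(\frac{\mathrm{TOT\_MI}_1}{\mathrm{TOT\_MI}_0}\right)\\
\Rightarrow\quad \frac{\mathrm{TOT\_COST}_1}{\mathrm{TOT\_COST}_0} &= \left(\frac{\mathrm{TOT\_MI}_1}{\mathrm{TOT\_MI}_0}\right)^{0.886}
\end{aligned}
$$

So when TOT_MI1 = TOT_MI0 + 1:

$$
\frac{\mathrm{TOT\_COST}_1}{\mathrm{TOT\_COST}_0} = \left(\frac{\mathrm{TOT\_MI}_0 + 1}{\mathrm{TOT\_MI}_0}\right)^{0.886} = \left(1 + \frac{1}{\mathrm{TOT\_MI}_0}\right)^{0.886}
$$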
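A quick numerical check of this derivation, as a Python sketch using the thread's coefficients (the helper names are hypothetical): the cost ratio for one extra mile matches (1 + 1/TOT_MI0)^0.886, and multiplying TOT_COST0 by (ratio - 1) turns it into the dollar cost of that extra mile, which is very close to 0.886 * TOT_COST0 / TOT_MI0.

```python
import math

ALPHA, BETA_1 = 6.928, 0.886  # coefficients from the thread's log-log model


def cost(miles):
    """Back-transformed total cost at a given total mileage."""
    return math.exp(ALPHA) * miles ** BETA_1


def one_mile_ratio(m0):
    """TOT_COST1 / TOT_COST0 when TOT_MI1 = TOT_MI0 + 1."""
    return (1.0 + 1.0 / m0) ** BETA_1


def one_mile_increment(m0):
    """Dollar change for one extra mile: TOT_COST0 * ((1 + 1/TOT_MI0)**0.886 - 1)."""
    return cost(m0) * (one_mile_ratio(m0) - 1.0)
```

Because 0.886 < 1, this one-mile cost falls as TOT_MI0 grows, so it has to be evaluated at the relevant mileage rather than used as one fixed multiplier.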