OneEyedKing
Fluorite | Level 6

I am looking to translate coefficients obtained from a log-linear regression model into a "per unit incremental change" instead of the usual "percent change" interpretation so they can be used to make forecasts using inputs in a meaningful manner.

 

Concretely, I am seeking to estimate the average annual cost of adding one more mile of student transportation, using budgetary data from school districts statewide as the outcome variable and the total student mileage (i.e., the total number of transported students times the average distance from home to school) as the predictor:

 

Total_Trans_Cost = alpha + beta_1(Total_Trans_Miles)

 

In this linear regression model, the coefficient for total student mileage (beta_1) would reflect the cost of adding one more transportation mile. Thus, if I modelled last year's school budget data on last year's total mileage, I could predict this year's costs by applying the coefficient as a multiplier to this year's estimated total mileage.

 

My issue is that, given the large disparity between total transportation costs in the millions of dollars versus total mileage in the thousands, I have to transform the data so as to yield a robust linear regression with normally distributed studentized residuals. After vetting a series of models, I found that taking the natural log of both the outcome and the predictor yielded very satisfactory results from a statistical point of view, but now I do not know how to make a "per unit" cost estimate from the resulting coefficients. As is well known, a log-linear model yields the equivalent of elasticities in the coefficients, which in my case can readily be interpreted as "a one-percent increase in total transportation miles is associated with a (1.01^beta_1 - 1) * 100 percent change in total transportation costs," which is approximately beta_1 percent. The problem is that I do not know how to use this "one percent change" with new mileage estimates, since they are new tallies and not changes to the prior-year tally. My ultimate goal is to be able to just multiply by the "per mile cost" to get an estimate of how much it would cost to transport X number of students.

 

To spell it out more clearly, my log-linear model is ln(TOT_COST) = 6.928 + 0.886 ln(TOT_MI). The 0.886 means that a one percent change in total miles is associated with about a 0.885 percent change in total transportation cost. How do I turn this into "X number of total miles results in Y change in total costs"?
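One way to use new mileage tallies directly, rather than changes from the prior year, might be to back-transform the fitted equation into a cost level: TOT_COST = exp(6.928) * TOT_MI^0.886. The sketch below is only an illustration under that assumption. The function name `predict_cost` and the mileage value are hypothetical and not from the original analysis, and the sketch ignores any retransformation adjustment to the predicted mean.

```python
import math

# Coefficients quoted in the post: ln(TOT_COST) = 6.928 + 0.886 * ln(TOT_MI)
ALPHA = 6.928
BETA = 0.886


def predict_cost(total_miles: float) -> float:
    """Back-transform the log-log fit to a cost level: exp(ALPHA) * miles**BETA."""
    if total_miles <= 0:
        raise ValueError("total_miles must be positive for a log model")
    return math.exp(ALPHA + BETA * math.log(total_miles))


# Hypothetical mileage tally for the coming year (not a value from the thread).
new_miles = 5_000.0
cost = predict_cost(new_miles)
print(f"predicted total cost: {cost:,.0f}")
print(f"average cost per mile at this mileage: {cost / new_miles:,.2f}")
```

Because the exponent is below 1, the average cost per mile computed this way falls as mileage rises, so there is no single "per mile cost" that applies to every district.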

 

Thanks in advance for any advice or suggestions!

Peter

5 REPLIES
OneEyedKing
Fluorite | Level 6

Thanks for introducing me to the NLMEANS and NLEST macros. I must confess I do not make routine use of macros or SQL in my programming; I am very much a vanilla SAS user. I followed the links and attempted to implement one or the other NL macro, but I believe I've hit a roadblock. From what I read, these macros rely on output data from ESTIMATE, LSMEANS, etc. statements, but you can only invoke these statements for class variables, not continuous variables such as TOTAL_MILES. So I'm afraid I'm still at a loss as to how I can translate the coefficients in my log-linear regression as desired.

Ksharp
Super User

Yeah. You are right.

Here is what I got.

Maybe @StatDave could give you an answer.

 

 

ln(TOT_COST1) = 6.928 + 0.886 ln(TOT_MI1)
ln(TOT_COST0) = 6.928 + 0.886 ln(TOT_MI0)

-->
ln(TOT_COST1)-ln(TOT_COST0)=0.886*(ln(TOT_MI1)-ln(TOT_MI0))
-->
ln(TOT_COST1/TOT_COST0)=0.886*ln(TOT_MI1/TOT_MI0)
-->
TOT_COST1/TOT_COST0=(TOT_MI1/TOT_MI0)^0.886
-->
So when TOT_MI1=TOT_MI0+1 then
TOT_COST1/TOT_COST0=((TOT_MI0+1)/TOT_MI0)^0.886=(1+1/TOT_MI0)^0.886
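The last line can be turned into a dollar figure by multiplying the ratio (minus one) by the predicted cost at the baseline mileage. Here is a rough sketch of that, assuming the quoted coefficients. The function names and the baseline mileages are made up for illustration.

```python
import math

ALPHA, BETA = 6.928, 0.886  # coefficients quoted in the thread


def cost_ratio_one_more_mile(miles0: float) -> float:
    """TOT_COST1 / TOT_COST0 when TOT_MI1 = TOT_MI0 + 1."""
    return (1.0 + 1.0 / miles0) ** BETA


def dollar_change_one_more_mile(miles0: float) -> float:
    """Predicted cost at miles0 + 1 minus predicted cost at miles0."""
    cost0 = math.exp(ALPHA) * miles0 ** BETA
    return cost0 * (cost_ratio_one_more_mile(miles0) - 1.0)


# Hypothetical baseline mileages, chosen only to show the pattern.
for miles0 in (1_000.0, 5_000.0, 20_000.0):
    change = dollar_change_one_more_mile(miles0)
    print(f"baseline {miles0:>8,.0f} miles: one more mile adds about {change:,.2f}")
```

The increment depends on the baseline TOT_MI0, which is why the model does not give one fixed per-mile cost.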

OneEyedKing
Fluorite | Level 6

I follow your algebra, and it resonates with something similar that I played with as I struggled to pursue my "unit change" goal. To set this up, I will reference the explanation from the Cornell Statistical Consulting Unit (https://cscu.cornell.edu/wp-content/uploads/logv.pdf) which I have found to be the most concise yet accurate description of interpreting the coefficients in log transformed regression equations (pardon the screen snip, but I wanted to preserve the Greek letters and superscripts): 

[Screenshot: excerpt from the Cornell Statistical Consulting Unit handout linked above]

What I draw from this is to take unity plus the desired fraction raised to the coefficient of the logged variable to yield the percent increment. As shown above, this is usually calculated and reported as a one percent change in X results in a BETA_X percent change in Y (i.e. unity plus the desired percentage change, thus 1.00+0.01=1.01). So far, so simple. However, I then reasoned that if 1.01 represents a one percent change, then 1.10 must represent a ten percent change, 1.50 a fifty percent change, and 2.00 a one hundred percent change (i.e., doubling)?! You see how, to my eyes, this ties in with your final equation which has (1+1/TOT_MI0)^BETA. If we apply this reasoning to the results of my log-linear equation, we will raise two to BETA_X, thus 2^0.886 ~= 1.848, which translates to an 84.76% increase in the predicted total transportation costs attributable to a doubling of total transportation miles. Now, it is probably too big a stretch to say that total miles transported accounts for ~85% of total transportation costs, and even if this is true (a mighty big if) then it still does not get me any closer to my goal of estimating the marginal costs associated with a given number of total transportation miles for a particular district.
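The 1.01 / 1.10 / 1.50 / 2.00 reasoning above can be checked numerically. This is a small sketch using the rounded coefficient 0.886, and the function name is just for illustration. With the rounded value the doubling case comes out near 84.8%, so the 84.76% quoted above presumably came from the unrounded coefficient (an assumption on my part).

```python
BETA = 0.886  # rounded coefficient on ln(TOT_MI) quoted in the thread


def pct_change_in_cost(pct_change_in_miles: float) -> float:
    """Percent change in predicted cost for a given percent change in miles."""
    multiplier = 1.0 + pct_change_in_miles / 100.0
    return (multiplier ** BETA - 1.0) * 100.0


for pct in (1, 10, 50, 100):
    print(f"{pct:>3}% more miles -> {pct_change_in_cost(pct):.2f}% more predicted cost")
```

These figures describe how predicted cost scales with mileage. They are not a share of total cost attributable to mileage.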

 

Again, I feel trapped in "percent world" when I want to be in "unit cost world," with no straightforward way to bridge the two realms. Regretfully, I may have to give up my nice, homoscedastic log-linear model for one where I can interpret the coefficients as marginal unit changes as opposed to percent changes. I will explore non-linear approaches, such as PROC NLIN, in the hopes this will yield unbiased estimates without logarithmic transformations.

 

Thanks for your efforts to assist me, I really appreciate it!

Ksharp
Super User
So it is a non-linear transformation.
You cannot say that a 1-unit change in TOT_MI changes TOT_COST by 0.886.

So it might be, for example (illustrative numbers only):
When TOT_MI=1, a 1-unit change in TOT_MI changes TOT_COST by 2.
When TOT_MI=2, a 1-unit change in TOT_MI changes TOT_COST by 10.

Or calling @Rick_SAS  
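For what it is worth, the dependence on the starting mileage can be written out by differentiating the back-transformed equation. This is a sketch of the calculus only, using the coefficients quoted earlier in the thread and treating the fitted equation as exact:

```latex
\mathrm{TOT\_COST} = e^{6.928}\,\mathrm{TOT\_MI}^{0.886}
\quad\Longrightarrow\quad
\frac{d\,\mathrm{TOT\_COST}}{d\,\mathrm{TOT\_MI}}
  = 0.886\,e^{6.928}\,\mathrm{TOT\_MI}^{-0.114}
  = 0.886\cdot\frac{\mathrm{TOT\_COST}}{\mathrm{TOT\_MI}}
  \approx 904\,\mathrm{TOT\_MI}^{-0.114}
```

So the approximate cost of one more mile is 0.886 times the average cost per mile at the current mileage, and it shrinks slowly as TOT_MI grows.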

