06-14-2012 10:36 PM
Hi, I'm running PROC REG with a weight (actually a FREQ statement rather than WEIGHT) and need the U95 and L95 values from it. The issue: when I use logged data in the MODEL statement, the range between U95 and L95 is equal across observations, but once I unlog those limits it isn't. Likewise, if I use normal (unlogged) data in the MODEL statement, I also get an equal range from U95 to L95 across observations.
proc reg data = test;
model DepPrice = Price1;
run;
The difference between U95 and L95 is about 300 for every observation (some are 299 or 301, but all very close to each other).
If instead I run
proc reg data = test;
model Logged_DepPrice = Logged_Price1;
run;
the difference between U95 and L95 is 5.71 for every observation, but when I unlog U95 and L95 and take their difference I get varying values such as 100, 234, etc., which I'm assuming makes more sense.
Any experience with this? Which behavior makes the most sense?
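The back-transform behavior described above can be sketched numerically. This is a hypothetical illustration (not the poster's actual data): an interval of constant width 5.71 on the log scale, once exponentiated, has a width that scales with the predicted value.

```python
import math

# Hypothetical constant-width 95% interval on the log scale
# (width 5.71, as in the example above); predictions are made up.
half_width = 5.71 / 2
for log_pred in [4.0, 5.0, 6.0]:
    l95 = math.exp(log_pred - half_width)
    u95 = math.exp(log_pred + half_width)
    # Back-transformed width grows in proportion to exp(log_pred).
    print(u95 - l95)
```

So equal widths on the log scale necessarily become unequal, multiplicative-looking widths after unlogging, which matches what you observed.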
06-15-2012 03:04 AM
Why did you log-transform the response variable and the explanatory variable? Because they are not normally distributed? I assume PRICE is always positive?
First you should check the residuals of your model to see how badly they behave, then decide whether to use LOG or not.
06-15-2012 08:46 AM
That is expected behavior. When you do a regression model with the dependent variable being logged, the anti-log is actually an estimate of the median and the confidence limits are asymmetrical. A method proposed by Duan in 1983, called "smearing," was developed to address that issue. More recently, a number of improved methods have been developed to address the skewness of cost data. This article has a good list of papers on the topic:
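Duan's smearing estimator mentioned above can be sketched in a few lines. This is an illustrative Python version with simulated data (names and data are made up, not from the thread): the naive anti-log of the fitted values estimates the median, and multiplying by the mean of the exponentiated residuals (the smearing factor) retargets it toward the mean.

```python
import numpy as np

# Simulated log-linear data: log(y) = 1.0 + 0.5*log(x) + noise.
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, 200)
y = np.exp(1.0 + 0.5 * np.log(x) + rng.normal(0, 0.4, 200))

# Fit OLS on the log scale.
X = np.column_stack([np.ones_like(x), np.log(x)])
beta, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
resid = np.log(y) - X @ beta

# Naive anti-log estimates the median; Duan's smearing factor
# (the mean of the exponentiated residuals) corrects toward the mean.
smear = np.exp(resid).mean()
pred_median = np.exp(X @ beta)
pred_mean = pred_median * smear
```

By Jensen's inequality the smearing factor is always at least 1, so the smeared predictions sit above the naive anti-logged ones.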
06-15-2012 12:31 PM
Confidence intervals don't always have to be symmetric, and it makes sense that something with equal width on the log scale won't have equal width on the original scale.
Consider a proportion estimated at 9/10, i.e. 0.9: a confidence interval for that number is capped at 1 on the upper side, but the lower limit is not bounded the same way and can extend well below 0.8, so the interval ends up asymmetric.
I think KSharp's suggestion makes the most sense: transform the variable if it's appropriate, i.e. to normalize, but remember that the assumption for regression is normal errors, not necessarily normal variables.
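The residual check KSharp recommends can be sketched with simulated data. This hypothetical example (data and names invented for illustration) compares residual skewness from a raw-scale fit versus a log-scale fit when the true errors are multiplicative; the log-scale residuals come out much closer to symmetric, supporting the transform.

```python
import numpy as np

# Simulated data with multiplicative (lognormal) errors:
# y = exp(0.5 + 0.8*x + noise), so the log-scale model is correct.
rng = np.random.default_rng(1)
x = rng.uniform(1, 5, 500)
y = np.exp(0.5 + 0.8 * x + rng.normal(0, 0.3, 500))

def skew(r):
    # Sample skewness of residuals.
    r = r - r.mean()
    return (r**3).mean() / (r**2).mean() ** 1.5

X = np.column_stack([np.ones_like(x), x])
resid_raw = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
resid_log = np.log(y) - X @ np.linalg.lstsq(X, np.log(y), rcond=None)[0]

# Strong right skew in the raw residuals, near zero after logging,
# is evidence in favor of modeling on the log scale.
print(skew(resid_raw), skew(resid_log))
```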