podarum
Quartz | Level 8

Hi, I'm running PROC REG with a weight (actually a FREQ statement rather than WEIGHT) and need U95 and L95 from it. The issue: when I use logged data in the MODEL statement, the range between U95 and L95 is the same for every observation on the log scale, but once I unlog the limits the range is no longer equal. If I use the normal (not logged) variables in the MODEL statement, I get an equal range from L95 to U95 across observations.

proc reg data = test;

model DepPrice = Price1;

freq counts;

run;

The difference between U95 and L95 is about 300 for all observations (some are 301 or 299, but very close to each other).

If I instead run

proc reg data = test;

model Logged_DePPrice = Logged_Price1;

freq counts;

run;

the difference between U95 and L95 is 5.71 for all observations, but when I unlog U95 and L95 and then take the difference between the two, I get varying values such as 100, 234, etc., which I'm assuming makes more sense.

Any experience with this? Which makes the most sense?

Thanks

4 REPLIES 4
Ksharp
Super User

Why do you take the log of the response variable and the explanatory variable? Because they do not conform to a normal distribution? I guess PRICE is always positive?

First, you should check the residuals of your model to see how bad they are. Then decide whether to use LOG or not.

Ksharp
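One way to carry out the residual check suggested above is sketched here. It assumes the poster's dataset and variable names (test, DepPrice, Price1, counts); the output dataset resid_chk and the variables pred and resid are hypothetical names introduced for illustration. The same step can be repeated with the logged variables to compare the two fits.

```sas
/* Sketch only: resid_chk, pred and resid are hypothetical names */
ods graphics on;

proc reg data=test plots(only)=(residualbypredicted qqplot);
   model DepPrice = Price1;
   freq counts;
   /* Keep predictions and residuals for further inspection */
   output out=resid_chk p=pred r=resid;
run;
quit;
```

A funnel shape in the residual-by-predicted plot, or strong curvature in the Q-Q plot, would be an argument for the log transform; a roughly even band and a straight Q-Q line would argue against it.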

Doc_Duke
Rhodochrosite | Level 12

Podarum,

That is expected behavior.  When you do a regression model with the dependent variable being logged, the anti-log is actually an estimate of the median and the confidence limits are asymmetrical.  A method proposed by Duan in 1983, called "smearing," was developed to address that issue.  More recently, a number of improved methods have been developed to address the skewness of cost data.  This article has a good list of papers on the topic:

Estimation of the retransformed conditional mean in health care cost studies

Doc Muhlbaier

Duke
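A minimal sketch of the smearing retransformation mentioned above, assuming the logged variables are natural logs and the log-scale errors have constant variance. The dataset names smear_in, smear and retrans, and all new variable names, are hypothetical and only for illustration.

```sas
/* Fit the log-log model and keep predictions, residuals and limits */
proc reg data=test noprint;
   model Logged_DePPrice = Logged_Price1;
   freq counts;
   output out=smear_in p=log_pred r=log_resid l95=log_l95 u95=log_u95;
run;
quit;

/* Exponentiate each log-scale residual */
data smear_in;
   set smear_in;
   exp_resid = exp(log_resid);
run;

/* Smearing factor: frequency-weighted mean of exp(residual) */
proc means data=smear_in noprint;
   var exp_resid;
   freq counts;
   output out=smear mean=smear_factor;
run;

/* Back-transform to the original price scale */
data retrans;
   if _n_ = 1 then set smear(keep=smear_factor);
   set smear_in;
   median_price = exp(log_pred);                 /* naive anti-log          */
   mean_price   = smear_factor * exp(log_pred);  /* smearing estimate       */
   l95_price    = exp(log_l95);
   u95_price    = exp(log_u95);
   width_price  = u95_price - l95_price;         /* varies with log_pred    */
run;
```

In retrans, width_price changes from row to row even though log_u95 - log_l95 is nearly constant, which matches the behavior described in the question.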

podarum
Quartz | Level 8

Thanks Doc.. I'll check it out..

Reeza
Super User

CIs don't always have to be equal in width, and it makes sense that something that is equal on the log scale won't be equal on the original scale.

Consider a situation where the estimate is 9/10, i.e. 0.9. A confidence interval for this number probably maxes out at 1 for the upper limit, but the lower limit doesn't have to stop at 0.8.

I think Ksharp's suggestion makes the most sense: transform the variable if it's appropriate (i.e. to normalize), but remember that the assumption for regression is normal errors, not necessarily normal variables.
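The point can be made concrete with a short derivation. The symbols here are introduced for illustration and are not from the thread: \hat{y}_i is the log-scale prediction for observation i and h is the (roughly constant) half-width of its interval. Natural logs are assumed.

```latex
\begin{align*}
L_i &= \hat{y}_i - h, \qquad U_i = \hat{y}_i + h
      && \text{(log scale: } U_i - L_i = 2h \text{ for every } i\text{)} \\
e^{U_i} - e^{L_i} &= e^{\hat{y}_i}\left(e^{h} - e^{-h}\right)
      && \text{(original scale: width grows with } e^{\hat{y}_i}\text{)} \\
\frac{e^{U_i}}{e^{L_i}} &= e^{2h}
      && \text{(the ratio of the limits, not their difference, is constant)}
\end{align*}
```

With the log-scale width of 5.71 reported in the question, 2h = 5.71, so the upper limit is about e^{5.71} ≈ 302 times the lower limit for every observation, while the difference between them depends on the predicted price.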
