Re: proc reg connecting the dots

wkossack · Posted 08-05-2010 11:55 AM

I'm trying to do a quick analysis. I have a line formed by a half dozen points. The general shape is a curve decreasing from left to right with decreasing slope.

I want to estimate where the line crosses a value. However, if I do a regression I want the regression line to pass through each point and the connecting segment to be a straight line.

The kicker is that I have to do this 2000 times. My first thought is to do a proc reg but how do I formulate the regression correctly?

Dale · Posted 08-05-2010 01:58 PM

Why do you specify that you want the regression line to pass through each point and that the segment between adjacent points should be a straight line? If that is what you really want, then no regression function is needed: you are simply interpolating linearly between the points. But be advised that you are then treating the response variable as being measured without error. Is that what you really want to assume? Is it defensible to assume that the response should have no error term?
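To make the "no regression needed" point concrete, here is a minimal sketch of finding the crossing by straight-line interpolation between adjacent points. It is written in Python rather than SAS purely for illustration; the function name `crossing_by_interpolation` and the sample data are made up, not taken from the original problem.

```python
import numpy as np

def crossing_by_interpolation(x, y, target):
    """Return the first x at which the piecewise-linear curve through
    the points (x, y) equals `target`, or None if it never does."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)          # make sure points run left to right
    x, y = x[order], y[order]
    for i in range(len(x) - 1):
        y0, y1 = y[i], y[i + 1]
        if y0 == target:           # crossing falls exactly on a data point
            return float(x[i])
        if (y0 - target) * (y1 - target) < 0:
            # target lies strictly between y0 and y1: interpolate linearly
            frac = (target - y0) / (y1 - y0)
            return float(x[i] + frac * (x[i + 1] - x[i]))
    if y[-1] == target:
        return float(x[-1])
    return None

# Hypothetical data: a decreasing curve whose slope flattens out
x = [1, 2, 3, 4, 5, 6]
y = [10.0, 6.0, 4.0, 3.0, 2.5, 2.2]
print(crossing_by_interpolation(x, y, 5.0))
```

Repeating this for 2000 sets is just a loop over the sets. Note that the result inherits any noise in the two points that bracket the crossing, which is the concern raised above.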

Also, can it be assumed that the values of the response variable are all positive? If the response is positive and the response is subject to some measurement error, then I would consider fitting a regression model in which the expectation is modeled as

   E[Y | X] = exp(b0 + b1*x)

where b1 is negative. If the residual error variance is not constant but is proportional to E[Y | X]^2, then the proper way to handle this would be to model log(Y). With only 6 data points, you might not be able to determine whether the variance is constant or proportional to E[Y | X]^2. However, it is generally the case that the variance increases with the expectation, which makes the log transformation the appropriate choice.
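A minimal sketch of the log-transformation route: fit log(Y) on x by ordinary least squares, then solve b0 + b1*x = log(c) for the crossing value c. This is Python for illustration only; the function name `crossing_log_linear` and the data are hypothetical, and the sketch assumes every response is strictly positive.

```python
import numpy as np

def crossing_log_linear(x, y, target):
    """Fit log(y) = b0 + b1*x by least squares and return
    (x_cross, b0, b1), where x_cross solves b0 + b1*x = log(target)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0) or target <= 0:
        raise ValueError("log model needs strictly positive values")
    # np.polyfit returns coefficients from highest degree down: [b1, b0]
    b1, b0 = np.polyfit(x, np.log(y), 1)
    x_cross = (np.log(target) - b0) / b1
    return float(x_cross), float(b0), float(b1)

# Hypothetical noise-free data generated from exp(2 - 0.5*x)
x = np.arange(1, 7)
y = np.exp(2.0 - 0.5 * x)
x_cross, b0, b1 = crossing_log_linear(x, y, target=1.0)
print(x_cross, b0, b1)
```

Unlike interpolation, the fitted curve does not pass through every point, so the estimated crossing uses all six observations rather than only the two nearest ones.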

You can add some flexibility to the above model by including higher order terms such as x^2. Thus, you could model the expectation function as:

   E[Y | X] = exp(b0 + b1*x + b2*x^2)

With only 6 observations (per subject or whatever unit you need to loop over), I would not go above a quadratic term when expressing the expectation function.
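With the quadratic term included, the crossing comes from solving a quadratic in x. The sketch below (Python for illustration; `crossing_log_quadratic` and the data are hypothetical) fits log(Y) by least squares and keeps only real roots that fall inside the observed range of x, which is an assumption made here to avoid extrapolating.

```python
import numpy as np

def crossing_log_quadratic(x, y, target):
    """Fit log(y) = b0 + b1*x + b2*x^2 by least squares and return the
    sorted x values within [min(x), max(x)] where the fit equals target."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0) or target <= 0:
        raise ValueError("log model needs strictly positive values")
    # Coefficients come back highest degree first: [b2, b1, b0]
    b2, b1, b0 = np.polyfit(x, np.log(y), 2)
    # Solve b2*x^2 + b1*x + (b0 - log(target)) = 0
    roots = np.roots([b2, b1, b0 - np.log(target)])
    real = roots[np.isreal(roots)].real
    inside = real[(real >= x.min()) & (real <= x.max())]
    return sorted(inside.tolist())

# Hypothetical noise-free data from exp(2 - 0.5*x + 0.02*x^2)
x = np.arange(1, 7)
y = np.exp(2.0 - 0.5 * x + 0.02 * x**2)
target = float(np.exp(2.0 - 0.5 * 3.5 + 0.02 * 3.5**2))
print(crossing_log_quadratic(x, y, target))
```

An empty list means the fitted curve never reaches the target within the observed range, which is worth flagging when looping over many sets.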

One more question. What do the 2000 sets represent? Might it be reasonable to assume that the regression function should be represented as

   E[Y | X] = exp(b0+g0{i} + (b1+g1{i})*x)

where g0{i} and g1{i} are random effects which represent offsets from the intercept and slope for the i-th subject?

Rest assured that many more questions could be asked in order to arrive at a proper solution to your problem. Have you considered consulting a statistician you can talk with directly? You would probably get a better result by talking in person with someone who can ask the appropriate questions as your full problem unfolds.