proc reg connecting the dots


08-05-2010 11:55 AM

I'm trying to do a quick analysis. I have a line formed by a half dozen points. The general shape is a curve decreasing from left to right with decreasing slope.

I want to estimate where the line crosses a value. However, if I do a regression I want the regression line to pass through each point and the connecting segment to be a straight line.

The kicker is that I have to do this 2000 times. My first thought is to do a proc reg but how do I formulate the regression correctly?


Posted in reply to wkossack

08-05-2010 01:58 PM

Why do you specify that you want the regression line to pass through each point and that the segment between consecutive points should be a straight line? If that is what you really want, then no regression is needed: you are simply interpolating linearly between the points. But be advised that you are then treating the response variable as being measured without error. Is that what you really want to assume? Is it defensible to assume that the response has no error term?
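If connecting the dots really is the goal, the crossing point follows from plain linear interpolation on the segment that brackets the value. A minimal sketch, in Python for illustration only (the same arithmetic could be done in a SAS DATA step); the function name `crossing_x`, the set ids, and the data values are all made up for the example, and the points are assumed to be sorted by x:

```python
def crossing_x(xs, ys, target):
    """Return the x where the piecewise-linear curve through (xs, ys)
    first reaches target, or None if it never does.

    Assumes the points are already sorted by x.
    """
    points = list(zip(xs, ys))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # The target is on this segment if it lies between the endpoint values.
        if min(y0, y1) <= target <= max(y0, y1):
            if y0 == y1:
                return x0  # flat segment sitting exactly on the target
            # Linear interpolation along the straight segment.
            return x0 + (target - y0) * (x1 - x0) / (y1 - y0)
    return None


# Hypothetical data: two of the ~2000 sets, keyed by an assumed set id.
curves = {
    "set_1": ([1, 2, 3, 4, 5, 6], [10.0, 6.0, 4.0, 3.0, 2.5, 2.2]),
    "set_2": ([1, 2, 3, 4, 5, 6], [8.0, 5.0, 3.5, 2.8, 2.4, 2.1]),
}

# One crossing estimate per set; repeating this 2000 times is just a loop.
crossings = {
    key: crossing_x(xs, ys, target=5.0) for key, (xs, ys) in curves.items()
}
```

A result of `None` flags a set whose curve never reaches the value within the observed range, which is worth checking for before trusting any summary across the 2000 sets.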

Also, can it be assumed that the values of the response variable are all positive? If the response is positive and the response is subject to some measurement error, then I would consider fitting a regression model in which the expectation is modeled as

E[Y | X] = exp(b0 + b1*x)

where b1 is negative. If the residual error variance is not constant but is proportional to E[Y | X]^2, then the proper way to fit this would be to model log(Y). With only 6 data points, you might not be able to determine whether the variance is constant or proportional to E[Y | X]^2. However, it is generally the case that the variance increases with the expectation, which makes the log transformation the appropriate choice.
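On the log scale this is an ordinary straight-line regression, and the crossing point has a closed form: solving exp(b0 + b1*x) = c gives x = (log(c) - b0) / b1. A rough sketch of that calculation, again in Python for illustration; the helper names and the noise-free data are assumptions made for the example, not anything from the original problem:

```python
import math


def fit_log_linear(xs, ys):
    """Least-squares fit of log(y) = b0 + b1*x. Requires every y > 0."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    x_mean = sum(xs) / n
    l_mean = sum(logs) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (l - l_mean) for x, l in zip(xs, logs))
    b1 = sxy / sxx
    b0 = l_mean - b1 * x_mean
    return b0, b1


def crossing_from_fit(b0, b1, target):
    """Solve exp(b0 + b1*x) = target for x (b1 must be nonzero)."""
    return (math.log(target) - b0) / b1


# Hypothetical noise-free data generated from y = exp(3 - 0.5*x).
xs = [1, 2, 3, 4, 5, 6]
ys = [math.exp(3 - 0.5 * x) for x in xs]

b0, b1 = fit_log_linear(xs, ys)
x_cross = crossing_from_fit(b0, b1, target=5.0)
```

Unlike the connect-the-dots approach, the fitted curve does not pass through every point once the data contain noise, so the estimated crossing reflects all six observations rather than only the two that bracket the value.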

You can add some flexibility to the above model by including higher order terms such as x^2. Thus, you could model the expectation function as:

E[Y | X] = exp(b0 + b1*x + b2*x^2)

With only 6 observations (per subject or whatever unit you need to loop over), I would not go above a quadratic term when expressing the expectation function.
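With the quadratic term, the crossing is a root of b2*x^2 + b1*x + (b0 - log(c)) = 0, so there can be zero, one, or two candidates, and only roots inside the observed x range are usually meaningful. A sketch under the same assumptions as before (Python for illustration, hypothetical helper names and made-up noise-free data), using the normal equations with a small hand-written solver to stay dependency-free:

```python
import math


def solve_linear(a, b):
    """Solve a*z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - tail) / m[i][i]
    return z


def fit_log_quadratic(xs, ys):
    """Least-squares fit of log(y) = b0 + b1*x + b2*x^2 via normal equations."""
    logs = [math.log(y) for y in ys]
    design = [[1.0, x, x * x] for x in xs]
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)]
           for i in range(3)]
    xty = [sum(row[i] * l for row, l in zip(design, logs)) for i in range(3)]
    return solve_linear(xtx, xty)


def crossing_roots(b0, b1, b2, target):
    """Real roots of b2*x^2 + b1*x + (b0 - log(target)) = 0, sorted."""
    c = b0 - math.log(target)
    if b2 == 0:
        return [-c / b1]
    disc = b1 * b1 - 4 * b2 * c
    if disc < 0:
        return []  # the fitted curve never reaches the target
    root = math.sqrt(disc)
    return sorted([(-b1 - root) / (2 * b2), (-b1 + root) / (2 * b2)])


# Hypothetical noise-free data from y = exp(3 - 0.8*x + 0.05*x^2).
xs = [1, 2, 3, 4, 5, 6]
ys = [math.exp(3 - 0.8 * x + 0.05 * x * x) for x in xs]

b0, b1, b2 = fit_log_quadratic(xs, ys)
roots = crossing_roots(b0, b1, b2, target=5.0)
# Keep only crossings inside the observed x range; others are extrapolation.
in_range = [r for r in roots if min(xs) <= r <= max(xs)]
```

With three coefficients and six points there are only three residual degrees of freedom per set, which is one way to see why going beyond a quadratic is not advisable here.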

One more question. What do the 2000 sets represent? Might it be reasonable to assume that the regression function should be represented as

E[Y | X] = exp(b0+g0{i} + (b1+g1{i})*x)

where g0{i} and g1{i} are random effects which represent offsets from the intercept and slope for the i-th subject?

Rest assured that many more questions could be asked before a proper solution to your problem can be offered. Have you considered consulting a statistician with whom you can talk directly? You would probably get a better result by talking in person with someone who can ask the appropriate questions as your full problem unfolds.
