regression with binned (interval) data

04-03-2017 05:57 PM

I often have to analyze data where the dependent variable, the independent variable, or both were recorded in bins (intervals), when they really *should* have been recorded as continuous values. Most of the time the intervals are not even of equal width, and the highest one is unbounded.

For example, the price of an item might be recorded as $0-20, $21-100, $101-500, $501-5000, or $5001 and up.

Say that price is a function of weight of the item, which is measured exactly to the nearest gram. Maybe these are truffles...

What is the best way to salvage this situation, and build a regression model of price as a function of weight? Rick Wicklin has some posts that seem close to this, such as http://blogs.sas.com/content/iml/2013/04/17/quantile-regression-vs-binning.html , but I am unsure.

Looking forward to all suggestions!

Accepted Solutions

04-19-2017 12:03 PM - edited 04-19-2017 12:22 PM

Since the situation described can be viewed as a matter of interval censoring, UCLA's Institute for Digital Research and Education statistics resources suggest using PROC LIFEREG. http://stats.idre.ucla.edu/sas/dae/interval-regression/

The original suggestion can be found on pages 145-146 of one of that article's citations: Long, J. S. 1997. *Regression Models for Categorical and Limited Dependent Variables.* Thousand Oaks, CA: Sage Publications.
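To make the interval-censoring idea concrete, here is a minimal Python sketch (not SAS code, and not the UCLA example) of the log-likelihood that an interval regression with normal errors maximizes. Each observation contributes the probability that the unobserved price falls inside its recorded bin. The function names, the bin encoding, and the five sample rows are hypothetical, made up for illustration. The bin bounds are taken exactly as written in the question; since there are $1 gaps between bins, shared boundaries such as 20.5 might be a reasonable alternative.

```python
import math

# Price bins from the question as (lower, upper) in dollars.
# None marks an open end, e.g. the top bin "$5001 and up".
BINS = [(0, 20), (21, 100), (101, 500), (501, 5000), (5001, None)]


def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def interval_loglik(intercept, slope, sigma, rows):
    """Log-likelihood for price = intercept + slope * weight + normal noise,
    when only the bin containing each price was recorded.

    rows: iterable of (weight_grams, lower, upper); None means unbounded.
    """
    total = 0.0
    for weight, lower, upper in rows:
        mu = intercept + slope * weight
        p_upper = 1.0 if upper is None else normal_cdf((upper - mu) / sigma)
        p_lower = 0.0 if lower is None else normal_cdf((lower - mu) / sigma)
        # Probability that the latent price lies in [lower, upper];
        # clamp to avoid log(0) for very poor parameter values.
        total += math.log(max(p_upper - p_lower, 1e-300))
    return total


# Hypothetical data: exact weight in grams, price known only by bin.
rows = [(1, *BINS[0]), (4, *BINS[1]), (15, *BINS[2]),
        (120, *BINS[3]), (300, *BINS[4])]

print(interval_loglik(0.0, 26.43, 50.0, rows))
```

Maximizing this function over the intercept, slope, and sigma gives estimates on the dollar scale, so the slope keeps its "dollars per gram" reading even though no exact price was ever recorded.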

All Replies

04-04-2017 03:04 PM

If you are mostly just interested in assessing how weight increases price, you might just consider price as ordinal and fit an ordinal logistic model such as in PROC LOGISTIC. This would avoid having to quantify the spacing of the response levels.
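As a rough illustration of what such a model estimates, here is a minimal Python sketch of the bin probabilities under a proportional-odds (cumulative logit) model. The cutpoints and slope below are made-up values, not fitted estimates, and the sign convention (cutpoint minus slope times weight) is an assumption; software packages differ on it, so check the documentation before reading off a coefficient's sign.

```python
import math


def logistic(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def ordinal_probs(weight, cutpoints, slope):
    """Probability of each ordered price bin, assuming
    P(bin <= j) = logistic(cutpoints[j] - slope * weight).

    cutpoints must be increasing; k cutpoints give k + 1 bins.
    """
    cumulative = [logistic(c - slope * weight) for c in cutpoints] + [1.0]
    probs = [cumulative[0]]
    for j in range(1, len(cumulative)):
        probs.append(cumulative[j] - cumulative[j - 1])
    return probs


# Hypothetical parameters for the five price bins in the question.
cutpoints = [1.0, 2.5, 4.0, 6.0]
slope = 0.05

print(ordinal_probs(10, cutpoints, slope))   # light item
print(ordinal_probs(200, cutpoints, slope))  # heavy item
```

Only the ordering of the bins enters the model, which is why the unequal widths and the open-ended top bin cause no trouble here.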

04-04-2017 04:28 PM

Thank you. I never think of logistic regression. I won't be able to state something like this, though: "every 1 gram increment in weight is associated with a $26.43 increase in price", as I would if the price data had been recorded exactly and I used linear regression.

Or am I not realising something?
