Means vs medians for weighted data with some small...

04-30-2010 05:00 AM

I have some sample data (incomes) that I want to report by state and some other variables. Since they are incomes, I assumed that medians would be the best measure of central tendency to use. However, some of the cells are really small (2-3 obs) and the medians look silly.

Here is some example data from one cell (not real data):

State   Income   Weight
1       150      600
1       800      450

Note: there are also quite a few incomes that are zero, some negative values, and many very high values. However, the high values were top-coded before I got the data (as were the negative values), so there are no real outliers left on the high end.

I don't want to be swapping around between means and medians for different cells.

So, I have two questions:

1/ How does SAS calculate the weighted median?

For this example, I assume SAS would treat these values as though there are 600 instances of 150 and 450 instances of 800, so the median would be 150. In this case, the mean gives a better estimate of the central tendency.
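To make the arithmetic concrete, here is a rough sketch in Python (not SAS) of the interpretation I am assuming: each weight acts like a frequency count, and the weighted median is the smallest value whose cumulative weight reaches half of the total weight. The function names are my own, and I don't know whether this matches what SAS actually does, particularly when the cumulative weight lands exactly on the halfway point (this sketch just returns the lower value in that case).

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the total weight."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half the total weight.

    Assumption: weights behave like frequency counts. Exact ties at the
    halfway point are not averaged; the lower value is returned.
    """
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    cumulative = 0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value


# The example cell above (not real data)
incomes = [150, 800]
weights = [600, 450]

print(weighted_median(incomes, weights))         # 150
print(round(weighted_mean(incomes, weights), 2)) # 428.57
```

Under this assumption the total weight is 1050, the halfway point is 525, and the first observation already carries a weight of 600, so the median sits at 150 while the weighted mean is about 428.57.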

2/ Are there some rules of thumb or best practices about when to use the mean vs the median? (Links welcome.) And since the incomes are top-coded, would the mean be OK to use (since the really high values that would pull the mean up have been re-coded)?

Thanks for any insights.
