04-30-2010 05:00 AM

I have some sample data (incomes) that I want to report by state and some other variables. Since they are incomes, I assumed that medians would be the best measure of central tendency to use. However, some of the cells are really small (2-3 obs) and the medians look silly.

Here is some example data from one cell (not real data):

State   Income   Weight
1       150      600
1       800      450

Note: There are also quite a few incomes that are zero, some negative values, and many very high values. However, the very high values were top-coded before I got the data (as were the negative values), so there are no real outliers left on the high end.

I don't want to be swapping around between means and medians for different cells.

So, I have two questions:

1/ How does SAS calculate the weighted median?

For this example, I assume SAS would treat these values as though there are 600 instances of 150 and 450 instances of 800, so the median would be 150. In this case, the mean gives a better estimate of the central tendency.
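To make my assumption concrete, here is a rough sketch in Python (not SAS) of what I mean. The functions `weighted_median` and `weighted_mean` are my own hypothetical helpers, and the rule used (take the smallest value whose cumulative weight reaches half the total weight) is only my guess at the behaviour, not necessarily what SAS does.

```python
def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half the total weight.

    This is an assumed definition for illustration, not SAS's documented method.
    """
    pairs = sorted(zip(values, weights))  # order observations by value
    half = sum(weights) / 2
    running = 0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    raise ValueError("no observations")


def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# The example cell from above (not real data)
incomes = [150, 800]
weights = [600, 450]

print(weighted_median(incomes, weights))          # 150
print(round(weighted_mean(incomes, weights), 2))  # 428.57
```

Under this rule the total weight is 1050, half of that is 525, and the first value (150, weight 600) already passes it, so the median is 150 while the weighted mean is about 428.57.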

2/ Are there some rules of thumb or best practices about when to use the mean vs. the median? (Links welcome.) And since the incomes are top-coded, would the mean be OK to use (since the really high values that would pull the mean up have been recoded)?

Thanks for any insights.

