Hi.
I have some interest in binning, and might have some suggestions for you, except I don't understand your question.
I think of binning as splitting the entire range of values of a variable into a finite set of disjoint subsets (partitions) that together cover the whole range. For continuous-valued variables, these partitions are usually subintervals. For example, suppose my original variable takes values from 0 to 100. Then, I might have the following four partitions / bins / subintervals:
[0, 21.7), [21.7, 38.4), [38.4, 67.9), [67.9, 100].
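
To make the convention concrete, here is a minimal sketch of how a value could be mapped to one of those four bins. The function name bin_index and the 0-based numbering are just my choices for illustration; the cut points are the ones from the example above.

```python
from bisect import bisect_right

# Interior cut points from the example; the outer limits 0 and 100 are implied.
CUTS = [21.7, 38.4, 67.9]


def bin_index(x, cuts=CUTS, lo=0.0, hi=100.0):
    """Return the 0-based bin index of x.

    Bins are closed on the left and open on the right, except the last
    bin, which also includes the upper limit hi.
    """
    if not lo <= x <= hi:
        raise ValueError(f"{x} is outside the range [{lo}, {hi}]")
    # bisect_right counts the cut points that are <= x, so a value equal
    # to a cut point lands in the bin that starts at that cut point.
    return bisect_right(cuts, x)


for value in (0, 21.7, 50, 100):
    print(value, "-> bin", bin_index(value))
```

Every value in the range lands in exactly one bin, which is what I mean by the bins being disjoint.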
When you say "bands," are you referring to what I call bins or partitions? Any two different bins are completely disjoint, so what do you mean by wanting "the distributions within bands to be similar across the sample"?
Do you want the histograms of the values within each bin to have the same shape? Unless you have a uniform distribution a priori, I think that would be almost impossible to achieve.
Or perhaps you're saying that when you partition your data into training, test, and validation sets, you want the distribution of values in each bin to look similar in each of the training, test, and validation partitions? If you have a sufficient amount of data, uniform random sampling without replacement should accomplish that.
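
If that is what you mean, a rough sketch of the check might look like the following. The 60/20/20 split, the seed, the made-up uniform data, and the function names are all assumptions for illustration, not anything taken from your question.

```python
import random
from bisect import bisect_right
from collections import Counter


def random_split(values, fractions=(0.6, 0.2), seed=0):
    """Shuffle once and slice: uniform random sampling without replacement.

    fractions gives the training and test shares; the validation set
    receives whatever is left over.
    """
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    n_train = round(fractions[0] * len(shuffled))
    n_test = round(fractions[1] * len(shuffled))
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    valid = shuffled[n_train + n_test:]
    return train, test, valid


def bin_shares(values, cuts):
    """Fraction of the values that falls into each left-closed bin."""
    counts = Counter(bisect_right(cuts, v) for v in values)
    return [counts[i] / len(values) for i in range(len(cuts) + 1)]


# Made-up data on [0, 100], only to show the comparison.
rng = random.Random(1)
data = [rng.uniform(0, 100) for _ in range(10000)]
cuts = [21.7, 38.4, 67.9]
for name, part in zip(("train", "test", "valid"), random_split(data)):
    print(name, [f"{share:.3f}" for share in bin_shares(part, cuts)])
```

With enough data, the three rows of shares should come out close to one another; with a small sample, they can differ noticeably just by chance.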
You want a small number of bins with low variation? Does this mean you want the sum of squares around the mean within each bin to be small? I could show you how to achieve this goal through integer programming, but you'd have to establish a minimum bin size or a maximum number of bins; otherwise, every point would be its own bin, and you'd have zero variance. But I have to say I don't understand why you'd need to do this. If this is the kind of thing you want, would you care to explain why?
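
For a single variable, the same objective can also be handled without an integer programming solver. The sketch below is a plain dynamic program over the sorted values, not the integer programming formulation I mentioned. It assumes the bins are contiguous in sorted order and that you fix the number of bins k; since adding a bin never increases the total sum of squares, fixing k plays the same role as a maximum. The names optimal_bins, k, and min_size are mine.

```python
def optimal_bins(values, k, min_size=1):
    """Split the sorted values into k contiguous bins that minimize the
    total within-bin sum of squared deviations from each bin's mean.

    Returns (bins, total_cost). Runs in O(k * n^2) time.
    """
    xs = sorted(values)
    n = len(xs)
    if k < 1 or min_size < 1 or k * min_size > n:
        raise ValueError("need k >= 1, min_size >= 1 and k * min_size <= n")

    # Prefix sums let us compute the sum of squares of any slice in O(1).
    s1, s2 = [0.0], [0.0]
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def sse(i, j):
        """Sum of squared deviations from the mean for xs[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[b][j]: best cost of splitting xs[:j] into b bins.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for b in range(1, k + 1):
        for j in range(b * min_size, n + 1):
            # The last bin is xs[i:j]; the first b - 1 bins cover xs[:i].
            for i in range((b - 1) * min_size, j - min_size + 1):
                if cost[b - 1][i] == inf:
                    continue
                c = cost[b - 1][i] + sse(i, j)
                if c < cost[b][j]:
                    cost[b][j] = c
                    back[b][j] = i

    # Walk the back pointers from the end to recover the bins.
    bins, j = [], n
    for b in range(k, 0, -1):
        i = back[b][j]
        bins.append(xs[i:j])
        j = i
    bins.reverse()
    return bins, cost[k][n]


bins, total = optimal_bins([1, 2, 3, 10, 11, 12, 50, 51], k=3)
print(bins, total)
```

Setting min_size above 1 is one way to keep every point from becoming its own bin when k is large.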
I think the two best reasons to use binning are the following:
1. The bins have some meaning in the context of the analysis. For example, suppose you're looking at a group of students aged 5 to 18. You might want to bin them as 5-11, 12-14, and 15-18 to capture elementary, junior high, and high school ages.
2. The bins help you extract predictive information. Suppose you're trying to predict a binary outcome. If the likelihood of success increases or decreases monotonically as the variable values increase, then there's no reason to bin. But if there are alternating pockets of higher and lower likelihood, binning can help focus the predictive power of the variable.
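
As a quick way to see whether the second reason applies, you could tabulate the observed success rate in each candidate bin and look at whether the rates move in one direction or alternate. This is only a sketch: the function names, the cut points, and the toy data are made up, and outcomes are assumed to be coded as 0 or 1.

```python
from bisect import bisect_right


def success_rate_by_bin(values, outcomes, cuts):
    """Observed success rate per left-closed bin; None for an empty bin."""
    totals = [0] * (len(cuts) + 1)
    wins = [0] * (len(cuts) + 1)
    for v, y in zip(values, outcomes):
        b = bisect_right(cuts, v)
        totals[b] += 1
        wins[b] += y
    return [w / t if t else None for w, t in zip(wins, totals)]


def is_monotone(rates):
    """True if the non-empty rates never change direction."""
    r = [x for x in rates if x is not None]
    rising = all(a <= b for a, b in zip(r, r[1:]))
    falling = all(a >= b for a, b in zip(r, r[1:]))
    return rising or falling


# Toy data: a pocket of successes in the middle bin.
values = [1, 2, 11, 12, 21, 22]
outcomes = [0, 0, 1, 1, 0, 1]
rates = success_rate_by_bin(values, outcomes, cuts=[10, 20])
print(rates, "monotone:", is_monotone(rates))
```

Observed rates are noisy in small bins, so an apparent pocket of higher or lower likelihood is worth confirming on held-out data before you rely on it.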
Anyway, that's my opinion. But some more explanation from you may help us give you useful suggestions.
Good luck!
-- TMK --
T O P K A T Z at M S N dot C O M