10-05-2013 05:20 PM
10-07-2013 01:02 PM
It's not a test, but for continuous variables you might consider looking at the coefficient of variation (SD/mean), so you could at least rank-order the scales. It does have some normality assumptions built in: if the skewness differs radically along with the variance, then you are in trouble. I don't know what you would do with job category, or whether you can even look at variability in a variable that is completely nominal in nature.
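To make the suggestion concrete, here is a minimal Python sketch of the coefficient of variation for the two continuous variables. The sample values are made up purely for illustration, and job category is left out because CV is undefined for a nominal variable.

```python
import statistics

# Hypothetical sample values, for illustration only
income = [42000, 55000, 38000, 61000, 47000]     # yearly income in dollars
education = [12, 16, 14, 18, 16]                 # years of education

def cv(values):
    """Coefficient of variation: sample SD divided by the mean (unitless)."""
    return statistics.stdev(values) / statistics.mean(values)

print(f"CV of income:    {cv(income):.3f}")
print(f"CV of education: {cv(education):.3f}")
```

Because each SD is divided by its own mean, the units cancel, which is what makes the two numbers comparable at all.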
10-07-2013 03:30 PM
I have 3 socioeconomic variables on different scales (yearly income, job category, and years of education), and I'm interested in determining whether one of them might be less variable in our sample than the others.
If we take a step back and forget statistical analysis for a second, I can't even imagine what it means to compare the variability of yearly income to the variability of years of education. I don't see the point. Normally, when you compare two variabilities, the measurements are on the same scale but come from different units or different time periods. And it makes no sense to compare the means of these variables; no one would compare the mean of annual income (say, $50,000) to the mean of years of education (say, 16). So how could it possibly make sense to compare the variances?
Can you explain in layman's terms, before we get into statistics, what it would mean to compare the variability of yearly income to the variability of years of education? I ask because the direction you are going in would allow the comparison of any two variables whatsoever, e.g. the variability of the number of chocolate chips in mint chocolate chip ice cream versus the variability of deaths via donkey kick in the Prussian army in the 1880s. In other words, this seems to me to be a nonsensical direction to go in.
10-07-2013 09:01 PM
I agree with the previous poster. The variance, or any statistic that is expressed in units, will be inadequate for comparing the distributions of disparate variables. If I understand your thinking, what you would like to compare is how well the variables cover their respective ranges. I have never done this, but I think you should investigate the concept of entropy. Entropy is a unitless measure of disorder, or lack of information, and sampling strategies often seek to maximize it (e.g. stratification). Estimating entropy requires binning your data. If a variable covers n bins, its relative entropy is E = - Sum_i ( P_i ln(P_i) ) / ln(n), where P_i is the proportion of observations in bin i. In that expression, ln(n) is the maximum entropy, corresponding to the situation where an equal share of the sample (1/n) falls in each bin. Note also that when all observations fall in the same bin, the entropy is zero.
I hope this will put you on a fruitful track.
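To illustrate the formula above, here is a minimal Python sketch of relative entropy with equal-width binning; the bin count and sample values are arbitrary choices for illustration, and real data would need a more careful binning decision.

```python
import math
from collections import Counter

def relative_entropy(values, n_bins):
    """Relative entropy E = -sum(p_i * ln(p_i)) / ln(n_bins), in [0, 1].

    Continuous values are placed into n_bins equal-width bins first.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0   # guard against a zero-range sample
    counts = Counter(min(int((v - lo) / width), n_bins - 1) for v in values)
    total = len(values)
    h = sum(-(c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(n_bins)

# Values spread evenly across the bins -> maximum relative entropy
print(relative_entropy([1, 2, 3, 4, 5, 6, 7, 8], 4))  # -> 1.0
# All values in one bin -> zero entropy
print(relative_entropy([5, 5, 5, 5], 4))              # -> 0.0
```

Because the result is normalized by ln(n_bins), it is unitless and bounded by 1, which is what allows comparison across variables measured on different scales.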
10-08-2013 11:01 AM
With regard to Steve Denham's suggestion of the coefficient of variation, and PGStats's suggestion of entropy, suppose you go ahead and compute either or both of these.
Now you know the coefficient of variation (or entropy) of yearly income is 12 and the coefficient of variation (or entropy) of years of education is 4. So what? How does that comparison improve your understanding of these variables? What is the benefit of comparing these quantities for these two variables? Yes, you can do the math ... but ... I don't see how it helps anything.
Perhaps the problem is that we don't know what the next step is in the analysis of these quantities, or what the next step is in the utilization of the results. Perhaps the original poster could enlighten us further.
10-08-2013 11:29 AM
And suppose the variability estimates are somewhat alike, but the shapes of the distributions are very different. Until we have a good idea of what the ultimate use of the measures is going to be, it's pretty difficult to see what might be useful.