chevy37
Calcite | Level 5

Hello,

I have 3 socioeconomic variables on different scales (yearly income, job category, and years of education) and I'm interested in determining whether one of them might be less variable in our sample than the others. Namely, we suspect that our sample might be more limited in its range of education than in income or job type. Is there a statistical test or other statistic I can examine that would allow me to compare the variability across these three variables? Obviously, standardizing them isn't helpful for comparisons of variability. I can likely find the syntax if I know the concept.

Thanks so much for any help!


SteveDenham
Jade | Level 19

It's not a test, but for continuous variables you might consider looking at the coefficient of variation (SD/mean), so you could at least rank order the scales.  It does have some normality assumptions built in, because if the skewness is radically different as well as the variance then you are in trouble.  I don't know what you would do with job category, or if you can even look at variability in a variable that is completely nominal in nature.
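As a rough illustration of the coefficient of variation mentioned above, here is a minimal Python sketch. The function name `coefficient_of_variation` and the income and education values are made up for the example, not taken from the poster's data, and the calculation only makes sense for ratio-scale variables with a positive mean (so not for job category).

```python
import statistics


def coefficient_of_variation(values):
    """Return sample SD divided by the mean (unitless).

    Hypothetical helper for illustration; assumes ratio-scale data
    with a positive mean.
    """
    mean = statistics.fmean(values)
    if mean <= 0:
        raise ValueError("CV is only meaningful for a positive mean")
    return statistics.stdev(values) / mean


# Made-up illustrative samples, not real data.
income = [32000, 45000, 51000, 60000, 120000]
education = [12, 14, 16, 16, 18]

cv_income = coefficient_of_variation(income)
cv_education = coefficient_of_variation(education)
print(f"CV income: {cv_income:.3f}, CV education: {cv_education:.3f}")
```

Because the CV is unitless, the two values can at least be rank ordered, with the caveat about differing skewness still applying.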

Steve Denham

PaigeMiller
Diamond | Level 26
I have 3 socioeconomic variables on different scales (yearly income, job category, and years of education) and I'm interested in determining whether one of them might be less variable in our sample than the others

If we take a step back and forget statistical analysis for a second, I can't even imagine what it means to compare the variability of yearly income to the variability of years of education. I don't see the point. Normally, when you compare two variabilities, they are on the same scale, but the data are collected from different units or different time periods. By the same token, it makes no sense to compare the means of these variables; no one would compare the mean of annual income (let's say $50,000) to the mean of years of education (let's say 16). So how could it possibly make sense to compare the variance?

Can you explain in layman's terms first, before we get into statistics, what it would mean to compare the variability of yearly income to the variability of years of education? I ask because the direction you are going in would then allow the comparison of any two variables whatsoever, e.g. variability of the number of chocolate chips in mint chocolate chip ice cream to the variability of deaths via donkey kick in the Prussian army in the 1880s. In other words, this seems to me to be a nonsensical direction to go in.

--
Paige Miller
PGStats
Opal | Level 21

I agree with . The variance statistic, or any statistic that is expressed with units, will be inadequate to compare the distributions of disparate variables. If I understand your thinking on this, what you would like to compare is how well the variables cover their respective range. I have never done this, but I think you should investigate the concept of entropy. Entropy is a unitless measure of disorder, or negative information. Sampling strategies often seek to maximize disorder (e.g. stratification). Estimation of entropy requires binning your data. If a variable covers n bins, relative entropy is given by E = - Sum(Pi ln(Pi)) / ln(n) where the sum is over i and Pi is the proportion of observations in bin i. In that expression, ln(n) is the maximum entropy, corresponding to the situation where an equal part of the sample (1/n) is in each bin. Note also that when all observations fall in the same bin, the entropy is zero.
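The formula above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `relative_entropy` and `equal_width_bins` are hypothetical, equal-width binning is just one possible choice, and the result will depend on how many bins are used. Note that this normalized quantity is not the same thing as Kullback-Leibler divergence, which is also sometimes called relative entropy.

```python
import math
from collections import Counter


def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width bins (0-based index).

    Hypothetical helper; equal-width binning is an assumption of this sketch.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # Clamp the maximum value into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def relative_entropy(labels, n_bins=None):
    """E = -sum(Pi * ln(Pi)) / ln(n), where Pi is the proportion in bin i.

    n defaults to the number of distinct labels observed; pass n_bins to
    count empty bins as well. Empty bins contribute zero to the sum.
    """
    counts = Counter(labels)
    n = n_bins if n_bins is not None else len(counts)
    if n < 2:
        return 0.0  # a single bin carries no disorder
    total = sum(counts.values())
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(n)


# Made-up illustrative data, not real survey values.
job_category = ["clerical", "manual", "manual", "professional", "manual"]
education = [12, 14, 16, 16, 18]

print(relative_entropy(job_category))
print(relative_entropy(equal_width_bins(education, 4), n_bins=4))
```

A nominal variable such as job category can be used directly, with each category acting as its own bin, while continuous variables have to be binned first.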

I hope this will put you on a fruitful track.

PG

PaigeMiller
Diamond | Level 26

With regards to Steve Denham's suggestion of coefficient of variation, and PGStats's suggestion of Entropy, suppose you go ahead and compute either or both of these.

Now you know the coefficient of variation (or entropy) of yearly income is 12 and the coefficient of variation (or entropy) of years of education is 4. So what? How does that comparison improve your understanding of these variables? What is the benefit of comparing these quantities for these two variables? Yes, you can do the math ... but ... I don't see how it helps anything.

Perhaps the problem is that we don't know what the next step is in the analysis of these quantities, or what the next step is in the utilization of the results. Perhaps the original poster could enlighten us further.

--
Paige Miller
SteveDenham
Jade | Level 19

And suppose the variability estimates are somewhat alike, but the shapes of the distributions are very different.  Until we have a good idea of what the ultimate use of the measures is going to be, it's pretty difficult to see what might be useful.

Steve Denham
