About juliema

ballardw · ‎02-06-2015

Here is a brief example that uses a chi-square test to demonstrate one possible approach. The data set has 3 scenarios (variable TestGroup). Each scenario holds the distribution of your count groups A, B and C (variable Bin) for two subpopulations (variable SampGr). In each scenario SampGr=2 represents a hypothetical "even distribution" with roughly one-third of the counts in each of the 3 bins, and SampGr=1 is what you actually observe. The Rate variable holds the count.

    data test;
       input TestGroup SampGr Bin $ rate;
    datalines;
    1 1 A 20
    1 1 B 50
    1 1 C 30
    1 2 A 33
    1 2 B 33
    1 2 C 33
    2 1 A 10
    2 1 B 10
    2 1 C 80
    2 2 A 33
    2 2 B 33
    2 2 C 33
    3 1 A 25
    3 1 B 35
    3 1 C 40
    3 2 A 33
    3 2 B 33
    3 2 C 33
    ;
    run;

    proc freq data=test;
       by testgroup;
       tables bin*Sampgr / chisq;
       weight rate;
    run;

Look at the output for each BY group, in particular the Statistics table. The chi-square test here is basically a measure of similarity: the lower the p-value, the less plausible it is that the two groups share the same distribution. You could use the p-value for the chi-square, or one of the other coefficients, as a "metric".

The first testgroup is unlikely to be similar (p-value = 0.0332), i.e. SampGr 1 is not evenly distributed. The second testgroup is almost definitely not similar (p-value < 0.0001), and the third is fairly close to even (p-value = 0.4008). Perfect agreement would result in a p-value of 1.

There is a reason I used Rate for the weight value: you could easily standardize your data by using the percentages from your raw data.
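If you want the p-value as an actual variable rather than reading it off the listing, one option is to capture the chi-square table with ODS OUTPUT. This is only a sketch: the data set name chisq_stats is made up for illustration, and it assumes the test data set from the example above already exists.

```sas
/* Sketch: save the chi-square statistics from each BY group to a data set. */
/* "chisq_stats" is a hypothetical name; ChiSq is the ODS table that        */
/* PROC FREQ produces when the CHISQ option is used.                        */
ods output ChiSq=chisq_stats;

proc freq data=test;
   by testgroup;
   tables bin*sampgr / chisq;
   weight rate;
run;

/* Keep only the Pearson chi-square row for each test group. */
proc print data=chisq_stats noobs;
   where Statistic = 'Chi-Square';
   var testgroup DF Value Prob;
run;
```

The Prob column then holds the p-value for each testgroup, so it can be merged back onto other summary data or sorted to rank groups by how far they are from even.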


Re: Appropriate metrics to characterize group size distribution?

Appropriate metrics to characterize group size distribution?
