06-02-2015 12:32 PM
... and JMP, while we are at it.
I have written several papers on Cardinality Ratio and have an sco wiki page on it.
* the cardinality of a set is the number of elements in the set
* the cardinality of a data set is n-obs, its number of observations
* the cardinality of a variable is n-levels, its number of distinct values
* the cardinality ratio (CR) is n-levels / n-obs
* CR is a reducing function: its range is (0, 1],
  so it is easier to compare the CRs of variables than their n-levels, which range from 1 to n-obs
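As a worked check of these definitions, take sashelp.class, which has 19 observations. Name has a different value in every row, Age takes the values 11 through 16, and Sex takes F and M:

```latex
\begin{aligned}
\mathrm{CR} &= \frac{\text{n-levels}}{\text{n-obs}}, \qquad
  1 \le \text{n-levels} \le \text{n-obs}
  \;\Rightarrow\; \frac{1}{\text{n-obs}} \le \mathrm{CR} \le 1 \\
\mathrm{CR}(\texttt{Name}) &= 19/19 = 1 \\
\mathrm{CR}(\texttt{Age})  &= 6/19 \approx 0.316 \\
\mathrm{CR}(\texttt{Sex})  &= 2/19 \approx 0.105
\end{aligned}
```

All three land on the same (0, 1] scale, whereas their raw n-levels (19, 6, 2) only mean something relative to n-obs.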
If one takes the time to calculate the n-levels of each variable with
proc freq data = sashelp.class nlevels;
then, since I know you already have the n-obs of the data set,
in my book it is a simple step to calculate
cardinality_ratio = n-levels / n-obs;
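A minimal sketch of that step in open code for sashelp.class. It assumes SAS 9.x, where the NLEVELS option produces an ODS table named NLevels with the columns TableVar and NLevels; the data set names work.nlevels and work.cardinality_ratio and the macro variable n_obs are illustrative choices, not part of the original post.

```sas
/* n-obs: read the observation count from the data set header */
/* (if 0 then set ... never reads a row; NOBS= is set at compile time) */
data _null_;
   if 0 then set sashelp.class nobs = n_obs;
   call symputx('n_obs', n_obs);
   stop;
run;

/* n-levels: capture the NLevels table, suppress the frequency tables */
ods output nlevels = work.nlevels;
proc freq data = sashelp.class nlevels;
   tables _all_ / noprint;
run;

/* cardinality ratio = n-levels / n-obs, one row per variable */
data work.cardinality_ratio;
   set work.nlevels(keep = TableVar NLevels);
   n_obs = &n_obs;
   cardinality_ratio = NLevels / n_obs;
run;

proc print data = work.cardinality_ratio noobs;
run;
```

For sashelp.class (19 observations), Name should come out at 19/19 = 1 and Sex at 2/19, about 0.105.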
Why is this a Good Idea?
Because today it takes a user several steps to calculate it:
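proc contents data = ... out = out_contents noprint;
proc freq data = ... nlevels; tables _all_ / noprint; ods output nlevels = out_freq;
proc sort data = out_contents; by name;
proc sort data = out_freq; by tablevar;
data out_cr; merge out_contents out_freq(rename = (tablevar = name)); by name;
   cardinality_ratio = nlevels / nobs;
proc summary data = out_cr; var cardinality_ratio; output out = out_means mean = ;
data ...; set out_cr;
   select;
      when (...) cr_type = 'unique';
      when (...) cr_type = 'many';
      otherwise  cr_type = 'few';
   end;
run;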
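The steps listed above could be packaged so that a single call does the number-crunching. This is a hypothetical macro, %cr_type: its name, its parameters, and the cutoff rule (CR = 1 is 'unique', CR at or above many_cutoff = 0.5 is 'many', anything else is 'few') are assumptions for illustration, because the post elides the actual when conditions. It also swaps the PROC CONTENTS, sort, and merge route for the NOBS= option of the SET statement to get n-obs.

```sas
/* Hypothetical macro: name, parameters and cutoff are illustrative only. */
%macro cr_type(data =, out =, many_cutoff = 0.5);
   /* n-obs from the data set header, stored in this macro's local table */
   data _null_;
      if 0 then set &data nobs = n_obs;
      call symputx('n_obs', n_obs);
      stop;
   run;

   /* n-levels of every variable, without printing frequency tables */
   ods output nlevels = work._nlevels;
   proc freq data = &data nlevels;
      tables _all_ / noprint;
   run;

   data &out;
      set work._nlevels(keep = TableVar NLevels);
      length cr_type $ 6;
      cardinality_ratio = NLevels / &n_obs;
      /* assumed rules: CR = 1 is unique; CR at or above the cutoff is many */
      select;
         when (cardinality_ratio = 1)             cr_type = 'unique';
         when (cardinality_ratio >= &many_cutoff) cr_type = 'many';
         otherwise                                cr_type = 'few';
      end;
   run;
%mend cr_type;

%cr_type(data = sashelp.class, out = work.class_cr)
```

With these assumed rules, Name (CR = 1) is labeled 'unique' and Sex (CR = 2/19) is labeled 'few'; where the real cutoff belongs is exactly the judgment the elided when conditions encode.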
Having proc freq do the number-crunching makes calculating the
cr-type in (continuous, discrete, unique)