## Variable distribution and diversity measure for selection

Frequent Contributor
Posts: 126

Hi all,

I am trying to find an alternative way of checking how diverse the values of a variable are. My goal is a measure or weight that indicates how well a variable is differentiated in its values, i.e. that it is not dominated by one value that accounts for, let's say, 70% or 80% of the observations. (Another example: variable X has 503 distinct values in 2000 obs, which I guess is good.)

I want to use that measure to select variables for segmentation modeling, because I believe such variables can discriminate my data well.

I am looking for something besides PROC UNIVARIATE/PROC MEANS for stats, or VARCLUS/PCA for variable selection. Any ideas?

Thank you in advance

Super User
Posts: 23,771

## Re: Variable distribution and diversity measure for selection

Posted in reply to chemicalab

The NLEVELS option in PROC FREQ will tell you how many distinct values each variable has.

At first glance, it sounds to me like you're looking for a method more than a proc, i.e. a uniqueness measure.

Not having any science behind this, I'd consider looking at the percentage of unique values, i.e. 503/2000 is about 25% uniqueness. That reading assumes the values are evenly distributed, which is unlikely.
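To make the arithmetic concrete, here is a minimal sketch in Python rather than SAS, just to show the calculation. The function name `uniqueness_summary` and the two data sets are made up for illustration. It reports the number of distinct values (what NLEVELS gives you), the percent-unique ratio, and the share taken by the single most common value, which is the "70% or 80% of the same value" case from the original question.

```python
from collections import Counter

def uniqueness_summary(values):
    """Distinct count, uniqueness ratio, and share of the most common value."""
    values = [v for v in values if v is not None]  # ignore missing values
    n = len(values)
    if n == 0:
        raise ValueError("no non-missing values")
    counts = Counter(values)
    top_count = counts.most_common(1)[0][1]
    return {
        "n_levels": len(counts),          # number of distinct values
        "pct_unique": len(counts) / n,    # distinct values / observations
        "top_share": top_count / n,       # share of the most frequent value
    }

# Hypothetical data: 2000 observations spread over 503 distinct values
spread = [i % 503 for i in range(2000)]
# Hypothetical data: 2000 observations, 80% of them sharing one value
lumpy = [0] * 1600 + list(range(1, 401))

print(uniqueness_summary(spread))
print(uniqueness_summary(lumpy))
```

The two made-up variables have a broadly similar percent-unique figure, but very different top-value shares, which is why the ratio alone can mislead when the distribution is uneven.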

I have to do this in a few weeks for something I'm working on, so if you find something else that works please post back!

Frequent Contributor
Posts: 126

## Re: Variable distribution and diversity measure for selection

Hi Reeza,

I have used PROC FREQ with NLEVELS, and it gives me an indication which, at this point, I would say is OK.

I agree this is not about procs; it is probably something that could be set up by coding.

I will keep you updated on the matter via this post. My thought was something like entropy weights, which give an indication of diversity within a variable, but maybe I am assuming that for the wrong types of variables.
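One possible reading of "entropy weights" is the Shannon entropy of a variable's value frequencies, divided by the log of the number of distinct values so that it falls between 0 and 1. The sketch below is in Python only to show the formula. `normalized_entropy` is a made-up name and this is an assumption about what such a weight could look like, not an established SAS feature.

```python
import math
from collections import Counter

def normalized_entropy(values):
    """Shannon entropy of the value frequencies, scaled to the range [0, 1]."""
    counts = Counter(values)
    n = sum(counts.values())
    k = len(counts)
    if k <= 1:
        return 0.0  # a constant (or empty) variable has no diversity
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(k)  # log(k) is the maximum entropy for k levels

# Hypothetical data: four levels used equally often
even = [1, 2, 3, 4] * 25
# Hypothetical data: one level covers 80% of the observations
skewed = [0] * 80 + [1] * 10 + [2] * 10

print(normalized_entropy(even))
print(normalized_entropy(skewed))
```

A value near 1 means the levels are used about equally; a value near 0 means one level dominates. On the "wrong types of variables" worry: for a continuous variable where almost every value is distinct, this score sits near 1 no matter what the distribution looks like, so such variables would likely need binning first.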

Posts: 1,270

## Re: Variable distribution and diversity measure for selection

Posted in reply to chemicalab

For segmentation you need variables that have high variability and are also uncorrelated with each other. Otherwise the solution may fail to converge.
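A rough way to screen for the "uncorrelated" part is to keep a variable only if its absolute Pearson correlation with every variable already kept is below some cutoff. The Python sketch below shows that idea. The names `pearson` and `drop_correlated`, the 0.9 threshold, and the toy data are all assumptions for illustration, and the greedy order-dependent rule is just one simple choice.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Constant column: correlation is undefined, treated as 0 here.
        # Such columns should be screened out separately for lack of variability.
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def drop_correlated(columns, threshold=0.9):
    """Greedily keep columns whose |r| with every kept column is below threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical data: income_x2 is an exact multiple of income
data = {
    "income": [10, 20, 30, 40, 50],
    "income_x2": [20, 40, 60, 80, 100],
    "age": [30, 25, 41, 38, 29],
}
print(drop_correlated(data))
```

Here `income_x2` is dropped because it is perfectly correlated with `income`, while `age` is kept.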

Frequent Contributor
Posts: 126

?????

Super User
Posts: 23,771

## Re: Variable distribution and diversity measure for selection

Posted in reply to chemicalab

eMiner does some of this automatically, as does a lot of the automated data mining software. My plan was to look into their classification methods and then decide how I wanted to do mine.

Frequent Contributor
Posts: 126

## Re: Variable distribution and diversity measure for selection

Sounds like a plan, but I will try something in code. I don't trust eMiner that much.

Super User
Posts: 23,771

## Re: Variable distribution and diversity measure for selection

Posted in reply to chemicalab

What's not to trust about eMiner?

It's not really a black box tool and definitely requires user experience in both the tool and statistical methods.

Frequent Contributor
Posts: 126

## Re: Variable distribution and diversity measure for selection

I have noticed many bugs and wrong results in eMiner computations. It is mostly good to use once the analytical record is set and ready after coding, i.e. for predictive modeling (and model comparison) or segmentation. Time efficiency is mainly what it offers.

Super User
Posts: 23,771

## Re: Variable distribution and diversity measure for selection

Posted in reply to chemicalab

Can you expand on the bugs/wrongs? I'm getting ready to use eMiner for a large, important project and am highly interested if there's a reason I shouldn't be.

Frequent Contributor
Posts: 126

## Re: Variable distribution and diversity measure for selection

Depends, what type of project is it?

Super User
Posts: 23,771

## Re: Variable distribution and diversity measure for selection

Posted in reply to chemicalab

Fraud detection is the general purpose.

Super Contributor
Posts: 338

## Re: Variable distribution and diversity measure for selection

Posted in reply to chemicalab

Hi Chemicalab,

If you think there are bugs in your system, talk to your SAS Admin or to Tech Support.
Make sure that the hot fixes you need have been applied. Feel free to google your EM version, e.g. google "SAS Enterprise Miner 12.1 Hot Fix", and see what is there.
For the most part, we find the bugs before our customers do.

Good luck!
-Miguel

SAS Employee
Posts: 69

## Re: Variable distribution and diversity measure for selection

Posted in reply to chemicalab

Wow, great post everyone, very lively today! With respect to any bugs or issues with Enterprise Miner, SAS Tech Support is a great resource for troubleshooting. With respect to coding inside EM, there are many options to customize your flows, including the Code Node, the Transformations node, etc. As Reeza stated earlier, some of the finer features may require training and experience. There are many, many macros and macro variables available to add to your coding experience.

Thanks,

Jonathan

Product Manager - SAS Enterprise Miner

Frequent Contributor
Posts: 126

## Re: Variable distribution and diversity measure for selection

Posted in reply to chemicalab

Will make sure to do that. Regarding the initial question of this post: Reeza, I will get back to you on the diversity measure. I think I am on to something, but it requires some coding. It will work towards the entropy weights I mentioned earlier. Will let you know.
