chemicalab
Fluorite | Level 6

Hi all,

I am trying to find an alternative way of checking the diversity of values in a variable. My goal is to define a measure or weight that indicates how well a variable is differentiated in its values, i.e. that it is not dominated by a single value making up, say, 70% or 80% of the observations. (Another example: variable X has 503 distinct values in 2000 obs, which I guess is good.)

I want to use that measure to select variables for segmentation modeling, because I believe such variables can discriminate my data well.

I am looking for something besides PROC UNIVARIATE or PROC MEANS for statistics, or VARCLUS or PCA for variable selection. Any ideas?

Thank you in advance

Reeza
Super User

NLEVELS option in proc freq will tell you how many distinct values per variable. 

At first glance, it sounds to me like you're looking for a method more than a proc, i.e. a uniqueness measure.

Not having any science behind this, I'd consider looking at the percentage of unique values, i.e. 503/2000 is about 25% uniqueness. That only reflects diversity if the values are roughly equally distributed, which is unlikely.
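
To make the idea concrete, here is a minimal sketch in Python (not SAS, and not something posted in this thread) that reports the share of distinct values together with the share taken by the most frequent value. The function name `uniqueness_profile` and the sample data are made up for illustration.

```python
from collections import Counter

def uniqueness_profile(values):
    """Return (distinct_ratio, top_share) for a sequence of values.

    distinct_ratio: number of distinct values / number of observations
    top_share:      frequency of the most common value / number of observations
    """
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")
    counts = Counter(values)
    distinct_ratio = len(counts) / n
    top_share = counts.most_common(1)[0][1] / n
    return distinct_ratio, top_share

# Hypothetical data: 503 distinct values in 2000 observations,
# but one value (0) accounts for most of the rows.
sample = list(range(503)) + [0] * 1497
distinct_ratio, top_share = uniqueness_profile(sample)
print(f"distinct ratio: {distinct_ratio:.4f}, top value share: {top_share:.4f}")
```

In this made-up sample the distinct ratio is about 25%, yet a single value still covers roughly 75% of the observations, which is why the ratio alone can mislead when the values are not evenly spread.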

I have to do this in a few weeks for something I'm working on, so if you find something else that works please post back!

chemicalab
Fluorite | Level 6

Hi Reeza,

I have used proc freq with nlevels, and it gives me an indication that is good enough for now.

I agree this is not about procs; it is probably something that has to be set up in code.

I will keep you updated on the matter via this post. My thought was something like entropy weights, which give an indication of diversity within a variable, but maybe I am assuming that for the wrong types of variables.
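
One possible reading of the entropy idea, sketched in Python as an assumption rather than as the poster's actual method: compute the Shannon entropy of the value frequencies and divide by the log of the number of distinct values, so the score lies between 0 (a single value) and 1 (all observed values equally frequent). The name `normalized_entropy` is hypothetical. The sketch treats the variable as categorical or discrete; a continuous variable would likely need binning first, since nearly every raw value would otherwise be unique.

```python
import math
from collections import Counter

def normalized_entropy(values):
    """Shannon entropy of the value frequencies, scaled to [0, 1].

    Returns 0.0 when only one distinct value is present and 1.0 when
    all observed values are equally frequent.
    """
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")
    counts = Counter(values)
    k = len(counts)
    if k == 1:
        return 0.0  # no diversity at all
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(k)  # divide by the maximum entropy for k levels

# Hypothetical examples
print(normalized_entropy(["a"] * 10))             # constant variable
print(normalized_entropy(["a", "b"] * 5))         # evenly split
print(normalized_entropy(["a"] * 8 + ["b"] * 2))  # 80% one value
```

Because the score is scaled by the number of observed levels, it measures how evenly the values are spread rather than how many there are, so it complements a simple count of distinct values instead of replacing it.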

stat_sas
Ammonite | Level 13

For segmentation you need variables that have high variability and are uncorrelated with each other. Otherwise the solution will not converge.

Reeza
Super User

eMiner does some of this automatically, as does a lot of automated data mining software. My plan was to look into their classification methods and decide how I wanted to do mine :)

chemicalab
Fluorite | Level 6

Sounds like a plan, but I will try something in code. I don't trust eMiner that much.

Reeza
Super User

What's not to trust about eMiner?

It's not really a black box tool and definitely requires user experience in both the tool and statistical methods.

chemicalab
Fluorite | Level 6

I have noticed many bugs and errors in eMiner's computations. It is mostly good to use once the analytical record is set up and ready after coding, i.e. for predictive modeling (and model comparison) or segmentation. Time efficiency is mainly what it offers.

Reeza
Super User

Can you expand on the bugs/wrongs? I'm getting ready to use eMiner for a large, important project and am highly interested if there's a reason I shouldn't be.

chemicalab
Fluorite | Level 6

Depends, what type of project is it?

Reeza
Super User

Fraud detection is the general purpose.

M_Maldonado
Barite | Level 11

Hi Chemicalab,

If you think there are bugs in your system, talk to your SAS Admin or to Tech Support.
Make sure that the hot fixes you need have been applied. Feel free to google about your EM version, e.g. google "SAS Enterprise Miner 12.1 Hot Fix" and see what is there.
For the most part, we find the bugs before our customers do :)

Good luck!
-Miguel

jwexler
SAS Employee

Wow, great post everyone, very lively today!  With respect to any bugs or issues with Enterprise Miner, SAS Tech Support is a great resource for troubleshooting.  With respect to coding inside EM, there are many options to customize your flows, including the Code Node, Transformations node, etc...  As Reeza stated earlier, some of the finer features may require training and experience.  There are many, many macros and macro variables available to add to your coding experience.

Thanks,

Jonathan

Product Manager - SAS Enterprise Miner

chemicalab
Fluorite | Level 6

Will make sure to do that. Regarding the initial question of this post: Reeza, I will get back to you on the diversity measure. I think I am on to something, but it requires some coding. It will work towards the entropy weights I mentioned earlier. I will let you know how it goes.
