04-04-2013 08:09 AM
Here is the deal:
I have 15 million rows and 500 variables, which makes for a huge dataset.
I want to do a behavioral segmentation.
First, I have to choose the most significant variables so that I keep only the essential ones, and then run k-means for the segmentation.
How can I choose the significant variables?
04-05-2013 08:21 AM
No code, but some ideas.
If you have access to Enterprise Miner, then a lot of other techniques become available, most of which have the word "tree" in their name.
04-08-2013 06:45 AM
I hope you have sorted out your problem with the methods described above.
Just wondering what types of variables you have, and whether you also tried factor analysis and MODECLUS?
I had the same problem with the number of significant variables, so I am curious to know which technique was most useful.
04-09-2013 08:10 AM
I am going to use SteveDenham's idea; it's very logical and seems like it would work.
I am still on some other tasks that take memory as well. I tried it on another laptop and it works just fine.
I ran PROC VARCLUS to see how the variables cluster. Then, from a business perspective, I chose the variable I judged most important from each cluster, plus some others. I also added a few variables that didn't show much in the clustering but are necessary for this exercise.
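For anyone who wants to see the "cluster the variables, keep one per cluster, then add the must-haves" idea in code, here is a rough sketch in plain Python. It is not what PROC VARCLUS does internally: it swaps in a much simpler greedy grouping on absolute Pearson correlation. The variable names (`spend`, `visits`, `basket`, `tenure`, `age`), the 0.7 threshold, the synthetic data, and the `must_keep` list are all made up for illustration, and the representative is picked by a statistic here rather than by business judgment.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column: treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def group_variables(columns, threshold=0.7):
    """Greedy grouping (a simplification, NOT the VARCLUS algorithm):
    a variable joins the first group whose first member it correlates
    with at |r| >= threshold, otherwise it starts a new group."""
    groups = []
    for name in columns:
        for group in groups:
            if abs(pearson(columns[name], columns[group[0]])) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


def pick_representative(columns, group):
    """Return the member with the highest mean |r| to the other members."""
    if len(group) == 1:
        return group[0]

    def mean_abs_corr(name):
        others = [m for m in group if m != name]
        total = sum(abs(pearson(columns[name], columns[m])) for m in others)
        return total / len(others)

    return max(group, key=mean_abs_corr)


# Synthetic data with hypothetical variable names: three spending-related
# variables that move together, plus two independent ones.
rng = random.Random(0)
n_rows = 200
spend = [rng.gauss(0, 1) for _ in range(n_rows)]
columns = {
    "spend": spend,
    "visits": [s + rng.gauss(0, 0.1) for s in spend],
    "basket": [2 * s + rng.gauss(0, 0.2) for s in spend],
    "tenure": [rng.gauss(0, 1) for _ in range(n_rows)],
    "age": [rng.gauss(0, 1) for _ in range(n_rows)],
}

groups = group_variables(columns, threshold=0.7)
representatives = [pick_representative(columns, g) for g in groups]

# Variables forced in for business reasons, whatever the clustering says.
must_keep = ["age", "visits"]
selected = representatives + [v for v in must_keep if v not in representatives]

print("groups:", groups)
print("selected:", selected)
```

On 15 million rows you would most likely compute the correlations on a random sample instead of the full table, and the final list in `selected` is what would then go into the k-means step.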
I hope I won't run into any trouble; if I do, I'll be back to bother you guys.
Good day to ye!