I am trying to do customer behavior segmentation using two-stage clustering (FASTCLUS followed by Ward's linkage clustering) in EG with the following variables:
3 demographic variables (binary): Gender (Female/Non-Female), Language (English/French), Ethnicity (Ethnicity1/Ethnicity2).
9 variables that reflect the % of sales spent in each of the 9 departments (they sum to 100%).
15 more sales and visit pattern variables.
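For anyone less familiar with the approach, the two-stage idea can be sketched roughly as follows. This is a minimal pure-Python illustration, not the actual SAS code or data: the toy points, the helper names (sq_dist, centroid, kmeans, ward_merge), and the choice of 9 preliminary and 3 final clusters are all assumptions made up for the example. Stage 1 is a plain k-means pass (similar in spirit to FASTCLUS), and stage 2 merges the preliminary cluster centroids using Ward's criterion.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def kmeans(points, k, iters=20, seed=0):
    """Stage 1: plain k-means; returns point labels and cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Move each center to the mean of its members (if it has any).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = centroid(members)
    return labels, centers

def ward_merge(centers, sizes, n_final):
    """Stage 2: merge preliminary clusters with Ward's criterion.

    Each group is (centroid, size, list of preliminary cluster ids).
    The pair with the smallest increase in within-cluster sum of
    squares, na*nb/(na+nb) * ||ca - cb||^2, is merged at each step.
    """
    groups = [(c, n, [j]) for j, (c, n) in enumerate(zip(centers, sizes))
              if n > 0]  # skip empty preliminary clusters
    while len(groups) > n_final:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                ca, na, _ = groups[a]
                cb, nb, _ = groups[b]
                cost = na * nb / (na + nb) * sq_dist(ca, cb)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        ca, na, ids_a = groups[a]
        cb, nb, ids_b = groups[b]
        merged_center = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        merged = (merged_center, na + nb, ids_a + ids_b)
        groups = [g for i, g in enumerate(groups) if i not in (a, b)]
        groups.append(merged)
    return groups

# Hypothetical toy data: three 2-D blobs of 30 points each
# (stand-ins for standardized customer variables).
rng = random.Random(1)
blob_centers = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
points = [[rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)]
          for cx, cy in blob_centers for _ in range(30)]

prelim_labels, prelim_centers = kmeans(points, k=9)
sizes = [prelim_labels.count(j) for j in range(9)]
groups = ward_merge(prelim_centers, sizes, n_final=3)

# Map each point's preliminary cluster to its final segment.
prelim_to_final = {j: f for f, (_, _, ids) in enumerate(groups) for j in ids}
final_labels = [prelim_to_final[lab] for lab in prelim_labels]
print("final segment sizes:", [n for _, n, _ in groups])
```

The same two functions could be rerun on a subset of columns (for example, without the three demographic flags) to compare the resulting segments.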
This is for a retail company, and one area they are particularly interested in is % spending by department.
But the segments I got seem to be split mostly along demographic lines. The 6 segments are as follows:
Seg 1: 100% French Speaking
Seg 2: 100% Ethnicity1 Female
Seg 3: 100% Ethnicity1 Male
Seg 4: 100% Ethnicity2 Female
Seg 5: 100% Ethnicity2 Male
Seg 6: The rest
I suspect a big reason for this is that their departments are gender-specific, such as menswear, womenswear, men's shoes, women's shoes, cosmetics, etc. Also, stores in French-speaking areas have a different assortment.
My questions are:
1) Should I remove the demographic variables when I do the clustering?
2) For dimension reduction, I used VARCLUS instead of factor analysis. Will this affect anything?
3) I ran a canonical discriminant analysis and got this. How do I interpret this?
Thanks in advance.