OK, the example involves only three card types, so it can't show any clustering. Now imagine that each of the three card types comprises 10 different cards. I expanded the example with such cards (identified by the variable cardId):
/* Fake random data */
data test;
    call streaminit(879767);
    do acc = 1 to 30000;
        /* cardType is 1, 2 or 3 with probabilities 0.4, 0.4 and 0.2 */
        cardType = rand("table", 0.4, 0.4);
        /* cardId is 1-10 for type 1, 11-20 for type 2, 21-30 for type 3 */
        cardId = ((cardType - 1) * 10) + rand("integer", 10);
        app = rand("table", 0.4);
        tenure = rand("poisson", 10);
        calls = rand("poisson", 3);
        /* Transaction counts and spend depend on the card type only */
        if cardType = 1 then do;
            transac = rand("poisson", 10);
            spend = rand("lognormal", log(100));
        end;
        else if cardType = 2 then do;
            transac = rand("poisson", 20);
            spend = rand("lognormal", log(1000));
        end;
        else do;
            transac = rand("poisson", 50);
            spend = rand("lognormal", log(2000));
        end;
        output;
    end;
run;
/* Use formats to define categories */
proc format;
    value tenure
        0-12 = "new card"
        13-36 = "mid time card"
        37-high = "long time card";
    value calls
        0-2 = "few calls"
        3-5 = "mid calls"
        6-high = "lots of calls";
    value transac
        0-9 = "few transact"
        10-49 = "mid transact"
        50-high = "many transact";
    /* "<" excludes the lower endpoint, so the ranges no longer overlap at 100 and 1000 */
    value spend
        0-100 = "low spend"
        100<-1000 = "mid spend"
        1000<-high = "high spend";
    value app
        1 = "App"
        2 = "No App";
    value cardType
        1 = "major"
        2 = "store"
        3 = "prepaid";
    value cardId
        1 = "Major 01"
        2 = "Major 02"
        3 = "Major 03"
        4 = "Major 04"
        5 = "Major 05"
        6 = "Major 06"
        7 = "Major 07"
        8 = "Major 08"
        9 = "Major 09"
        10 = "Major 10"
        11 = "Store 11"
        12 = "Store 12"
        13 = "Store 13"
        14 = "Store 14"
        15 = "Store 15"
        16 = "Store 16"
        17 = "Store 17"
        18 = "Store 18"
        19 = "Store 19"
        20 = "Store 20"
        21 = "Prepaid 21"
        22 = "Prepaid 22"
        23 = "Prepaid 23"
        24 = "Prepaid 24"
        25 = "Prepaid 25"
        26 = "Prepaid 26"
        27 = "Prepaid 27"
        28 = "Prepaid 28"
        29 = "Prepaid 29"
        30 = "Prepaid 30";
run;
/* Perform simple correspondence analysis */
proc corresp data=test;
    format cardId cardId. app app. tenure tenure. calls calls. transac transac. spend spend.;
    tables cardId, app tenure calls transac spend;
run;
Now you can see how the graph represents the clustering of cards by card type, and how the angular proximity (as seen from the origin) of the explanatory categories to a cluster shows their relationship with that cluster.
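If you would rather check the angles numerically than judge them by eye, something along these lines might help. It is only a sketch that I have not run against this data. The data set names coords and angles are my own, and I am assuming the OUTC= data set holds the coordinates in Dim1 and Dim2, with _TYPE_ set to OBS for the row points (cards) and VAR for the column points (categories), as I recall from the PROC CORRESP documentation.

```sas
/* Sketch: save the coordinates, then compute each point's angle from the origin */
proc corresp data=test outc=coords;   /* outc= saves row and column coordinates */
    format cardId cardId. app app. tenure tenure. calls calls. transac transac. spend spend.;
    tables cardId, app tenure calls transac spend;
run;

data angles;
    set coords;
    /* Keep the row points (cards) and the column points (categories) only */
    where _type_ in ("OBS", "VAR");
    /* Angle of the point as seen from the origin, in degrees (-180 to 180) */
    angle = atan2(Dim2, Dim1) * 180 / constant("pi");
run;

/* Points with similar angles end up next to each other in the listing */
proc sort data=angles;
    by angle;
run;

proc print data=angles noobs;
    var _type_ _name_ Dim1 Dim2 angle;
run;
```

Cards and categories that land next to each other in the sorted listing point in roughly the same direction from the origin. Keep in mind that the angle wraps around, so a point near 180 and a point near -180 are also close.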
Correspondence analysis is not yet very popular in the USA, but it has been used for a long time in France, Japan, and elsewhere, under different names, especially in marketing research.
https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_corresp_overview01.htm&docsetVersion=15.2&locale=en