Hi Team, Is there a way we can collapse the levels of categorical variables in an automated manner? One way i know is to calculate the % of events falling in each category and run cluster analysis on it. Can this be automated for multiple variables considering only event rate, not cluster analysis? For example, there is a variable called "Char A". I have calculated Event Rate for this variable, i.e. percentage of 1s appearing in dependent variable (let's say VarY). It is simply the mean of Y. Char A Event Rate A 49% B 67% C 2% D 87% E 4% F 3% Next step is to combine categories of C,E and F as they have almost similar event rate (let's say, variation within 5%). Method 2 : If the above methodology is complicated to automate, can we make it simple taking only the number of cases falling in Char A (not considering dependent variable). For example, if a categorical level contains atmost 5% observations, combine it with others which all have percentage less than 5%, Char A ColPct C 2% F 3% E 4% A 49% B 67% D 87% Combine categories of C,E and F as they have column percentage less than 5%. After using either of the 2 methodologies, we need to replace the levels with the combined category in a raw data file. The raw data file looks like below - Char A Y C 1 C 1 F 1 E 0 E 0 A 1 B 1 D 1 E 0 A 0 B 1 D 1 C 1 Any help would be highly appreciated! Thanks in anticipation!
... View more