# Binning a Large Number of Continuous Variables Using Percentiles or Other Cut-Points

I have a SAS Programming Problem that you may have already solved:

My data set contains three sets of continuous variables:

DQ01 - DQ59, DE01 - DE59 and DL01 - DL59

(177 variables in total), each standardised to mean 50 and variance 100.

The underlying statistical problem is binary logistic regression.

1. I want to bin each continuous variable using deciles or semi-deciles
(5% steps) computed with PROC UNIVARIATE or PROC SUMMARY.

2. Compute and output the Percentiles for each Variable.

3. For each variable, compare the observed values with the percentile
cut-points and allocate each observation to a decile bin.

4. Optimise the bin allocation (for example by merging adjacent bins) based on a metric such as the Gini coefficient.

5. Apply a robust Weight of Evidence (WOE) transformation to each binned variable,
subject to the following constraints:
a. Each bin contains more than 5% of the observations.
b. The WOE values are monotonic across the ordered bins.

6. Fit a Binary Logistic Regression Model to the WOE-Transformed Variables.
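
To make steps 1-3 and 5 concrete, here is a rough sketch of the kind of thing I have in mind. It is not a working solution: the data set name WORK.SCORES, the 0/1 response BAD, the `b` prefix for the bin variables and the macro name `woe` are all placeholders I made up for illustration. It uses PROC RANK rather than explicit cut-points, and it computes WOE with the ln(%good / %bad) convention (some authors use the opposite sign).

```sas
/* Assumed names, not from my actual data: WORK.SCORES is the modelling
   data set and BAD is the 0/1 response (1 = event). */

/* Steps 1-3: PROC RANK assigns each observation to a decile bin (0-9)
   for all 177 variables in one pass. GROUPS=20 would give semi-deciles. */
proc rank data=work.scores out=work.binned groups=10;
   var   DQ01-DQ59 DE01-DE59 DL01-DL59;
   ranks bDQ01-bDQ59 bDE01-bDE59 bDL01-bDL59;
run;

/* Step 5 (unconstrained): WOE per bin for one variable.
   A bin with no goods or no bads gives a missing WOE and would
   have to be merged with a neighbour first. */
%macro woe(var);
   proc sql;
      create table work.woe_&var as
      select b&var        as bin,
             count(*)     as n_obs,
             sum(bad = 0) as n_good,
             sum(bad = 1) as n_bad,
             log( (sum(bad = 0) / (select sum(bad = 0) from work.binned))
                / (sum(bad = 1) / (select sum(bad = 1) from work.binned)) ) as woe
      from work.binned
      group by b&var;
   quit;
%mend woe;

%woe(DQ01)
```

This does not yet handle step 4 or the two constraints in step 5. With semi-deciles each bin holds roughly 5% of the observations, so constraint (a) alone would already force some adjacent bins to be merged, and I would still need a way to check that `woe` is monotonic in `bin` for every variable.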

If you have any advice or suggestions w.r.t. the above please let me know.

Regards