Posted 06-12-2016 09:22 AM
I have a SAS programming problem that you may have already solved.
My data set contains three sets of continuous variables, DQ01-DQ59, DE01-DE59 and DL01-DL59 (177 variables in total), each standardised to mean = 50 and variance = 100.
The underlying statistical problem is binary logistic regression.
1. Bin each continuous variable into deciles or semi-deciles (20 groups), using cut-points computed with PROC UNIVARIATE / PROC SUMMARY.
2. Compute and output the percentiles for each variable.
3. For each variable, compare the observed values with the percentile cut-points and allocate each observation to a decile bin.
4. Optimise the bin allocation based on a metric such as the Gini coefficient.
5. Apply a robust WOE (weight of evidence) transformation to each binned variable, subject to the following constraints:
   a. the percentage of observations in each bin is greater than 5%;
   b. the WOE transformation is monotonic.
6. Fit a binary logistic regression model to the WOE-transformed variables.
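The per-variable part of steps 1, 3 and 5 can be sketched outside SAS to pin down the logic before writing the macro. The Python below is a minimal sketch under several assumptions, not a SAS implementation: the target `y` is coded 0/1 with both classes present; cut-points use a simple nearest-rank percentile, which may differ slightly from PROC UNIVARIATE's default percentile definition; WOE is taken as ln(%non-events / %events) with 0.5 added to each cell count to keep empty bins finite (that smoothing is an assumption, not a standard); and monotonicity is enforced by repeatedly merging adjacent bins, a common heuristic rather than the Gini-based optimisation of step 4, which is not shown. All function names are hypothetical.

```python
import math
from bisect import bisect_left

# Assumptions (not from the original post): y is coded 0/1 with both classes
# present; cut-points use a nearest-rank percentile; WOE = ln(%non-events / %events)
# with 0.5 added to each cell count. All function names are hypothetical.

def decile_cuts(values, n_groups=10):
    """Interior cut-points at the k/n_groups quantiles (nearest rank); tied cuts collapse."""
    s = sorted(values)
    n = len(s)
    # integer ceil(k * n / n_groups) - 1 is the 0-based nearest-rank index
    return sorted({s[(k * n + n_groups - 1) // n_groups - 1] for k in range(1, n_groups)})

def bin_counts(x, y, cuts):
    """[events, non-events] per bin: bin 0 is x <= cuts[0], bin j is cuts[j-1] < x <= cuts[j]."""
    counts = [[0, 0] for _ in range(len(cuts) + 1)]
    for xi, yi in zip(x, y):
        counts[bisect_left(cuts, xi)][0 if yi == 1 else 1] += 1
    return counts

def woe(counts, adj=0.5):
    """Smoothed weight of evidence per bin: ln(share of non-events / share of events)."""
    tot_e = sum(e for e, _ in counts)
    tot_n = sum(n for _, n in counts)
    return [math.log(((n + adj) / tot_n) / ((e + adj) / tot_e)) for e, n in counts]

def merge(counts, cuts, i):
    """Merge bins i and i+1 in place; the cut-point between them is dropped."""
    counts[i] = [counts[i][0] + counts[i + 1][0], counts[i][1] + counts[i + 1][1]]
    del counts[i + 1]
    del cuts[i]

def monotone_bins(x, y, n_groups=10, min_share=0.05):
    """Start from quantile bins, then merge adjacent bins until every bin holds
    more than min_share of the observations and the WOE is monotonic."""
    cuts = decile_cuts(x, n_groups)
    counts = bin_counts(x, y, cuts)
    total = len(x)
    while len(counts) > 1:
        shares = [(e + n) / total for e, n in counts]
        j = min(range(len(shares)), key=shares.__getitem__)
        if shares[j] <= min_share:
            # fold the smallest bin into its smaller neighbour
            if j == len(shares) - 1 or (j > 0 and shares[j - 1] <= shares[j + 1]):
                merge(counts, cuts, j - 1)
            else:
                merge(counts, cuts, j)
            continue
        w = woe(counts)
        up = w[-1] >= w[0]  # overall direction taken from the two end bins
        viol = [k for k in range(len(w) - 1) if (w[k + 1] < w[k] if up else w[k + 1] > w[k])]
        if not viol:
            break
        merge(counts, cuts, viol[0])  # merge the first adjacent pair that breaks the trend
    return cuts, counts, woe(counts)
```

For example, `cuts, counts, w = monotone_bins(x, y)` returns the surviving cut-points, the [events, non-events] counts per bin and the WOE per bin; a new value can be assigned to its bin with `bisect_left(cuts, value)`. Passing `n_groups=20` gives semi-deciles, but because each starting bin then holds about 5% of the data, the strict > 5% rule will typically force merges straight away.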
If you have any advice or suggestions regarding the above, please let me know.
Regards
I think this is a little too big for a forum post. Also, you posted it twice, in two different forums.
My main problem is how to apply the same binning algorithm to a large number of variables.
I have built a solution for binning a single variable using PROC RANK.
Now I need a macro, possibly using arrays, that repeats the process for every variable and combines the output into one table.
Does this narrow the problem down sufficiently?
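For the many-variables part, one option is to leave the single-variable routine unchanged and drive it from a list of names, stacking each result into one long table keyed by variable name. A minimal Python sketch of that pattern follows. `var_list`, `rank_groups`, `bin_summary` and `bin_all` are hypothetical names, `rank_groups` is only a simple stand-in for whatever per-variable binning is used (ties are broken by position, not handled as PROC RANK would), and the data are assumed to be held as a dict of column lists.

```python
from collections import defaultdict

# Hypothetical helpers; the data layout (dict of column lists, 0/1 target) is an assumption.

def var_list(prefix, first, last):
    """Expand a SAS-style numbered range such as DQ01-DQ59 into explicit names."""
    return [f"{prefix}{i:02d}" for i in range(first, last + 1)]

def rank_groups(values, groups=10):
    """Stand-in binning: group = floor(rank * groups / (n + 1)), ties broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        out[i] = rank * groups // (len(values) + 1)
    return out

def bin_summary(values, y, groups=10):
    """One row per group: (group, min, max, count, events)."""
    stats = defaultdict(lambda: [float("inf"), float("-inf"), 0, 0])
    for g, v, t in zip(rank_groups(values, groups), values, y):
        s = stats[g]
        s[0], s[1] = min(s[0], v), max(s[1], v)
        s[2] += 1
        s[3] += t
    return [(g, *stats[g]) for g in sorted(stats)]

def bin_all(data, y, names, groups=10):
    """Apply the same binning to every named variable and stack into one long table."""
    table = []
    for name in names:
        for row in bin_summary(data[name], y, groups):
            table.append((name, *row))  # (variable, group, min, max, count, events)
    return table

# The 177 variable names from the post
names = var_list("DQ", 1, 59) + var_list("DE", 1, 59) + var_list("DL", 1, 59)
```

In SAS the same shape might come from a %DO loop over the variable list that appends each per-variable output together with a VARIABLE column. It may also be worth checking whether a macro is needed at all for the ranking step, since PROC RANK accepts a whole variable list in its VAR statement together with GROUPS=10.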