BookmarkSubscribeRSS Feed
KonstantinV1
Calcite | Level 5

 

I have a categorical variable Q1 taking the values from 1 to 4 and a binary variable Bad taking the values 1 and 0.

 

I want to calculate WoE and IV for the variable according to the existing categories in ascending order.

 

I know that HPBIN has option to calculate those statistics but it would also create other bins - i want to use the variable binned as it is. I have also tried to use the output of the Freq procedure, but it becomes inconvenient for the cases in which there is no events in some of the categories.

1 REPLY 1
Ksharp
Super User

Do you want combine these into other group ? or just use its original levels ?

 



%let var=marital   ;


title "变量: &var";
proc sql;
create table woe_&var as
 select &var as group,
sum(good_bad='bad') as n_bad label='bad的个数',sum(good_bad='good') as n_good label='good的个数',
sum(good_bad='bad')/(select sum(good_bad='bad') from have ) as bad_dist  format=percent7.2 label='bad的占比',
sum(good_bad='good')/(select sum(good_bad='good') from have ) as good_dist  format=percent7.2 label='good的占比',
log(calculated Bad_Dist/calculated Good_Dist) as woe
from have
   group by &var
    order by woe;


select *,sum(  (Bad_Dist-Good_Dist)*woe  ) as iv
 from woe_&var ;

quit;

 

https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2018/1808-2018.pdf

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 1 reply
  • 2459 views
  • 0 likes
  • 2 in conversation