eemrun
Obsidian | Level 7

I am trying to do some univariate analysis on a large dataset and want to include 'Information Value' (IV) in it. I am running the following code as part of a macro, but it is not calculating the IV; the log warns that the number of bins may be less than the number of levels specified.

 

* calculate IV for the variables;
proc hpbin data=&input. numbin=5;
  input &numNames.;
  ods output Mapping=Mapping;
run;

ods output infoValue = IV;
proc hpbin data=&input. WOE BINS_META=Mapping;
  target &target_var. / level=nominal order=desc;
run;

Any ideas on how I can do this manually, or is there a better procedure that calculates this automatically without much trouble? I am doing this in conjunction with PROC UNIVARIATE within the macro.
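For reference, a manual calculation usually starts from the textbook definition of Weight of Evidence (WOE) and IV for a binary target. The block below is a sketch of that definition, not a statement of exactly what PROC HPBIN computes. The symbols are assumptions introduced here: for bin $i$, $n_i$ is the number of non-events and $e_i$ the number of events, with totals $N$ and $E$ over all bins. Sign conventions vary between references, and implementations typically apply an adjustment when a bin has a zero count, since the logarithm is otherwise undefined.

```latex
\mathrm{WOE}_i = \ln\!\left(\frac{n_i / N}{e_i / E}\right),
\qquad
\mathrm{IV} = \sum_{i} \left(\frac{n_i}{N} - \frac{e_i}{E}\right) \mathrm{WOE}_i
```

Each term of the sum is non-negative, so a variable's IV grows as the event and non-event distributions across its bins diverge.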

Rick_SAS
SAS Super FREQ

Can you reproduce this issue by using a standard SAS data set in the SasHelp library? Then everyone could reproduce the issue.  For example, can you modify the following program to indicate the issue that you are seeing?

 

* calculate IV for the variables;
proc contents data=sashelp.heart short; run;

proc hpbin data=sashelp.heart numbin=5;
  input weight;
  ods output Mapping=Mapping;
run;

ods output infoValue = IV;
proc hpbin data=sashelp.heart WOE BINS_META=Mapping;
  target BP_Status / level=nominal order=desc;
run;

If you can't reproduce the issue in a standard data set, please provide the SAS log for your problem so we can see the ERROR or WARNING messages.

eemrun
Obsidian | Level 7

Hi Rick,

 

I am guessing it must be something to do with my dataset. I did not get any errors with the code that you posted. The following is the log I get when I run the code on my dataset. I am posting a snippet here. Let me know if you want the full log.

 

MPRINT(UNIVARIATE):   ods output Mapping=Mapping;
MPRINT(UNIVARIATE):   run;
NOTE: Binning methods: BUCKET BINNING .
NOTE: The number of bins is: 5.
NOTE: The HPBIN procedure is executing in single-machine mode.
WARNING: The binning level may less than the number of levels specified.
NOTE: The data set WORK.MAPPING has 891 observations and 8 variables.
NOTE: Compressing data set WORK.MAPPING decreased size by 0.00 percent. 
      Compressed is 2 pages; un-compressed would require 2 pages.
NOTE: There were 243878 observations read from the data set PL.TEST.
NOTE: PROCEDURE HPBIN used (Total process time):
      real time           9.14 seconds
      cpu time            9.48 seconds
      

MPRINT(UNIVARIATE):   ods output infoValue = IV;
MPRINT(UNIVARIATE):   proc hpbin data=pl.test WOE BINS_META=Mapping ;
MPRINT(UNIVARIATE):   target Bad/level=nominal order=desc;
MPRINT(UNIVARIATE):   run;

NOTE: Binning methods: BUCKET BINNING .
NOTE: The BINS_META= data set is being used. Binning method is ignored. NUMBIN= is ignored. VAR and INPUT statements are ignored.
NOTE: The HPBIN procedure is executing in single-machine mode.
NOTE: The data set WORK.IV has 215 observations and 2 variables.
NOTE: Compressing data set WORK.IV increased size by 100.00 percent. 
      Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: There were 243878 observations read from the data set PL.TEST.
NOTE: There were 891 observations read from the data set WORK.MAPPING.
NOTE: PROCEDURE HPBIN used (Total process time):
      real time           9.76 seconds
      cpu time            11.62 seconds
      
Rick_SAS
SAS Super FREQ

I don't know, but the doc for the NUMBIN= option says "The resulting number of binning levels might be less than the specified integer if the sample size is small or if the data are not normalized. In this case, PROC HPBIN provides a warning message."

 

Try plotting a histogram of the data you are binning. Attach it along with the Mapping produced by PROC HPBIN.
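A minimal sketch of that check might look like the following. It assumes the SASHELP.HEART data set and the WEIGHT variable from the example earlier in this thread as stand-ins; substitute your own data set and one of the variables that triggers the warning.

```sas
/* Stand-ins: sashelp.heart and weight come from the example above; */
/* replace them with your own data set and input variable.          */

/* Re-create the Mapping table for the variable under inspection */
proc hpbin data=sashelp.heart numbin=5;
  input weight;
  ods output Mapping=Mapping;
run;

/* Histogram of the raw values, to spot skew or extreme outliers */
proc sgplot data=sashelp.heart;
  histogram weight;
run;

/* Range and missing-value counts for the same variable */
proc means data=sashelp.heart n nmiss min max;
  var weight;
run;

/* The bin boundaries that PROC HPBIN actually produced */
proc print data=Mapping;
run;
```

Comparing the histogram with the printed Mapping table shows whether some of the requested bins ended up empty or merged for that variable.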
