Depending on the scenario, you might want to choose a cutoff probability of interest, knowing that a fixed cutoff will (likely) flag a differently sized proportion of each data set it scores. If your goal is instead to classify the top 20% of the scored observations as responders regardless of their actual predicted probabilities, the easiest way is to use the RANK procedure with GROUPS=5 to create five roughly equal-sized bins based on the predicted probability. The first or last group (depending on sort order) corresponds to the top 20%. If you scored a data set with ROLE=SCORE using a Score node in SAS Enterprise Miner, you can connect a subsequent SAS Code node and use the following code, which assumes a categorical target:
/*** BEGIN SAS CODE ***/
libname mylib 'C:\data'; /* path where the output data set will be written */

/* Assign 5 groups based on the predicted probability; with DESCENDING, group 0 holds the highest probabilities */
proc rank data=&EM_IMPORT_SCORE out=myranks groups=5 descending;
   var EM_EVENTPROBABILITY;
   ranks MyRankVar;
run;

/* Crosstab of the rank group by actual target level */
proc freq data=myranks;
   tables MyRankVar * %EM_TARGET / nocol nopercent;
run;

/* Flag the top 20% (group 0); leave Top20 missing if the rank is missing */
data mylib.MyScores;
   set myranks;
   if MyRankVar = 0 then Top20 = 1;
   else if not missing(MyRankVar) then Top20 = 0;
   else Top20 = .;
run;

/* Verify that the right group was flagged */
proc freq data=mylib.MyScores;
   tables MyRankVar * Top20 / norow nocol nopercent;
run;

/* Statistics on the predicted probability for each Top20 group */
proc means data=mylib.MyScores;
   class Top20;
   var EM_EVENTPROBABILITY;
run;
/*** END SAS CODE ***/
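Depending on which end of the distribution you want, you may not need the DESCENDING option. Without it, PROC RANK orders from lowest to highest, so the highest predicted probabilities fall in group 4 rather than group 0, and the Top20 flag must test for MyRankVar = 4 instead. Also, the EM_EVENTPROBABILITY variable is added by the Score node, so if you do not score with the Score node you will need to modify the code to use the variable that contains the predicted probability of the target event.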
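When the predicted probability comes from somewhere other than the Score node, one option is to put the variable name in a macro variable so the rest of the code does not change. The sketch below is self-contained and only illustrative: the data set work.scored, its 1,000 simulated rows, and the variable name P_TargetYes are hypothetical placeholders for your own scored data. The final PROC MEANS reports the minimum probability in the top group, which is the effective cutoff that the top-20% rule implies for this particular data set.

```sas
/* Hypothetical stand-in for a scored data set: 1,000 simulated rows with an
   assumed event-probability variable named P_TargetYes */
data work.scored;
   call streaminit(2024);
   do id = 1 to 1000;
      P_TargetYes = rand('uniform');
      output;
   end;
run;

/* Name of the variable holding the predicted probability of the event */
%let probvar = P_TargetYes;

/* Rank into 5 groups; with DESCENDING, group 0 holds the highest 20% */
proc rank data=work.scored out=work.ranked groups=5 descending;
   var &probvar;
   ranks MyRankVar;
run;

/* Top20 = 1 for group 0, 0 for other groups, missing if the rank is missing */
data work.flagged;
   set work.ranked;
   if MyRankVar = 0 then Top20 = 1;
   else if not missing(MyRankVar) then Top20 = 0;
run;

/* The MIN for Top20 = 1 is the effective cutoff probability for this data set */
proc means data=work.flagged n min max;
   class Top20;
   var &probvar;
run;
```

To use it on real data, replace work.scored with your scored data set and set probvar to your probability variable (for example, EM_EVENTPROBABILITY when the Score node was used).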
Hope this helps!
Doug